Wednesday, November 02, 2016

The Prediction Market Paradox

There’s a reason why campaigns are eager to publicize polls that show them ahead, while downplaying those in which they happen to be trailing. The perception that a candidate is losing can depress donations and volunteer effort, and lower morale and turnout among supporters. Hence polls that show tightening of a race are often advertised as indicators of momentum by the trailing party, and as outliers by the leader. The actual likelihood of victory is not independent of beliefs about this likelihood.

This gives rise to what might be called a prediction market paradox. If prices are widely believed to accurately reflect underlying probabilities, then there is an incentive for deep-pocketed partisans to try and manipulate these prices at the margin. But if the possibility of manipulation is salient and prices are treated with skepticism, then incentives to manipulate are weakened and prices will in fact be quite accurate reflections of underlying beliefs.

An interesting illustration of this phenomenon is the recent decision by PredictIt to post an electoral college map, updated by the minute, that aggregates probabilities derived from all its state-level markets. Here's what the map looks like at the moment:


There are seven categories: the safe, likely, and leaning states for each candidate and one toss-up category. States shift across categories as prediction market prices cross the relevant thresholds. This way, a broad range of probability assessments is mapped onto a much coarser set that is easy to visualize and process.

But this creates the possibility that small changes in price, of the order of one cent, can lead to reassignments across categories that generate a very different picture. The incentive to manipulate prices is amplified whenever such categorical switches are feasible.
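To make the threshold effect concrete, here is a minimal sketch of how a price could be mapped to one of the seven categories. Only the 75-cent boundary between lean and likely Clinton comes from the discussion in this post; every other cutoff, the `CUTOFFS` table, the `category` function, and the choice to treat a price exactly at a cutoff as belonging to the higher category are hypothetical, chosen purely for illustration.

```python
# Hypothetical cutoffs, in cents, for the price of a "Clinton wins" contract.
# Only the 75-cent lean/likely boundary is taken from the post; the rest
# are illustrative assumptions, not PredictIt's actual thresholds.
CUTOFFS = [
    (90, "safe Clinton"),
    (75, "likely Clinton"),
    (60, "lean Clinton"),
    (40, "toss-up"),
    (25, "lean Trump"),
    (10, "likely Trump"),
]


def category(price_cents: int) -> str:
    """Return the map category for a contract price between 0 and 100 cents."""
    for floor, label in CUTOFFS:
        if price_cents >= floor:
            return label
    return "safe Trump"


# A one-cent move across a cutoff changes the color of the state on the map.
for price in (74, 75):
    print(price, category(price))
```

Under these assumed cutoffs, a state trading at 74 cents is shown as leaning while the same state at 75 cents is shown as likely, which is the kind of discontinuity that makes marginal price manipulation attractive.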

Of course these incentives apply to both sides of the market, with some traders wishing to shift states to the left while others are pushing to the right. As a result, an unusually large number of states may be expected to bounce back and forth across boundaries, and to remain within a narrow band of prices close to those selected (somewhat arbitrarily) by the exchange as thresholds.

This seems to be what we are seeing. The boundary between the lean and likely Clinton states is determined by a 75% threshold, and we see four states (Wisconsin, Michigan, Colorado, and Pennsylvania) all within a point or two of this. Here are those above the threshold:


And those below:


New Hampshire is not far from the boundary either. 

All this could be just coincidence, but if one looks at probabilistic forecasts from other sources, there is no such pattern. The New York Times conveniently collects six probabilistic forecasts including its own, with the current picture looking like this:


These forecasts (from the Times, FiveThirtyEight, Huffington Post, Predictwise, Princeton Election Consortium and Daily Kos respectively) don't appear to be clustered around the PredictIt thresholds at all.

Still, the evidence is anecdotal at best, and a proper analysis would have to look for a discontinuity in prices around the time that the map was created, with a clustering of prices around boundary points that could not be accounted for by random chance alone. 

Meanwhile, some caution is probably warranted in interpreting prediction market data. This is a case in which the ease of visualization, aggregation and dissemination of data can have an impact on the underlying measurements themselves, and indeed on the objective probabilities that the measures are intended to reflect.

Friday, September 23, 2016

Thine Every Flaw

There’s a verse in America the Beautiful that I absolutely adore; it represents for me the very best traditions of my adopted country:
America! America!
God mend thine ev’ry flaw,
Confirm thy soul in self-control,
Thy liberty in law.
I’ve been thinking about these words a lot over the past year or so, as the election season has revealed just how divided and how lacking in common purpose we are as a nation.

It's glaringly obvious that international trade, migration, and technological progress have brought enormous benefits to many of us. Our handheld devices are more powerful than the computers that launched our first satellites into orbit. Our system of higher education remains a magnet for eager students from every corner of the world, in part because we have attracted and retained the finest research talent. We are on the verge of a revolution in transportation and urban form as driverless cars make their presence felt. Our cultural products—movies and music among them—continue to attract strong global demand. And our Olympic medal winners encompass many different identities, religions, and countries of origin.

But globalization and technological progress have also left in their wake economic devastation and social disintegration across large swathes of the country that were previously prosperous and stable. The kind of deprivation once confined to inner cities—and tolerated for decades by the rest of society—is now pervasive in once-thriving industrial areas. In his recent and acclaimed memoir, JD Vance laments the decline of Middletown, Ohio from a proud and bustling steel town to "a relic of American industrial glory," with abandoned shops and broken windows, derelict homes, druggies and dealers, and places to be avoided after dark.

Anne Case and Angus Deaton have reported a startling increase in midlife mortality among white Americans without a college degree, "largely accounted for by increasing death rates from drug and alcohol poisonings, suicide, and chronic liver diseases and cirrhosis." Stratification by sex reveals that this phenomenon has hit white working class women especially hard. Trends in criminal justice tell a similar story: the incarceration rate for white women has risen by a staggering fifty percent since 2000, while that for black women has fallen more than 30%. Similar, but much less striking trends are in evidence for males.

All this has led to what Dani Rodrik calls the politics of anger. In its American incarnation, this anger has lifted to the helm of a major political party a man who has apparent contempt for the greatest of our traditions: due process even for those accused of the most heinous crimes, the prohibition of cruel and unusual punishment, and freedom from discrimination on the basis of religion or race. He lacks the self-control for which the verse above pleads, and his appeal to liberty and law is opportunistic and entirely self-serving.

This has been too much for some in his own party to stomach. Meg Whitman, a Republican candidate for Governor of California as recently as 2010, has been actively campaigning for Hillary Clinton. And if unconfirmed reports are to be believed, former president George H.W. Bush intends to vote for her too.

But even if we manage to dodge this bullet in November, the conditions that have fueled Trump's rise will remain in place, and the anger will intensify rather than abate. Something has got to be done to prevent our social fabric from fraying further. But what?

Perhaps protectionist and exclusionary policies can provide some measure of short term relief, but much of the dislocation that results from globalization is also a consequence of technological progress, and giving up on the latter is a recipe for economic suicide. Targeted interventions that support retraining and transition to growing sectors of the economy have to be part of the solution, but these are piecemeal efforts with varying effectiveness and the potential for bureaucratic mismanagement.

An alternative approach is to target inequality and poverty directly, through cash transfer schemes such as a universal basic income or a negative income tax. But payments such as these are not contingent on the performance of the economy as a whole, and therefore provide no incentives for people to support policies that are beneficial in the aggregate but impose costs on them as individuals.

What we need is a distributive mechanism that allows for all to benefit when the country benefits. Debraj Ray has recently proposed something along these lines, a universal basic share. This is simply a share of nominal GDP, the value of which will ebb and flow with the nation's aggregate income. Aside from some obvious advantages relative to a basic income, such as the absence of any need for indexation, this would give all citizens a stake in the prosperity of the country as a whole.

How might such a scheme be implemented? I have previously proposed the creation of individual accounts at the Federal Reserve for every citizen, including minors, which could be credited with the profits of open market operations. These profits are currently transferred to the Treasury. Any shortfall relative to the basic income share would then have to be made up by transfers from the Treasury to the Fed. One considerable benefit of such accounts is that they would do away with the need for deposit insurance, and would remove at a stroke the implicit subsidy that such insurance provides for proprietary trading at commercial banks. 

Policies of this kind already exist. For instance, the Alaska Permanent Fund collects and invests a portion of the revenue from mineral leases, and periodically distributes dividends to all qualified residents of the state.

The hope is that an initiative such as this can distribute more evenly the benefits from policies that raise aggregate incomes, whether through trade, migration, or technological progress. This ought to mitigate the political obstacles to the implementation of such policies. And perhaps the sense of common ownership will help bridge some of the deep divisions that have become so salient during this electoral season.

Through his rhetoric, Donald Trump has emboldened and empowered some of the most virulently racist and anti-Semitic elements in our society. Just take a look, for instance, at the messages received on Twitter by the political theorist Danielle Allen, in response to her concerns about a Trump nomination. They are disheartening in the extreme.

But Trump has the support of about 40% of registered voters, which in my estimation is about 88 million people or 36% of the adult population. While many of them may hold views on some matters that are immensely distasteful and deeply hurtful to others, I think that JD Vance is right to point out that it is "difficult in the abstract to appreciate that those with morally objectionable viewpoints can still be good people." 

I have been an American for just six years, and it is far too soon for me to write off so substantial a fraction of my fellow citizens. Call it the naive optimism of the newly naturalized if you like, but I really do think that we can get past this. With or without divine intervention, we can mend our individual and collective flaws.

Thursday, July 21, 2016

A Fallacy of Composition

Peter Moskos is a sociologist by training, a professor at John Jay College of Criminal Justice, and a former Baltimore City police officer. In responding to the shooting of Philando Castile, he had this to say:
Honestly, in this shooting, with this cop, in this locale, I don't think there's a chance in hell Castile would have been shot had he been white. 
Nor did he think this was an entirely isolated incident; it reminded him of the (non-fatal) shooting of Levar Jones by Sean Groubert at a traffic stop in South Carolina. I had exactly the same reaction when I saw the Castile video, as did others. Even the Governor of Minnesota conceded that the shooting "probably would not have happened if he were white."

And yet, Moskos was unsurprised by Roland Fryer's recent claims of an absence of racial bias in police shootings:
I was not surprised by Fryer's conclusions... if one wishes to reduce police-involved shootings... there are good liberal reasons to de-emphasize the significance of race in policing.

Jonathan Ayers, Andrew Thomas, Diaz Zerifino, James Boyd, Bobby Canipe, Dylan Noble, Dillon Taylor, Michael Parker, Loren Simpson, Dion Damen, James Scott, Brandon Stanley, Daniel Shaver, and Gil Collar were all killed by police in questionable to bad circumstances... What they have in common is none were black and very few people seemed to know or care when they were killed. 
Moskos is not arguing here that the police can do no wrong; he is arguing instead that in the aggregate, whites and blacks are about equally likely to be victims of bad shootings. 

How can these two views be reconciled? If there is bias in individual incidents, ought it not to show up in aggregate data? Doesn't the congruence between the racial composition of arrestees nationwide and the racial composition of victims of police killings indicate an absence of bias, as Sendhil Mullainathan claimed a few months ago?

I have argued previously that it does not, because of systematic differences in the qualitative nature of encounters. If police initiate more encounters with blacks that are not objectively threatening (but may in some cases be subjectively perceived to be threatening) then parity in killings per encounter can indicate the presence rather than absence of bias. As Andrew Gelman put it at the time, it's all about the denominator.

But Moskos offers another, quite different reason why bias in individual incidents might not be detected in aggregate data: large regional variations in the use of lethal force. 

To see the argument, consider a simple example of two cities that I'll call Eastville and Westchester. In each of the cities there are 500 police-citizen encounters annually, but the racial composition differs: 40% of Eastville encounters and 20% of Westchester encounters involve blacks. There are also large regional differences in the use of lethal force: in Eastville 1% of encounters result in a police killing while the corresponding percentage in Westchester is 5%. That's a total of 30 killings, 5 in one city and 25 in the other.

Now suppose that there is racial bias in police use of lethal force in both cities. In Eastville, 60% of those killed are black (instead of the 40% we would see in the absence of bias). And in Westchester the corresponding proportion is 24% (instead of the no-bias benchmark of 20%). Then we would see 3 blacks killed in one city and 6 in the other. That's a total of 9 black victims out of 30. The black share of those killed is 30%, which is precisely the black share of total encounters. Looking at the aggregate data, we see no bias. And yet, by construction, the rate of killing per encounter reflects bias in both cities. 
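The arithmetic of the two-city example can be checked with a short sketch. The numbers are exactly those given above; the variable names and the `rates` helper are just illustrative choices, and exact fractions are used so that no rounding creeps into the comparison.

```python
from fractions import Fraction

# Per city: (encounters, % of encounters involving blacks,
#            % of encounters ending in a killing, % of those killed who are black)
CITIES = {
    "Eastville": (500, 40, 1, 60),
    "Westchester": (500, 20, 5, 24),
}


def rates(black_enc, other_enc, black_killed, other_killed):
    """Killings per encounter for black and for all other citizens."""
    return Fraction(black_killed, black_enc), Fraction(other_killed, other_enc)


totals = [0, 0, 0, 0]
per_city = {}
for name, (enc, black_pct, kill_pct, black_kill_pct) in CITIES.items():
    black_enc = enc * black_pct // 100
    killed = enc * kill_pct // 100
    black_killed = killed * black_kill_pct // 100
    counts = (black_enc, enc - black_enc, black_killed, killed - black_killed)
    per_city[name] = rates(*counts)
    totals = [t + c for t, c in zip(totals, counts)]

aggregate = rates(*totals)

for name, (black_rate, other_rate) in per_city.items():
    print(f"{name}: black {float(black_rate):.2%}, other {float(other_rate):.2%}")
print(f"Aggregate: black {float(aggregate[0]):.2%}, other {float(aggregate[1]):.2%}")
```

Within each city the rate of killing per encounter is higher for blacks (1.5% against about 0.67% in Eastville, 6% against 4.75% in Westchester), yet the pooled rates are identical at 3% for both groups. The disparity disappears in the aggregate because the city with the larger black share of encounters is also the city with the lower overall rate of lethal force.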

This is just a simple example to make a logical point. Does it have empirical relevance? Are regional variations in killings large enough to have such an effect? Here is Moskos again:
Last year in California, police shot and killed 188 people. That's a rate of 4.8 per million. New York, Michigan, and Pennsylvania collectively have 3.4 million more people than California (and 3.85 million more African Americans). In these three states, police shot and killed... 53 people. That's a rate of 1.2 per million. That's a big difference.

Were police in California able to lower their rate of lethal force to the level of New York, Michigan, and Pennsylvania... 139 fewer people would be killed by police. And this is just in California... If we could bring the national rate of people shot and killed by police (3 per million) down to the level found in, say, New York City... we'd reduce the total number of people killed by police 77 percent, from 990 to 231!
This is a staggeringly large effect. 

Additional evidence for large regional variations comes from a recent report by the Center for Policing Equity. The analysis there is based on data provided voluntarily by a dozen (unnamed) departments. Take a close look at Table 6 in that document, which reports use of force rates per thousand arrests. The medians for lethal force are 0.29 and 0.18 for blacks and whites respectively, but the largest recorded rates are much higher: 1.35 for blacks and 3.91 for whites. There is at least one law enforcement agency that is killing whites at a rate more than 20 times greater than that of the median agency.

On the reasons for these disparities, one can only speculate:
I really don't know what some departments and states are doing right and others wrong. But it's hard for me to believe that the residents of California are so much more violent and threatening to cops than the good people of New York or Pennsylvania. I suspect lower rates of lethal force has a lot to do with recruitment, training, verbal skills, deescalation techniques, not policing alone, and more restrictive gun laws. 
Moskos expands on these points in a recent conversation with Glenn Loury.

All of this must be interpreted with caution, since the information we have available is so patchy and deficient. As I wrote in a recent opinion piece with Willemien Kets, there is a desperate need for better data, collected and distributed in a comprehensive and uniform manner. Without this we are just groping in the dark.

Thursday, July 14, 2016

On Arrest Filters and Empirical Inferences

I've been thinking a bit more about Roland Fryer's working paper on police use of force, prompted by this thread by Europile and excellent posts by Michelle Phelps and Ezekiel Kweku.

The Europile thread contains a quick, precise, and insightful summary of the empirical exercise conducted by Fryer to look for racial bias in police shootings. There are two distinct pools of observations: an arrest pool and a shooting pool. The arrest pool is composed of "a random sample of police-civilian interactions from the Houston police department from arrests codes in which lethal force is more likely to be justified: attempted capital murder of a public safety officer, aggravated assault on a public safety officer, resisting arrest, evading arrest, and interfering in arrest." The shooting pool is a sample of interactions that resulted in the discharge of a firearm by an officer, also in Houston. 

Importantly, the latter pool is not a subset of the former, or even a subset of the set of arrests from which the former pool is drawn. Put another way, had the interactions in the shooting pool been resolved without incident, many of them would never have made it into the arrest pool. Think of the Castile traffic stop: had this resulted in a traffic violation or a warning or nothing at all, it would not have been recorded in arrest data of this kind.

The analysis in the paper is based on a comparison between the two pools. The arrest pool is 58% black while the shooting pool is 52% black, which is the basis for Fryer's claim that blacks are less likely to be shot than whites in the raw data. He understands, of course, that there may be differences in behavioral and contextual factors that make the black subset of the arrest pool different from the white, and attempts to correct for this using regression analysis. He reports that doing so "does not significantly alter the raw racial differences."
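To see why the raw shares point in this direction, here is a back-of-the-envelope sketch. It is not Fryer's regression procedure: it simply treats the arrest pool as the comparison group for the shooting pool, lumps everyone who is not black into a single group (an assumption on my part), and computes the relative odds implied by the two shares. The function names are illustrative.

```python
def odds(share: float) -> float:
    """Convert a share (between 0 and 1) into odds."""
    return share / (1 - share)


def raw_odds_ratio(black_share_shot: float, black_share_arrested: float) -> float:
    """Odds of appearing in the shooting pool rather than the arrest pool,
    for blacks relative to everyone else, implied by the two pool shares."""
    return odds(black_share_shot) / odds(black_share_arrested)


# Shares reported for Houston: 52% of the shooting pool and 58% of the
# arrest pool are black.
ratio = raw_odds_ratio(0.52, 0.58)
print(f"raw odds ratio: {ratio:.2f}")
```

A ratio below one means that blacks are under-represented among shootings relative to their share of the arrest pool. This crude two-group figure need not match the estimates in the paper, which are based on regressions with controls, and it inherits whatever problems there are in using the arrest pool as a comparison group.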

This analysis is useful, as far as it goes. But does this really imply that the video evidence that has animated the Black Lives Matter movement is highly selective and deeply misleading, as initial reports on the paper suggested? 

Not at all. The protests are about the killing of innocents, not about the treatment of those whose actions would legitimately plant them in the serious arrest pool. What Fryer's paper suggests (if one takes the incident categorization by police at face value) is that at least in Houston, those who would assault or attempt to kill a public safety officer are treated in much the same way, regardless of race. 

But think of the cases that animate the protest movement, for instance the list of eleven compiled here. Families of six of the eleven have already received large settlements (without admission of fault). Six led to civil rights investigations by the Justice Department. With one or two possible exceptions, it doesn't appear to me that these interactions would have made it past Fryer's arrest filter had they been handled more professionally. 

The point is this: if there is little or no racial bias in the way police handle genuinely dangerous suspects, but there is bias that leads some mundane interactions to turn potentially deadly, then the kind of analysis conducted by Fryer would not be helpful in detecting it. Which in turn means that the breathless manner in which the paper was initially reported was really quite irresponsible. 

For this the author bears some responsibility, having inserted the following into his discussion of the Houston findings:
Given the stream of video "evidence", which many take to be indicative of structural racism in police departments across America, the ensuing and understandable outrage in black communities across America, and the results from our previous analysis of non-lethal uses of force, the results displayed in Table 5 are startling... Blacks are 23.8 percent less likely to be shot by police, relative to whites.
His claim that this was "the most surprising result of my career" was an invitation to misunderstand and misreport the findings, which are important but clearly limited in relevance and scope.

---

Update. If you follow the links at the start of this post, you'll see a case made that Fryer's own findings of bias in the use of non-lethal force suggest that the composition of the arrest pool will be altered by bias in the charging of innocents for resisting or evading arrest.

It occurred to me that the same data used to examine use of non-lethal force (from the citizen's perspective) could also be used to get an estimate of this effect. This is the Bureau of Justice Statistics Police-Public Contact Survey. If anyone has already done this, please let me know; I'd be interested to see the findings.

Monday, July 11, 2016

Police Use of Force: Notes on a Study

A new empirical analysis of police use of force by Harvard economist Roland Fryer is attracting national attention. The paper deals with both lethal and non-lethal force, using a variety of different data sets, some public and some painstakingly assembled by the author and his team. Given the harrowing events of the past week, it's likely that his results on shootings will attract the most attention, but it's worth carefully considering both sets of findings.

Fryer provides evidence of significant racial disparities in the experience of non-lethal force at the hands of police, even in data that relies on self-reports by officers. Using official statistics from New York City’s Stop, Question and Frisk program, he finds that blacks and Latinos are more likely to be held, pushed, cuffed, sprayed or struck than whites who are stopped. This remains the case even after controlling for a broad range of demographic, behavioral, and environmental characteristics. And using data from a nationally representative sample of civilians, which does not rely on officer accounts, he finds evidence of even larger disparities in treatment.

But Fryer also reports an absence of racial bias in police shootings for a select group of jurisdictions. He recognizes that a proper analysis of police bias in the use of lethal force requires data not only on those incidents in which shootings occurred, but also those in which suspects were successfully pacified and disarmed. Data of this kind is extremely hard to come by, but he has managed to obtain incident reports on arrests in Houston that can be used for this purpose. 

The focus is on arrest categories that are more likely to involve incidents resulting in justified use of lethal force. It turns out that in this arrest data 58% of the population is black, while in the shooting data the corresponding share is 52%. This immediately implies that in the absence of controls for other features of the interaction, blacks in the arrest population are less likely to be shot than whites. He finds that controlling for other features of the interaction "does not significantly alter the raw racial differences." Here is how Fryer characterizes these findings:
Given the stream of video "evidence", which many take to be indicative of structural racism in police departments across America, the ensuing and understandable outrage in black communities across America, and the results from our previous analysis of non-lethal uses of force, the results displayed in Table 5 are startling... Blacks are 23.8 percent less likely to be shot by police, relative to whites.
He describes this as "the most surprising result of my career."

While it is entirely possible that the Houston Police Department doesn't exhibit systematic racial bias in the use of lethal force, I'm not sure such an emphatic conclusion is warranted. A close look at the arrest data (Table 1D) alongside the shooting data (Table 1C, column 2) reveals a number of puzzles that should be a cause for concern. In the arrest data only 5% of suspects were armed, and yet 56% of suspects "attacked or drew weapon." This would suggest that over half of suspects attacked without a weapon (firearms, knives and vehicles are all classified as weapons). Moreover, there are large differences across groups in behavior: two-thirds of whites and one-half of blacks attacked, a difference that is statistically significant (the reported p-value is 0.006).  

What this means is that the pool of black arrestees and the pool of white arrestees are systematically different, at least as far as behavior is concerned. So the raw data comparison described as startling in the quote above is not really valid. (I made a similar point in response to a piece by Sendhil Mullainathan a few months ago). Still, Fryer controls for these differences in behavioral and contextual characteristics and finds that the basic picture doesn't change. This has to be taken seriously. The key question, to my mind, is whether these controls are adequate. 

I personally would be more convinced if the arrestee pool looked more like the shooting victim pool. For instance, 18% of arrestees, but only 4% of shooting victims are female. I suspect that many of the interactions in the arrestee pool are not threatening, even from the subjective perspective of the officers involved. And others are so obviously threatening---for instance those involving suicide-by-cop---that no discretion or judgement is really necessary. Pruning these from the data might give us a clearer picture of bias in the use of discretionary lethal force. 

Despite these concerns, I think that there is a case to be made that there is no systematic bias against blacks in the lethal use of force within the Houston Police Department. What one ought not to conclude, however, is that this applies nationally. The analysis of other jurisdictions considered in the paper is restricted to encounters in which shootings actually occurred, and cannot therefore be used to answer the same kinds of questions that the Houston data allows. 

One last point about shootings: I'm not sure why there are quotation marks around the word "evidence" in the above quote. Video evidence, for all its flaws, is still very powerful evidence. It was video evidence that led to the indictment of Michael Slager on murder charges, and the conviction of Sean Groubert for assault and battery. It is selective and cannot establish the presence of racial bias in individual cases, but surely it can't be dismissed out of hand.

Finally, consider Fryer's analysis of non-lethal force, which is consistent with earlier findings. Aside from being fundamentally unjust, disparities in the use of non-lethal force have some really important implications for crime rates. The harassment of entire groups based on racial or ethnic identity is a major obstacle to witness cooperation in serious cases, including homicide. In fact, given the importance of corroboration, a belief that other witnesses will not step forward can be self-fulfilling.

With witnesses routinely unwilling to come forward in some neighborhoods, people can be killed with near impunity. And this significantly increases the incentives to kill preemptively, in a climate of reciprocal fear. Low clearance rates for homicide are directly responsible for high rates of killing, and both of these are held in place by distrust of the criminal justice system by potential witnesses. The excessive and discriminatory use of non-lethal force by police thus ends up having indirect lethal effects.

Thursday, July 07, 2016

Deadly Stereotypes

This video is hard to watch but important to think about and learn from:


Here's what appears to have happened. At around 9pm on July 6, Philando Castile was stopped for a broken taillight while driving in Falcon Heights, Minnesota. He was accompanied by his girlfriend, Lavisha Reynolds, and her young daughter. On being asked for his license and registration, Castile informed the officer that he had a firearm in the vehicle, and a concealed carry permit. He then reached for his wallet and was fatally shot. The video above captures the aftermath of the shooting, and was streamed live to a Facebook account by Reynolds. 

The incident immediately brought to mind the shooting of Levar Jones by Sean Groubert in September 2014, which was captured on the officer's dashcam video. Again, there was a traffic stop, a request for documents, and multiple shots fired as Jones reached for his wallet:


Jones was hit but survived the shooting, and Groubert would later plead guilty to assault and battery.

What ties these incidents together is that they seem to have been motivated primarily by fear rather than anger or malice. Moreover, this fear turned out to have been unwarranted: neither Jones nor Castile posed an objective threat to the respective officers. The same was true of Amadou Diallo back in 1999, and in the more recent cases of Tamir Rice and John Crawford.

Whether or not the fear was reasonable under the individual circumstances of each case is harder to ascertain, and there is usually enough doubt to preclude criminal prosecution. Nevertheless, there are rare instances in which the unreasonableness of the fear is recognized: Groubert's employment with the South Carolina Department of Public Safety was terminated on the explicit grounds that he "reacted to a perceived threat where there was none."

A question of great moral and social importance is whether or not such fear is driven, in part, by exaggerated stereotypes of black male violence held by some subset of officers. The anecdotal evidence certainly suggests that such stereotypes matter on average, even if they are not implicated in every case. There is also some evidence of implicit bias from video game simulations.

Further evidence can be found in a dataset assembled by The Guardian. According to this source, there were a total of 1,145 police killings in 2015 alone, about half of which involved suspects armed with a gun. A further 13% of those killed were armed with a knife. There is no question, therefore, that police officers often face armed and dangerous suspects. However, 18% of whites killed by police in 2015 were unarmed while 52% had a gun; the corresponding figures for blacks were 25% and 46%. This suggests that within the set of encounters that result in police killings, those involving black suspects are less objectively threatening to the officers involved. One possible explanation is that any given encounter is more likely to be perceived by the officer as threatening when the suspect happens to be black.

In the Guardian data, slightly more than half of those killed by police were white, 27% were black, and 17% Latino. The proportion of those killed who were black is roughly the same as the proportion of total arrestees who are black, which has led some to argue that "removing police racial bias will have little effect on the killing rate." But this claim depends on the questionable assumption that encounters involving black citizens are as likely to be objectively threatening to officers as encounters with white citizens. As I have argued previously, there are reasons to believe that they are not.

The health of our society depends on an effective and trusted criminal justice system. In fact, the system cannot be effective if it isn't trusted. Distrust makes witnesses to crimes unwilling to come forward and depresses clearance rates. This allows serious crimes, including homicide, to be committed with impunity. Fear of homicide victimization raises incentives for preemptive killing, resulting in epidemics of violence. At the heart of it all are stereotypes, affecting interactions between victims and offenders, parties to disputes, prosecutors and witnesses, and officers and suspects. And the very same stereotypes also affect the urgency and concern with which the general public views mass incarceration.

What can be done? The screening and training of officers has got to take into account the possibility that stereotypes can be deadly. Psychologists have found that exposure to counterstereotypical exemplars can reduce implicit bias, and residency requirements can serve as a screening device. Finally, the construction of a complete and consistent national database of incidents remains imperative. Public action requires broad engagement with the issue and some agreement on the nature of the problem, and this will not be possible while arguments continue to rely on anecdotal and indirect evidence. Such evidence is too quickly dismissed by skeptics and too easily filtered by stereotypes, no matter how shocking and heartbreaking and deeply persuasive a sympathetic observer finds it to be.

Sunday, April 10, 2016

Fee-Structure Distortions in Prediction Markets

Since the launch of the pioneering Iowa Electronic Markets almost thirty years ago, prediction markets have grown to become a familiar fixture in the forecasting landscape. Among the most recent entrants is PredictIt, which has been operating for about a year under a no-action letter from the CFTC.

Both IEM and PredictIt offer contracts structured as binary options: if the referenced event occurs, the buyer of the contract gets a fixed payment at the expense of the seller, and otherwise gets nothing. The price of the contract (relative to the winning payment) may then be interpreted as a probability: an assessment by the "market" of the likelihood that the event will occur. These probabilities can be calibrated against actual outcomes over multiple events, and compared with survey-based and model-based forecasts. Comparisons of this kind have generally found the forecasting performance of markets to be superior on average to that of more traditional methods.

But interpreting prices as probabilities requires, at a minimum, that the set of prices referencing mutually exclusive outcomes sum to at most one. This condition is routinely violated on PredictIt. For instance, in the market for the presidential election winner by party, we currently have:


Based on the prices at last trade, there is an absurd 108% likelihood that someone or other will be elected president. Furthermore, the price of betting against all three listed outcomes (by buying the corresponding no contracts) is $1.96, even though the payout from this bundle is sure to be $2.00. Since these contracts are margin-linked (the exchange only requires a trader to post his or her worst-case loss) the cost of buying this bundle would be precisely zero in the absence of fees, and this would be as pure an opportunity for arbitrage as one is likely to find.

On IEM, or the now defunct Intrade, such a pattern of pricing would never be observed except perhaps for an instant. The discrepancy would be spotted by an algorithm and trades executed until the opportunity had been fully exploited. Profits would be small on any given trade, but would add up quickly: the most active account on Intrade during the last presidential election cycle traded close to four million contracts for a profit of $62,000 with minimal risk and effort. This trader had a median holding period of zero milliseconds. That is, the trader typically sold multiple candidate contracts simultaneously (with the trades having identical timestamps) in a manner that could not possibly have been done manually.

Why don't we see this in PredictIt? The simple answer is the fee structure. Whenever a position is closed at a profit the exchange takes 10% of the gains; losing trades don't incur fees. Taking account of this fee structure, the worst-case outcome for a trader betting against all three outcomes in the example above would be a win by someone other than a major party nominee. In this case the trader would lose $0.95 and gain $0.99, incurring fees on the latter of around ten cents. The result would be a net loss rather than a gain, and hence no opportunity for arbitrage. Prices could remain at these levels indefinitely.

Still, algorithmic arbitrage can prevent prices from getting too far out of line with meaningful probabilities. The extent to which this happens depends on whether the events in question include some that are considered highly unlikely. In a market with only two possibilities (such as that referencing confirmation of Merrick Garland) price distortion will be lowest if both outcomes are considered equally likely. For instance, if the prices of the two contracts were each 53, betting against both would cost 94, and fees would be a shade above 5 no matter what happens. These prices could not be sustained, so the distortion would be at most 5%.   

But in the same market, prices of 99 and 10 for the two outcomes could be sustained, for a distortion of 9%. The cost of betting against both would be 91 but if the less likely outcome occurs, the fee would wipe out all gains. Hence no opportunity for arbitrage, and no pressure on prices to change. 
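The arithmetic in these two examples can be checked with a short sketch. It is only an illustration under stated assumptions: the function name and parameters are mine, each contract is taken to pay 100 cents, a "no" contract is assumed to cost exactly 100 minus the corresponding "yes" price (no bid-ask spread), the listed outcomes are treated as mutually exclusive and exhaustive, and the 10% fee is charged separately on the gain from each winning contract.

```python
def worst_case_profit(yes_prices, fee_rate=0.10):
    """Worst-case net profit (in cents) from betting against every outcome.

    Illustrative assumptions: contracts pay 100, a 'no' contract costs
    100 minus the 'yes' price, exactly one listed outcome occurs, and the
    fee applies to the gain on each winning contract separately.
    """
    n = len(yes_prices)
    total_cost = sum(100 - p for p in yes_prices)
    profits = []
    for i in range(n):
        # Outcome i occurs: the 'no' on i expires worthless,
        # every other 'no' contract pays 100.
        payout = 100 * (n - 1)
        # The gain on a winning 'no' equals the 'yes' price it was bet against.
        fees = fee_rate * sum(p for j, p in enumerate(yes_prices) if j != i)
        profits.append(payout - total_cost - fees)
    return min(profits)


even_market = worst_case_profit([53, 53])    # both outcomes priced at 53
lopsided_market = worst_case_profit([99, 10])  # prices of 99 and 10
```

With prices of 53 and 53 the worst case is a gain of roughly 0.7 cents, so the arbitrage is available and the prices cannot last. With prices of 99 and 10 the worst case is a loss of roughly 0.9 cents, so there is nothing for an algorithm to exploit.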

Given that PredictIt is operating as an experimental research facility with the purpose of generating useful data for academic research, this situation is unfortunate. It would be easy for the exchange to apply fees only to net profits in a given market, after taking account of all losses and gains, as suggested here. This does not require any change in the manner in which margin is calculated at contract purchase, only a refund once the market closes. If this is done, prices should snap into line and begin to represent meaningful probabilities. The decline in revenue would be partially offset by increased participation. And the transition itself would generate interesting data for researchers, consistent with the stated mission of the enterprise.
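To see why charging fees only on net profits would restore the arbitrage, here is a minimal sketch under the same simplifying assumptions as the examples above (contracts pay 100 cents, a "no" contract costs 100 minus the "yes" price, exactly one listed outcome occurs). The function name is hypothetical, and the fee rule is my reading of the proposal rather than anything the exchange has implemented.

```python
def worst_case_profit_net_fee(yes_prices, fee_rate=0.10):
    """Profit (in cents) from betting against every outcome when the fee
    is charged only on the net profit across the whole market.

    Illustrative assumptions: contracts pay 100, a 'no' contract costs
    100 minus the 'yes' price, and exactly one listed outcome occurs.
    """
    n = len(yes_prices)
    total_cost = sum(100 - p for p in yes_prices)
    # Whichever outcome occurs, all but one 'no' contract pays 100,
    # so the pre-fee profit is the same in every state of the world.
    net = 100 * (n - 1) - total_cost
    # A fee is due only when the market-wide net is positive.
    return net - fee_rate * max(net, 0)


net_fee_lopsided = worst_case_profit_net_fee([99, 10])
```

Under this rule the bundle at prices of 99 and 10 yields about 8.1 cents regardless of the outcome, so the arbitrage is open whenever prices sum to more than 100, and trading on it should push the sum back toward 100.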