How I Acted Like A Pundit And Screwed Up On Donald Trump

Since Donald Trump effectively wrapped up the Republican nomination this month, I’ve seen a lot of critical self-assessments from empirically minded journalists — FiveThirtyEight included, twice over — about what they got wrong on Trump. This instinct to be accountable for one’s predictions is good since the conceit of “data journalism,” at least as I see it, is to apply the scientific method to the news. That means observing the world, formulating hypotheses about it, and making those hypotheses falsifiable. (Falsifiability is one of the big reasons we make predictions.) When those hypotheses fail, you should re-evaluate the evidence before moving on to the next subject. The distinguishing feature of the scientific method is not that it always gets the answer right, but that it fails forward by learning from its mistakes.

But with some time to reflect on the problem, I also wonder if there’s been too much #datajournalist self-flagellation. Trump is one of the most astonishing stories in American political history. If you really expected the Republican front-runner to be bragging about the size of his anatomy in a debate, or to be spending his first week as the presumptive nominee feuding with the Republican speaker of the House and embroiled in a controversy over a tweet about a taco salad, then more power to you. Since relatively few people predicted Trump’s rise, however, I want to think through his nomination while trying to avoid the seduction of hindsight bias. What should we have known about Trump and when should we have known it?

It’s tempting to make a defense along the following lines:

Almost nobody expected Trump’s nomination, and there were good reasons to think it was unlikely. Sometimes unlikely events occur, but data journalists shouldn’t be blamed every time an upset happens, particularly if they have a track record of getting most things right and doing a good job of quantifying uncertainty.

We could emphasize that track record; the methods of data journalism have been highly successful at forecasting elections. That includes quite a bit of success this year. The FiveThirtyEight “polls-only” model has correctly predicted the winner in 52 of 57 (91 percent) primaries and caucuses so far in 2016, and our related “polls-plus” model has gone 51-for-57 (89 percent). Furthermore, the forecasts have been well-calibrated, meaning that upsets have occurred about as often as they’re supposed to but not more often.

But I don’t think this defense is complete — at least if we’re talking about FiveThirtyEight’s Trump forecasts. We didn’t just get unlucky: We made a big mistake, along with a couple of marginal ones.

The big mistake is a curious one for a website that focuses on statistics. Unlike virtually every other forecast we publish at FiveThirtyEight — including the primary and caucus projections I just mentioned — our early estimates of Trump’s chances weren’t based on a statistical model. Instead, they were what we referred to as “subjective odds” — which is to say, educated guesses. In other words, we were basically acting like pundits, but attaching numbers to our estimates. And we succumbed to some of the same biases that pundits often suffer, such as not changing our minds quickly enough in the face of new evidence. Without a model as a fortification, we found ourselves rambling around the countryside like all the other pundit-barbarians, randomly setting fire to things.

There’s a lot more to the story, so I’m going to proceed in five sections:

1. Our early forecasts of Trump’s nomination chances weren’t based on a statistical model, which may have been most of the problem.

2. Trump’s nomination is just one event, and that makes it hard to judge the accuracy of a probabilistic forecast.

3. The historical evidence clearly suggested that Trump was an underdog, but the sample size probably wasn’t large enough to assign him quite so low a probability of winning.

4. Trump’s nomination is potentially a point in favor of “polls-only” as opposed to “fundamentals” models.

5. There’s a danger in hindsight bias, and in overcorrecting after an unexpected event such as Trump’s nomination.

 

Our early forecasts of Trump’s nomination chances weren’t based on a statistical model, which may have been most of the problem.

Usually when you see a probability listed at FiveThirtyEight — for example, that Hillary Clinton has a 93 percent chance to win the New Jersey primary — the percentage reflects the output from a statistical model. To be more precise, it’s the output from a computer program that takes inputs (e.g., poll results), runs them through a bunch of computer code, and produces a series of statistics (such as each candidate’s probability of winning and her projected share of the vote), which are then published to our website. The process is, more or less, fully automated: Any time a staffer enters new poll results into our database, the program runs itself and publishes a new set of forecasts. There’s a lot of judgment involved when we build the model, but once the campaign begins, we’re just pressing the “go” button and not making judgment calls or tweaking the numbers in individual states.
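
To make that concrete, here is a minimal sketch of the shape of such a pipeline. The data format, function names and placeholder math are hypothetical, not FiveThirtyEight's actual code; the point is the key property described above: new data goes in, the whole thing re-runs, fresh forecasts come out.

```python
# A minimal sketch of an automated forecast pipeline (illustrative only; the
# data format, function names and math here are placeholders, not
# FiveThirtyEight's actual model).

def run_model(polls):
    """Turn raw poll rows into per-state, per-candidate projections.

    `polls` is a list of dicts like {"state": "NJ", "candidate": "Clinton",
    "pct": 58.0}. A real model would also simulate uncertainty to produce win
    probabilities; here we simply average poll shares as a stand-in.
    """
    totals, counts = {}, {}
    for poll in polls:
        key = (poll["state"], poll["candidate"])
        totals[key] = totals.get(key, 0.0) + poll["pct"]
        counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}

def publish(projections):
    """Stand-in for pushing a new set of forecasts to the website."""
    for (state, candidate), pct in sorted(projections.items()):
        print(f"{state} {candidate}: {pct:.1f}%")

# The key property: any time a new poll is entered into the database, the
# whole pipeline re-runs end to end, with no hand-tweaking of individual states.
polls_db = [
    {"state": "NJ", "candidate": "Clinton", "pct": 58.0},
    {"state": "NJ", "candidate": "Sanders", "pct": 38.0},
]
publish(run_model(polls_db))
```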

Anyway, that’s how things usually work at FiveThirtyEight. But it’s not how it worked for those skeptical forecasts about Trump’s chance of becoming the Republican nominee. Despite the lack of a model, we put his chances in percentage terms on a number of occasions. In order of appearance — I may be missing a couple of instances — we put them at 2 percent (in August), 5 percent (in September), 6 percent (in November), around 7 percent (in early December), and 12 percent to 13 percent (in early January). Then, in mid-January, a couple of things swayed us toward a significantly less skeptical position on Trump.

First, it was becoming clearer that Republican “party elites” either didn’t have a plan to stop Trump or had a stupid plan. Also, that was about when we launched our state-by-state forecast models, which showed Trump competitive with Cruz in Iowa and favored in New Hampshire. From that point onward, we were reasonably in line with the consensus view about Trump, although the consensus view shifted around quite a lot. By mid-February, after his win in New Hampshire, we put Trump’s chances of winning the nomination at 45 percent to 50 percent, about where betting markets had him. By late February, after he’d won South Carolina and Nevada, we said, at about the same time as most others, that Trump would “probably be the GOP nominee.”

But why didn’t we build a model for the nomination process? My thinking was this: Statistical models work well when you have a lot of data, and when the system you’re studying has a relatively low level of structural complexity. The presidential nomination process fails on both counts. On the data side, the current nomination process dates back only to 1972, and the data availability is spotty, especially in the early years. Meanwhile, the nomination process is among the most complex systems that I’ve studied. Nomination races usually have multiple candidates; some simplifying assumptions you can make in head-to-head races don’t work very well in those cases. Also, the primaries are held sequentially, so what happens in one state can affect all the later ones. (Howard Dean didn’t even come close to defeating John Kerry in 2004, for example, finishing with barely more than 100 delegates to Kerry’s roughly 2,700, but if Dean had held on to win Iowa, he might have become the nominee.) To make matters worse, the delegate rules themselves are complicated, especially on the GOP side, and they can change quite a bit from year to year. The primaries may literally be chaotic, in the technical sense of chaos theory. Under these conditions, any model is going to be highly sensitive to its assumptions — both in terms of which variables are chosen and how the model is parameterized.

The thing is, though, that if the nomination is hard to forecast with a model, it’s just as hard to forecast without one. We don’t have enough historical data to know which factors are really predictive over the long run. Small, seemingly random events can potentially set the whole process on a different trajectory. Those are problems in understanding the primaries, period, whether you’re building a model or not.

And there’s one big advantage a model can provide that ad-hoc predictions can’t: a disciplined way for its forecasts to evolve over time. Generally speaking, the complexity of a problem decreases as you get closer to the finish line. The deeper you get into the primaries, for example, the fewer candidates there are, the more reliable the polls become, and the less time there is for random events to intervene, all of which make the process less chaotic. Thus, a well-designed model will generally converge toward the right answer, even if the initial assumptions behind it are questionable.

Suppose, for instance, we’d designed a model that initially applied a fairly low weight to the polls — as compared with other factors like endorsements — but increased the weight on polls as the election drew closer. Based on having spent some time last week playing around with a couple of would-be models, I suspect that at some point — maybe in late November after Trump had gained in polls following the Paris terror attacks — the model would have shown Trump’s chances of winning the nomination growing significantly.
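
As a rough illustration of that idea, here is what a time-varying weighting might look like in code. The functional form and all of the constants below are invented for the example; they are not the weights of any actual FiveThirtyEight model.

```python
import math

# Illustrative only: the functional form and constants are invented, not the
# weights of any real FiveThirtyEight model.

def poll_weight(days_out, floor=0.2, decay=60.0):
    """Weight placed on polls, rising from `floor` toward 1.0 as voting nears."""
    return floor + (1.0 - floor) * math.exp(-days_out / decay)

def blended_score(poll_score, endorsement_score, days_out):
    """Blend a poll-based score with a 'fundamentals' input such as endorsements."""
    w = poll_weight(days_out)
    return w * poll_score + (1.0 - w) * endorsement_score

# Far from the first contest, polls count for relatively little; by the eve of
# voting they dominate the blend.
for days_out in (180, 120, 60, 0):
    print(days_out, round(poll_weight(days_out), 2))
```

With a schedule like that, a candidate who led the national polls for months, as Trump did, would gradually pull the blended forecast toward him even if the "fundamentals" inputs never budged.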

A model might also have helped to keep our expectations in check for some of the other candidates. A simple, two-variable model that looked at national polls and endorsements would have noticed that Marco Rubio wasn’t doing especially well on either front, for instance, and by the time he was beginning to make up ground in both departments, it was getting late in the game.
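
A toy version of that two-variable check might look like the snippet below. The candidate names, numbers and thresholds are all made up for illustration.

```python
# Toy two-variable screen: national polling average and endorsement "points."
# The candidates, numbers and thresholds below are all invented for illustration.

candidates = {
    # name: (national polling average %, endorsement points)
    "Candidate A": (30.0, 5),
    "Candidate B": (12.0, 40),
    "Candidate C": (10.0, 15),
}

def weak_on_both(cands, poll_threshold=20.0, endorsement_threshold=25):
    """Flag candidates who clear neither the polling bar nor the endorsement bar."""
    return [name for name, (polls, endorsements) in cands.items()
            if polls < poll_threshold and endorsements < endorsement_threshold]

print(weak_on_both(candidates))  # ['Candidate C']
```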

Without having a model, I found, I was subject to a lot of the same biases as the pundits I usually criticize. In particular, I got anchored on my initial forecast and was slow to update my priors in the face of new data. And I found myself selectively interpreting the evidence and engaging in some lazy reasoning.

Another way to put it is that a model gives you discipline, and discipline is a valuable resource when everyone is losing their mind in the midst of a campaign. Was an article like this one — the headline was “Dear Media, Stop Freaking Out About Donald Trump’s Polls” — intended as a critique of Trump’s media coverage or as a skeptical analysis of his chances of winning the nomination? Both, but it’s all sort of a muddle.

 

Trump’s nomination is just one event, and that makes it hard to judge the accuracy of a probabilistic forecast.

The campaign has seemed to last forever, but from the standpoint of scoring a forecast, the Republican nomination is just one event. Sometimes, low-probability events come through. Earlier this month, Leicester City won the English Premier League despite having been a 5,000-to-1 underdog at the start of the season, according to U.K. bookmakers. By contrast, our 5 percent chance estimate for Trump in September 2015 gave him odds of “only” about 20-to-1 against.
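
For reference, the conversion between probabilities and bookmaker-style odds is simple arithmetic, as the sketch below shows.

```python
# Converting between a probability and "N-to-1 against" odds: a 5 percent
# chance is 0.95 / 0.05 = 19-to-1 against, or roughly 20-to-1, while
# 5,000-to-1 against implies a probability of 1 / 5,001, about 0.02 percent.

def odds_against(p):
    return (1 - p) / p

def prob_from_odds_against(n):
    return 1 / (n + 1)

print(round(odds_against(0.05)))     # 19, i.e., about 20-to-1
print(prob_from_odds_against(5000))  # ~0.0002
```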

What should you make of an argument along the lines of “sorry, but the 20-to-1 underdog just so happened to come through this time”? It seems hard to disprove, but it also seems to shirk responsibility. How, exactly, do you evaluate a probabilistic forecast?

The right way is with something called calibration. Calibration works like this: Out of all the events that you forecast to have (for example) a 10 percent chance of occurring, around 10 percent should actually happen — not much more often but also not much less often. Calibration works well when you have large sample sizes. For example, we’ve forecast every NBA regular season and playoff game this year. The biggest upset came on April 5, when the Minnesota Timberwolves beat the Golden State Warriors despite having only a 4 percent chance of winning, according to our model. A colossal failure of prediction? Not according to calibration. Out of all games this year where we’ve had one team as at least a 90 percent favorite, they’ve won 99 out of 108 times, or around 92 percent of the time, almost exactly as often as they’re supposed to win.
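
Here is a minimal sketch of how a calibration check like the one in the table further below can be computed, given a list of (forecast probability, outcome) pairs. The sample forecasts are made up; only the bucketing logic matters.

```python
# Minimal calibration check: bucket forecasts by their stated probability, then
# compare the expected number of winners in each bucket (the sum of the
# probabilities) with the actual number. The sample data below is made up.

BUCKETS = [
    ("95-100%", 0.95, 1.001),
    ("75-94%", 0.75, 0.95),
    ("50-74%", 0.50, 0.75),
    ("25-49%", 0.25, 0.50),
    ("5-24%", 0.05, 0.25),
    ("0-4%", 0.00, 0.05),
]

def calibration_table(forecasts):
    """`forecasts` is a list of (probability, won) pairs, with `won` in {0, 1}."""
    rows = []
    for label, lo, hi in BUCKETS:
        in_bucket = [(p, won) for p, won in forecasts if lo <= p < hi]
        expected = sum(p for p, _ in in_bucket)    # expected number of winners
        actual = sum(won for _, won in in_bucket)  # actual number of winners
        rows.append((label, len(in_bucket), round(expected, 1), actual))
    return rows

sample = [(0.97, 1), (0.90, 1), (0.85, 1), (0.60, 0), (0.30, 0), (0.04, 0)]
for row in calibration_table(sample):
    print(row)
```

In a well-calibrated set of forecasts, the expected and actual columns track each other closely once the sample gets large.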

Another, more pertinent example of a well-calibrated model is our state-by-state forecasts thus far throughout the primaries. Earlier this month, Bernie Sanders won in Indiana when our “polls-only” forecast gave him just a 15 percent chance and our “polls-plus” forecast gave him only a 10 percent chance. More impressively, he won in Michigan, where both models gave him under a 1 percent chance. But there have been dozens of primaries and only a few upsets, and the favorites are winning about as often as they’re supposed to. In the 31 cases where our “polls-only” model gave a candidate at least a 95 percent chance of winning a state, he or she won 30 times, with Clinton in Michigan being the only loss. Conversely, of the 93 times when we gave a candidate less than a 5 percent chance of winning, Sanders in Michigan was the only winner.

Calibration for FiveThirtyEight “polls-only” forecast

WIN PROBABILITY RANGE | NO. FORECASTS | EXPECTED NO. WINNERS | ACTUAL NO. WINNERS
95-100%               | 31            | 30.5                 | 30
75-94%                | 15            | 12.5                 | 13
50-74%                | 11            | 6.9                  | 9
25-49%                | 12            | 4.0                  | 2
5-24%                 | 22            | 2.4                  | 1
0-4%                  | 93            | 0.9                  | 1

Based on election day forecasts in 2016 primaries and caucuses.
