Donald Trump, Nate Silver, and the Value of Data Journalism

Those who underestimated Trump fell victim to a flawed supposition, widely shared, that a major party would never select such an extreme candidate. PHOTOGRAPH BY JOEY FOLEY / WIREIMAGE / GETTY

On Thursday, the Times’s media columnist, Jim Rutenberg, took journalists to task for underestimating Donald Trump’s prospects of winning the Republican nomination. “Wrong, wrong, wrong—to the very end, we got it wrong,” Rutenberg wrote. He singled out data journalists, particularly Nate Silver, of FiveThirtyEight, who for a long time was down on Trump’s prospects. Admonishing the profession to return to J-school basics in the months ahead, Rutenberg concluded that “a good place to start would be to get a good night’s sleep, and then talk to some voters.”

It is certainly true that many commentators were too quick to dismiss Trump’s chances. (Last summer, I was one of them.) And because, in the past, Silver has been scathing about the value of traditional newspaper and television commentary, it wouldn’t be surprising if, now that he has come a cropper, some old-school journalists were taking pleasure in his misfortune. But let’s be clear about where he and others erred. To portray this as a case of data journalists getting it wrong and shoe-leather journalists getting it right would be misleading. The truth is more complicated.

Because Silver has his own Web site and an enviable record of forecasting Presidential elections, I’ll concentrate on him. From what I’ve read of his coverage of the G.O.P. primary, which is quite a bit, he didn’t rely too heavily on polling figures or fall victim to misleading data. To the contrary, his basic error was to downplay the incoming evidence from the pollsters, whose business is based on talking to voters, and to assume that Trump’s steady lead wouldn’t result in his winning the nomination. Not doing traditional reporting wasn’t the issue: the slipup was analytical. Ultimately, Silver fell victim to a flawed supposition, widely shared, that a major political party would never select a candidate as extreme as Trump.

To his credit, Silver has conceded that he went astray, and has sought to explain why. In an article posted on Wednesday, he wrote, “Other than being early skeptics of Jeb Bush, we basically got the Republican race wrong.” He also linked to some of his earlier posts, including one, from August, entitled “Donald Trump’s Six Stages of Doom,” and another, from November, headlined, “Dear Media, Stop Freaking Out About Donald Trump’s Polls.” In that last article, Silver argued that the surveys showing Trump well ahead of his rivals didn’t have much predictive value because most G.O.P. primary voters hadn’t made up their minds. He also said that “nobody remotely like Trump has won a major-party nomination in the modern era.”

Coming from a very able statistician, this was a somewhat surprising fact to emphasize. For one thing, the sample size is small. If we take “the modern era” to mean postwar elections, and we include both parties, there are only thirty-four data points to work with. And, in two of the thirty-four instances, parties did select candidates who were outside the mainstream: Barry Goldwater, the G.O.P. candidate in 1964, and George McGovern, the Democratic candidate in 1972. (Although Goldwater and McGovern both had a lot more political experience than Trump has.) Silver was well aware of the sample-size problem: he referred to it in his November post, and in others. But he maintained his skeptical attitude toward Trump’s candidacy even as evidence accumulated that Trump was extending his support well beyond the stereotypical angry blue-collar guys who ride Harley-Davidsons.

In December, for example, a CNN/WMUR poll of likely voters in the New Hampshire Republican primary indicated that forty-seven per cent of self-identified moderates had a favorable opinion of Trump, as did forty-nine per cent of women and fifty-eight per cent of college graduates. It was numbers like these that persuaded some analysts, myself included, to rethink Trump’s candidacy and take it more seriously. In early January, Sam Wang, of the Princeton Election Consortium, who also started out as a Trump skeptic, wrote a post querying the argument, which the Times’ Nate Cohn had also made, that the primary polls didn’t have much predictive value. After looking at past data, Wang wrote, “Donald Trump is in as strong a position to get his party’s nomination as Hillary Clinton in 2016, George W. Bush in 2000, or Al Gore in 2000.”

Silver did say, in January, that he had become less dubious about Trump’s prospects because Republican élites weren’t doing much to combat his candidacy. But he continued to take a different line. Even after Trump easily won two out of the first three voting contests, in New Hampshire and South Carolina, Silver referred to himself as a Trump skeptic, suggesting that the front-runner’s ceiling of support was low. As recently as early April, after Ted Cruz won the Wisconsin primary, Silver commented, “The threshold Trump needs to win states is increasing considerably faster than the share of the vote he’s getting, which isn’t increasing much at all.”

To be sure, Silver wasn’t the only data journalist to express doubts about Trump’s candidacy late in the game. In another mea culpa posted on Wednesday, Cohn wrote, “I do think we—and specifically, I—underestimated Mr. Trump. There were bad assumptions, misinterpretations of the data, and missed connections all along the way.” But Cohn also argued, rightly, in my view, that Trump’s victory wasn’t inevitable, and that many unusual factors helped bring it about, including the huge amount of media attention Trump received, as well as the large number of candidates in the race, which prevented the Republican Party from uniting against him until it was too late. At the end of his piece, though, Cohn concluded, “We were just overconfident. There haven’t been very many presidential elections in the modern era of primaries. There certainly haven’t been enough to rule out the possibility that a true outsider could win the nomination.”

For journalists of all types, the warning about being too confident of your own analysis should be one of the takeaways from Trump’s victory. Many social phenomena, elections included, have an irreducible element of uncertainty attached to them, which makes predicting them a fraught enterprise. The fact that, in recent Presidential elections, polling data from the last few months of campaigning has proved a fairly reliable indicator of the outcome shouldn’t be allowed to disguise this fact.

A second takeaway, related to the first, is that, when you are writing about politics, taking account of history, particularly recent history, is crucial. Today’s G.O.P. isn’t the G.O.P. of the nineteen-sixties, seventies, or eighties. As Rutenberg noted in his Times article, the Party has shifted sharply to the right and experienced a series of internal revolts that undermined the power of the “Republican establishment,” thereby creating the conditions in which Trump could prosper. When things are changing, it can be dangerous to make predictions on the basis of past precedent, which is what statistical voting models do—as do theories like the one that says a candidate can’t be nominated without the support of party élites.

Data journalism has its limits, then. Paradoxically, though, the third conclusion to be drawn from this episode is that data, and particularly polling data, can be invaluable. If Silver had paid more heed to Trump’s early polling numbers and less attention to his prior beliefs about how the Republican Party selects candidates, he would have fared better. As he noted on Twitter on Thursday, “Roughly speaking (I’m generalizing a LOT) polls got Trump *right* while other types of empirical evidence (e.g. endorsements) got him wrong.” Silver continued in another tweet, writing, “There are lessons from that, but they’re almost totally orthogonal to the ‘data journalism’ vs. ‘traditional journalism’ beef.”

That sounds about right. I would only add that political data journalism isn’t confined to trying to divine the future, and it’s often most useful when it avoids making predictions, particularly ones that come with a level of potentially spurious precision attached to them. (“So-and-so has an eighty-three per cent chance of winning the election.”) There is now a lot of good data-driven reporting, and the outfits that specialize in it—like FiveThirtyEight, the Upshot, Vox, and Wonkblog—have expanded and enriched U.S. journalism. If you want to find out what types of people voted for Trump in the G.O.P. primaries, I can recommend an informative post based on an analysis of exit polling. And, if you’d like to learn more about the scale of the task facing Trump in November, I can point you to another recent article that is crammed with relevant facts and figures.

The authors of these two pieces are Silver and Cohn.