Forget Nate Silver. There’s a new king of the presidential election data mountain. His name is Sam Wang, Ph.D.
Haven’t heard of him just yet? Don’t worry. You will. Because Wang has sailed True North all along, while Silver has been cautiously trying to tack his FiveThirtyEight data sailboat (weighed down with ESPN gold bars) through treacherous, Category-Five-level-hurricane headwinds in what has easily been the craziest presidential campaign in the modern political era.
When the smoke clears on Tuesday—and it will clear—what will emerge is Wang and his Princeton Election Consortium website and calculations (which have been used, in part, to drive some of the election poll conclusions at The New York Times’ Upshot blog and The Huffington Post’s election site). What will be vindicated is precisely the sort of math approach that Silver once rode to fame and fortune.
Wang says his method differs from Silver’s in its approach to uncertainty. “They score individual pollsters, and they want to predict things like individual-state vote shares,” he wrote in his blog on Sunday. “Achieving these goals requires building a model with lots of parameters, and running regressions and other statistical procedures to estimate those parameters. However, every parameter has an uncertainty attached to it. When all those parameters get put together to estimate the overall outcome, the resulting total is highly uncertain.” By contrast, he says, PEC’s model relies on a snapshot of all state polls every day, and then makes sure unrelated fluctuations are averaged out.
Math Is What Matters
Make no mistake about it. Wang’s website is godawful ugly. He could use some serious website design help. You have to be an intense political geek to wade through it and get to the good stuff.
Most likely, it’s because presidential forecasting isn’t Wang’s real job. He’s a professor of neuroscience at Princeton. His undergraduate degree from Caltech was in physics. His Ph.D. from Stanford was in neuroscience. Wang wandered into the election forecasting world, which he’s been at with PEC since 2004, through passion and arguments with a physics colleague.
“I first got interested in poll aggregation in October 2000, when I went around telling people that the Bush-Gore race would come down to Florida,” he told me. “Then in summer 2004, I got into a lively discussion with a colleague who is a string theorist. I argued that John Kerry had a good shot at winning because he needed two out of three states: Pennsylvania, Ohio, and Florida. Kerry was ahead in Pennsylvania and near-tied in the other two states, so the odds were about 3-to-1 in his favor. The string theorist agreed! (So) I wrote a MATLAB script and posted the results as a primitive HTML page—which promptly went viral.”
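The arithmetic behind that 3-to-1 figure can be checked in a few lines. This is only one reading of the quote, not Wang's MATLAB script: it assumes Pennsylvania is treated as a sure Kerry win and that Ohio and Florida are independent coin flips. The function name and its parameters are made up for illustration.

```python
from itertools import product

def kerry_win_probability(p_ohio=0.5, p_florida=0.5):
    """Chance of carrying at least two of PA, OH, FL.

    Assumption (not from the article): Pennsylvania is a certain win,
    and Ohio and Florida are independent of each other.
    """
    total = 0.0
    for ohio, florida in product([True, False], repeat=2):
        weight = (p_ohio if ohio else 1 - p_ohio) * (
            p_florida if florida else 1 - p_florida
        )
        states_won = 1 + ohio + florida  # the 1 is Pennsylvania
        if states_won >= 2:
            total += weight
    return total

p = kerry_win_probability()
odds = p / (1 - p)
print(f"probability = {p:.2f}, odds = {odds:.0f}-to-1")
```

Under those assumptions Kerry only loses if he drops both toss-up states, which happens one time in four, so the odds in his favor come out to 3-to-1.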
This year, Wang called the election at 8:55 PM on October 18. He promised to eat more than just his hat if Clinton loses: “It is totally over. If Trump wins more than 240 electoral votes, I will eat a bug,” Wang tweeted to his 23,000 followers. He expects Clinton to receive at least 298 electoral votes.
Wang has been the intrepid election data explorer furthest out this election cycle, never once wavering from his certainty of a Clinton win. The only real uncertainty left on Tuesday, he said, is how many people show up to vote. But even that doesn’t change the presidential election outcome.
“Pollsters have pretty good judgment, but their average estimate of who will vote may be off,” Wang told me. “To account for this, the snapshot gets converted to a Meta-Margin, which is defined as how far all polls would have to move, in the same direction, to create a perfect toss-up. To turn the Meta-Margin into a win probability, the final step is to estimate how likely it is that the Meta-Margin is so far off that the other candidate is favored.”
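That last step can be sketched in a few lines. What follows is a rough illustration, not PEC's code: it assumes the error on the Meta-Margin follows a normal distribution with a standard deviation `sigma` in percentage points. Both the choice of distribution and the numbers in the example are assumptions made here.

```python
import math

def win_probability(meta_margin, sigma):
    """Chance the leader is still ahead once polling error is allowed for.

    meta_margin: leader's Meta-Margin in percentage points.
    sigma: assumed standard deviation of the error on that margin.
    The normal error model is an assumption of this sketch.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = meta_margin / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Illustrative numbers only, not PEC's published values.
for sigma in (1.0, 3.0, 5.0):
    print(sigma, round(win_probability(2.5, sigma), 3))
```

The wider the assumed error, the closer the probability drifts toward a coin flip, but a lead stays above 50 percent for any finite `sigma`.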
Even if you factor this voting uncertainty into his election model by 5 percent—which is an unprecedented level historically, Wang says—Clinton still wins. It is precisely this sort of deep analysis that has endeared Wang to both financial analysts who make a living with math-based market predictions and to political journalism analysts who handicap elections.
So what is “median-based probability estimation” exactly, and why has it given Wang and his drama-engorged political junkies such confidence during a presidential campaign buffeted by gale force winds?
Finding Consistency in the Chaos
It’s like this: When you go to the emergency room at a hospital, everything around you is chaos. People are emotional and scared. But nurses and technicians calmly test the only data that matters (your vital signs, like blood pressure and body temperature) nearly every hour. They make sure the math and data tell a consistent story through the chaos, so that physicians have a baseline when they’re asked to make a diagnosis. That’s what Wang’s system does. It uses meaningful data and math vital signs, and tests them over and over even when everything around you is utter chaos.
“The calculation is built upon state polls, which are more accurate than national polls,” Wang told me. “At this point any other source of data adds unnecessary uncertainty. For each state, the code calculates the median and its standard error, which tells us how far off the median may be. That gets turned into a probability for each of 56 contests: for the 50 states, the District of Columbia, and five Congressional districts that have a special rule.”
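For a single state, the step Wang describes might look something like the sketch below. The details are assumptions, not taken from PEC's code: the standard error is estimated from the median absolute deviation (scaled by 1.4826, the usual factor for normal data), a hypothetical `sem_floor` keeps it from collapsing to zero when polls agree exactly, and a normal curve converts the result into a probability. The poll margins are invented.

```python
import math
from statistics import median

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def state_win_probability(margins, sem_floor=0.5):
    """Turn a state's recent poll margins into a win probability.

    margins: leader-minus-trailer margins, in percentage points.
    sem_floor: hypothetical lower bound on the standard error.
    """
    center = median(margins)
    mad = median([abs(m - center) for m in margins])
    # Robust estimate of the standard error of the median (assumed form).
    sem = max(1.4826 * mad / math.sqrt(len(margins)), sem_floor)
    return normal_cdf(center / sem)

# Invented poll margins for two imaginary states.
print(state_win_probability([4.0, 2.0, 6.0, 3.0, 5.0]))   # comfortable lead
print(state_win_probability([1.0, -1.0, 2.0, 0.0, 3.0]))  # close race
```

Using the median rather than the mean means a single outlier poll barely moves the estimate, which fits the "calm, steady" behavior described here.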
Wang then uses equations to plot the possibilities. “A compounding procedure is used to calculate the exact distribution of all possibilities, from 0 to 538 electoral votes,” he said. “The median of that is the snapshot of where conditions appear to be today. As this snapshot changes from day to day, this procedure automatically gets rid of unrelated fluctuations between states. An example of that is random sampling error. However, it keeps the changes that are correlated among states. If many states move together, the snapshot moves too.”
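The compounding Wang mentions can be illustrated by folding contests in one at a time, so that every possible electoral-vote total ends up with an exact probability. This is a minimal sketch, not PEC's code. It assumes each contest is independent once its win probability is fixed, and the four contests in the example are invented rather than real states.

```python
def electoral_vote_distribution(contests):
    """Exact distribution of electoral votes won.

    contests: list of (electoral_votes, win_probability) pairs.
    Returns a list where entry k is the probability of winning exactly k votes.
    """
    total = sum(ev for ev, _ in contests)
    dist = [1.0] + [0.0] * total  # before any contest: 0 votes for certain
    for ev, p in contests:
        updated = [0.0] * (total + 1)
        for votes, mass in enumerate(dist):
            if mass == 0.0:
                continue
            updated[votes] += mass * (1.0 - p)  # contest lost
            updated[votes + ev] += mass * p     # contest won
        dist = updated
    return dist

def median_electoral_votes(dist):
    """Smallest vote total whose cumulative probability reaches one half."""
    cumulative = 0.0
    for votes, mass in enumerate(dist):
        cumulative += mass
        if cumulative >= 0.5:
            return votes
    return len(dist) - 1

# Invented contests: (electoral votes, win probability).
example = [(29, 0.6), (20, 0.8), (18, 0.5), (3, 0.95)]
dist = electoral_vote_distribution(example)
print(median_electoral_votes(dist))
```

Run over all 56 contests, the same loop would yield the full range from 0 to 538 without any random simulation, because every combination of wins and losses is accounted for exactly.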
Wang has said for months that it was a five-point race; that there haven’t been dramatic swings in polling, only non-responses from depressed voters in the middle of news cycle swings; and that this has actually been the most stable election in a long time. What’s different, Wang has said to those willing to listen, is the media coverage of the “full meltdown” of emotion as Trump has seized control of the GOP.
The Race to Call the Race
Natalie Jackson, the senior polling editor at The Huffington Post, told me that Wang uses the HuffPost Pollster data feed and that they both use the same polls, which explains the similarity.[1] “Our forecast has been in line with Sam’s for most of the time it’s been up (we posted Oct. 3), and our probability of Clinton winning never dipped below 84 percent,” she said. “The polling data has never consistently shown anything but a Clinton win.”
Jackson, who coordinates the site’s Pollster section, said that the data is truly what matters, and it’s been consistent at the presidential level. “With everything we know about polling in general elections (that opinions are fairly stable, and fluctuations in national polls aren’t necessarily reflecting people changing their voting decisions), it makes sense to keep a calm, steady approach to aggregating and forecasting,” she said. “It might not be the best way to generate news, but it’s a very good way to model noisy data.”
The Huffington Post Washington bureau chief Ryan Grim has been in a very public feud with Silver in recent days over precisely this question of “noisy data.” Grim accused Silver of deliberately skewing his own data at FiveThirtyEight with what amounts to political punditry. Silver fired back on social media with some ugly language. Grim stands by the HuffPost election model. “There’s room to debate where and whether forecasting belongs in our politics and our campaigns,” Grim told me. “But if you’re gonna do it, then you shouldn’t shrink from what the numbers tell you. I’m glad that our team didn’t, even if it scared me along the way.”
So when the smoke clears on Tuesday; when enough non-white and female voters haven’t been harassed or intimidated enough to stay home; when Clinton crosses the finish line with something close to 300 Electoral College votes and a popular vote victory somewhere between two and five percentage points; and Nate Silver is telling his 1.7 million Twitter followers that he’d been right all along this election, Sam Wang will be standing tall above the fray, draped in his “median-based probability estimation” cloak.
Long live the new election data king.
1 Updated 4:45 pm PST 11-7-16 to clarify the relationship between the Huffington Post and Wang’s site.