Polls, Forecasts, and Fake News

John Carney

Data-Informed Voter Series

We live in an age of near-constant polling, and during presidential election years the volume can feel overwhelming. After the 2016 election, Donald Trump’s unexpected victory shook the general public’s faith in polling, since most prognostications indicated a Clinton victory. While issues certainly existed with polling in 2016, my goal is to convince you, the reader, that the polls were not nearly as “wrong” as you may have been led to believe. To start, let’s get something out of the way right now: forecasts and polls are not the same thing.

Forecasts vs. Polling

In today’s media environment, it’s tempting to see a candidate leading a national poll and assume that person will win the election. This is a bad idea. National polls, even those that apply weighting techniques for population and demographics, only show you, in aggregate, how people would most likely vote if the election were held on the day they spoke to pollsters.

But elections don’t actually work this way. For one thing, the election isn’t held that day (except in the case of exit polls), and all manner of things can change in the time between the poll and the election. In 2016, for example, we saw a letter to Congress from the FBI director about a major party candidate. Politics aside, this was an unusual historical event that no one could have predicted, and there was no data that would have let anyone accurately forecast its effect.

Also, US voters don’t elect the President directly. Whether you live in a state or in the District of Columbia, your vote counts toward that jurisdiction’s electoral votes rather than toward a national total. In every state except Nebraska and Maine, whoever wins the most votes, even if by a single vote, gets all of that state’s electoral votes. This system is referred to as “winner-take-all.”

Since political views aren’t distributed evenly among the states, a national polling average is, at best, a weak signal for who will most likely win the Presidency. This is why more advanced forecasts, like Nate Silver’s FiveThirtyEight, first predict who will win each individual state and then compute the expected winner by electoral votes per state, rather than simply looking at national polling averages.

This matters more than you might think, since Democratic-leaning individuals tend to cluster in cities, often highly concentrated in coastal states such as California and New York. For instance, Missouri has about 6.1 million people in the whole state, roughly a million fewer than the 7.1 million people living in the nine counties of the San Francisco Bay Area.
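To make the winner-take-all arithmetic concrete, here is a minimal Python sketch. The three states, their vote counts, and their electoral votes are invented purely for illustration and are not real results. It shows how a candidate who runs up a large margin in one populous state could win the national popular vote and still lose the electoral count.

```python
from collections import Counter

# Hypothetical states: (electoral votes, votes for A, votes for B).
# Every figure here is invented for illustration only.
STATES = {
    "Big Coastal State": (10, 4_000_000, 2_000_000),
    "Midsize State 1": (6, 1_400_000, 1_500_000),
    "Midsize State 2": (6, 1_450_000, 1_500_000),
}


def popular_vote(states):
    """Sum each candidate's raw votes across all states."""
    totals = Counter()
    for _, votes_a, votes_b in states.values():
        totals["A"] += votes_a
        totals["B"] += votes_b
    return totals


def electoral_vote(states):
    """Award all of a state's electoral votes to whoever got more votes.

    For simplicity, an exact tie goes to B in this sketch.
    """
    totals = Counter()
    for electors, votes_a, votes_b in states.values():
        winner = "A" if votes_a > votes_b else "B"
        totals[winner] += electors
    return totals


if __name__ == "__main__":
    print("Popular vote:", dict(popular_vote(STATES)))
    print("Electoral vote:", dict(electoral_vote(STATES)))
```

In this made-up example, candidate A leads the combined vote comfortably, yet B carries two of the three states and collects more electoral votes.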

The total number of electoral votes is fixed at 538, and those votes are redistributed among the states every ten years, after each census, according to population. Also, every state is guaranteed at least three electoral votes by the Constitution. This essentially limits the amount of influence more populous states have on the Presidential election.
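The 538 total and the three-vote floor both follow from simple arithmetic: each state gets one electoral vote per House seat plus two for its Senators, and the District of Columbia gets three. The short Python sketch below just spells that out; the function name is my own, chosen for illustration.

```python
HOUSE_SEATS = 435   # voting members of the House of Representatives
SENATE_SEATS = 100  # two Senators for each of the 50 states
DC_ELECTORS = 3     # granted to the District of Columbia by the 23rd Amendment

TOTAL_ELECTORS = HOUSE_SEATS + SENATE_SEATS + DC_ELECTORS


def state_electors(house_seats):
    """Electoral votes for a state: its House seats plus two Senators."""
    if house_seats < 1:
        raise ValueError("every state has at least one House seat")
    return house_seats + 2


if __name__ == "__main__":
    print("Total electoral votes:", TOTAL_ELECTORS)
    print("Smallest possible state allocation:", state_electors(1))
```

Because the two Senate-based votes are the same for every state regardless of size, the least populous states carry more electoral weight per resident than the most populous ones.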

Considering these factors, there is no easy or reliable way to determine who will win the Presidential election from national polls alone. If you want a clearer window into who will likely win, focus on tools and forecasts that predict state outcomes first and the national outcome second.

Polling Issues in the 2016 Presidential Election

A number of polling issues caused 2016 polls to underestimate Donald Trump’s level of support. Many hypotheses exist, but a few of the most prominent include:

Failure to weight by education: under-represents less educated voters in the results
The “Shy Trump Voter” effect: some Trump supporters may not have admitted their preference to pollsters, though this is considered less likely to have influenced polling errors
Nonresponse bias: certain groups of people systematically didn’t respond to polls, skewing the results

Any of these criticisms, to the extent they are testable, could be the ultimate culprit. But most importantly, despite these issues, the polls were generally right about who would win the popular vote: Hillary Clinton.

Why “You Can’t Trust Polling!” is Fake News

On the 2016 popular vote, nearly half the polls that RealClearPolitics reported at or near election day showed a lead within the margin of error. That is, the margin of error was greater than the lead of the winning candidate. In statistical terms, that means those polls show a dead heat. The overall picture of Clinton leading the popular vote by around three points is close to what happened: she won it by about two points. The polling, despite every issue outlined above, got it nearly right. Nearly right isn’t so terrible when you remember pollsters try to predict the future. Outlier polls happen, but the overall polling was close to the mark.
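For readers who want to see the “within the margin of error” check in action, here is a rough Python sketch. It assumes a simple random sample and a 95% confidence level, which is a simplification: real pollsters apply weighting, and that usually widens the margin. The poll numbers in the example are hypothetical, and the function names are my own.

```python
import math


def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% margin of error for one candidate's share.

    Assumes a simple random sample; weighted polls usually have
    somewhat larger margins than this formula suggests.
    """
    return z * math.sqrt(share * (1 - share) / sample_size)


def is_dead_heat(share_a, share_b, sample_size):
    """Rule of thumb from the text: the lead is smaller than the margin of error."""
    lead = abs(share_a - share_b)
    moe = margin_of_error(max(share_a, share_b), sample_size)
    return lead < moe


if __name__ == "__main__":
    # Hypothetical poll: 46% vs. 44% among 1,000 respondents.
    print("Margin of error:", round(margin_of_error(0.46, 1000), 3))
    print("Dead heat?", is_dead_heat(0.46, 0.44, 1000))
```

A stricter test would compare the lead against the margin of error of the difference between the two shares, which is larger still, so this rule of thumb if anything understates how many polls were effectively ties.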

Let’s also take a look at FiveThirtyEight’s Election Forecast on the day of the election. They gave Trump a 28.2% chance of winning the election. That’s not “no chance.” That’s not even a particularly low chance! That’s just shy of three out of ten. In other words, in the site’s simulation-based model, Trump won the election about 28 times out of 100. While certainly not a “no path to victory” situation, most people would rather be on the larger side of that split.
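A simulation-based forecast works roughly like this: replay the election many times with random polling errors, then report the share of replays each candidate wins. The Python sketch below is a toy version of that idea and is not FiveThirtyEight’s actual model. The battleground states, polling leads, and error sizes are all hypothetical values I picked for illustration.

```python
import random

# Hypothetical battlegrounds: (electoral votes, favorite's polling lead in points).
STATES = [(20, 3.0), (16, 4.0), (10, 5.0), (29, 0.5), (18, -1.0)]


def underdog_win_probability(states, runs=10_000, national_sd=3.0,
                             state_sd=3.0, seed=0):
    """Estimate how often the underdog wins a majority of electoral votes.

    Each run draws one national polling error shared by every state, plus
    an independent error per state. Both error sizes are assumptions.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    total = sum(electors for electors, _ in states)
    wins = 0
    for _ in range(runs):
        national_error = rng.gauss(0, national_sd)
        underdog_electors = 0
        for electors, lead in states:
            simulated_lead = lead + national_error + rng.gauss(0, state_sd)
            if simulated_lead < 0:  # the favorite's lead vanished in this run
                underdog_electors += electors
        if underdog_electors > total / 2:
            wins += 1
    return wins / runs


if __name__ == "__main__":
    chance = underdog_win_probability(STATES)
    print(f"Underdog wins in about {chance:.0%} of simulated elections")
```

The shared national error is the important design choice here: if polls miss in one state, they tend to miss in the same direction elsewhere, which is why an underdog trailing in several states can still win a meaningful share of simulations.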

You might be thinking, “Okay, now you’re just splitting hairs.” I’m not. Low-probability events happen all the time. For example, getting a flat tire isn’t supremely likely on any given day, but I would bet all of the money in my pockets that the majority of people reading this have experienced one. Think of it another way: you don’t go to a car insurance company and buy insurance for the vast majority of days you drive perfectly. You buy it for the one day you make a mistake.
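The flat-tire intuition can be put into numbers with one line of arithmetic. The sketch below assumes each day is independent and uses a made-up daily chance of 1 in 1,000, so treat the figure as an illustration rather than a real statistic about tires.

```python
def chance_of_at_least_one(daily_probability, days):
    """Probability an event happens at least once over `days` independent days."""
    return 1 - (1 - daily_probability) ** days


if __name__ == "__main__":
    # Hypothetical: a 0.1% chance of a flat tire on any given day, over ten years.
    ten_years = 3650
    chance = chance_of_at_least_one(0.001, ten_years)
    print(f"Chance of at least one flat tire: {chance:.1%}")
```

Under those assumptions, an event that is very unlikely on any single day becomes close to a sure thing over a decade.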

Finally, let’s address the elephant in the room: Michigan, Pennsylvania, and Wisconsin each went counter to their forecasts. The margin in each state was razor-thin. Just under 80,000 people total tipped those three elections, and thus the whole contest, in Trump’s favor. It’s worth noting this swing was considered possible in the 80% bands provided by the FiveThirtyEight model, meaning it was absolutely within the realm of possibility. Indeed, while most of the site’s simulations showed Clinton winning those states, the model showed Trump winning enough times to justify those overlapping ranges, which means the race was closer than the headline numbers suggested.

If you hear there’s a 30% chance of rain, you may pack an umbrella or reconsider a trip to the beach. So it was with the 2016 election: the chance of a Trump victory was not sky-high, but it was certainly possible.

So What Can We Learn?

Polls, forecasts, and even pundit prognostications are powerful tools, because they provide windows into complex electoral processes. Be careful though, because those windows aren’t crystal clear. Over-reliance on these tools can cause over-confidence or play into your biases. As one political party learned the hard way, favorable polls and forecasts are no guarantee. Such tools are imperfect at best, as clearly evidenced by the polling misfires of 2016. Despite this, if most polls agree on an outcome, it’s generally pretty safe to think that picture reflects reality. Just don’t expect it to be a done deal. There are no guarantees when you’re dealing with probabilities.

Data-Informed Voter Series

The Data-Informed Voter Series is a 2020 passion project for a team of Evolytics analysts. We aim to be as politically neutral as possible while discussing the data, implications, and interpretations we see in the news. We discuss topics as a pseudo-editorial board with the aim of informing voters on how a professional analyst would interpret data during an election cycle. This project team consists of John Carney, Jay Farias, Liam Huffman, Brian Johnson, Anoush Kabalyan, Laura Sutter, and Krissy Tripp.

Written By

John Carney

John Carney, Senior Analyst, Data Operations guides clients through the process of turning data access into data-driven insight. His primary specialty involves blending skills in data science, data engineering, and data visualization to generate reliable and action-focused solutions. With this well-rounded set of skills, John often acts as a bridge between the Data Operations and Decision Science teams at Evolytics. A believer in the power of data to empower all, John occasionally speaks at conferences on topics such as building cross-platform pipeline automation and enabling data science team collaboration through system standardization.
