How angry should we be with the pollsters?

Matthew Hirschler
Nov 5, 2020

David Runciman said as he was digesting the initial results of the US election that ‘the polling was not good, and people’s expectations were gravely disappointed’.

But let’s unpack that for a moment. The polling and people’s expectations are two different things.

Clearly, polling informs people’s expectations, but there is a gap between the survey results that polling produces and how those results are interpreted by the public. Let’s look at the polling first and then come back to the ways people’s expectations have been managed.

Polls operate with margins of error. Any poll that extrapolates the intention of voters by taking a survey sample has to control for random variation in the sample. The margin of error is larger for smaller samples because statistically, a small sample is more likely to randomly find a group of people who tend to vote in one way or another.

The margins of error that one should work to with a random sample are listed below:

That essentially means that if you poll 2,400 randomly selected people and the poll projects a 50/50 race between two candidates, the range of possible results runs from 49/51 one way to 51/49 the other.

The issue political opinion polling runs into is that the act of answering a pollster’s question over the phone or via internet advertising contaminates the sample. People willing to do those things have certain characteristics and are therefore not a perfectly random group: they are more likely to be interested in politics; they are more likely to trust institutions; they are more likely to be time rich; the list goes on.

To overcome those biases pollsters have to reweight samples and increase the margins of error. Reweighting samples with a candidate like Trump in a Covid context is difficult. He’s not a typical Republican candidate; it’s not a typical time.
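To make the reweighting idea concrete, here is a minimal sketch. Every number in it is invented for illustration, and the two groups and their electorate shares are assumptions rather than anything a real pollster published. It shows a sample that over-represents politically engaged people, how weighting each group back to its assumed share of the electorate shifts the estimate, and how the weighting shrinks the effective sample size (using Kish's approximation), which is why the margin of error grows.

```python
# Hypothetical illustration only: none of these numbers come from a real poll.
sample = {
    # group: (number of respondents, share of them backing candidate A)
    "politically engaged": (700, 0.55),
    "less engaged": (300, 0.45),
}
# Assumed share of each group in the actual electorate.
population_share = {"politically engaged": 0.40, "less engaged": 0.60}


def raw_estimate(sample):
    """Candidate A's share if every respondent counts equally."""
    total = sum(n for n, _ in sample.values())
    return sum(n * support for n, support in sample.values()) / total


def weighted_estimate(sample, population_share):
    """Candidate A's share after weighting groups to their electorate share."""
    return sum(population_share[g] * support for g, (_, support) in sample.items())


def effective_sample_size(sample, population_share):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(n for n, _ in sample.values())
    sum_w = 0.0
    sum_w2 = 0.0
    for group, (n, _) in sample.items():
        weight = population_share[group] / (n / total)  # weight per respondent
        sum_w += n * weight
        sum_w2 += n * weight * weight
    return sum_w ** 2 / sum_w2


print(f"raw estimate:          {raw_estimate(sample):.2%}")
print(f"weighted estimate:     {weighted_estimate(sample, population_share):.2%}")
print(f"effective sample size: {effective_sample_size(sample, population_share):.0f}")
```

In this made-up case a 1,000-person sample behaves more like a 700-person one after weighting. The harder problem, and the one the sketch cannot solve, is knowing what the right electorate shares are in an unusual year.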

It’s also worth noting that because of the election system, in which states award electoral college votes, the only polling worth looking at is state polling. National polls are as meaningless as the popular vote in the real election.

The first thing anyone looking at any poll should look at is the sample size, because that provides the margin of error that any data should be viewed with.

Much of the state polling that informs the poll-of-polls models has samples as low as 600. That means that for some polling we should expect a margin of error of at least 4 percentage points, which in electoral politics is a huge margin.
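For anyone who wants to check the 4-point figure, here is a minimal sketch of the textbook margin-of-error formula for a simple random sample. It assumes the conventional 95% confidence level (z = 1.96) and a 50/50 race, which is the worst case. Real polls are not simple random samples, so treat these numbers as a floor rather than the true uncertainty.

```python
import math


def margin_of_error(sample_size: int, share: float = 0.5, z: float = 1.96) -> float:
    """Margin of error, in percentage points, on one candidate's share.

    Assumes a simple random sample; z = 1.96 corresponds to 95% confidence.
    """
    return 100 * z * math.sqrt(share * (1 - share) / sample_size)


for n in (600, 1000, 2400):
    print(f"sample of {n}: +/- {margin_of_error(n):.1f} points")
```

Note that this is the margin on each candidate's individual share. The uncertainty on the gap between two candidates is roughly double, because a point gained by one is usually a point lost by the other.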

The way that virtually everyone interacts with polls is via the media. To take an example, the FT*, like many other news organisations, combined polls from key states and showed that Biden was on course for victory, both in its reporting and in charts like this:

For a state to be included in this chart it must have had more than one recent poll and an average poll margin of less than 10 percentage points. It’s not clear what the sample sizes were, so anyone reading the chart has to do so with extreme caution — I’d allow for at least 4 percentage points of variance without further explanation (which there isn’t).

That’s not the way most people read this coverage — it’s not the way it’s presented. Biden supporters (and anyone with a modicum of human decency) would be encouraged by the FT analysis; it showed that Biden would win the election and even had a chance of winning Republican strongholds like Texas.

What is so striking about this chart is that 0 is the definitive line. Every state is either a Trump lead or a Biden lead. A responsible representation of the data would plot a margin-of-error corridor in which any state where the candidates are separated by 4 percentage points or fewer is marked ‘too close to call’.
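The corridor idea is simple enough to sketch in a few lines. The state names and poll leads below are placeholders I have made up, not the FT's data, and the 4-point corridor is just the rule of thumb used in this piece.

```python
def classify(lead: float, corridor: float = 4.0) -> str:
    """Label a state from its average poll lead.

    lead: Biden's share minus Trump's share, in percentage points.
    corridor: leads this small or smaller are treated as too close to call.
    """
    if abs(lead) <= corridor:
        return "too close to call"
    return "Biden lead" if lead > 0 else "Trump lead"


# Hypothetical poll averages, for illustration only.
example_leads = {"State A": 8.0, "State B": 2.1, "State C": -1.5, "State D": -6.0}

for state, lead in example_leads.items():
    print(f"{state}: {lead:+.1f} -> {classify(lead)}")
```

A chart drawn this way would show fewer confident calls, which is exactly the point.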

Trump has quite rightly been derided for calling the election while it was still too close to call.

It’s important to recognise that one of the key reasons he can do it is that shoddy reporting of opinion polling means the media call the election in places that are too close to call, and then any slight deviation from expectations in early results can be manipulated.

This is Trump’s ‘re-election’** strategy. Make election night look like a success, confound expectations and then cry foul play when Democrat votes turn up in the mail.

It wouldn’t work if people’s expectations weren’t badly mismanaged by the way the election is reported. The Republican party knows this and realises the way to cast doubt on the legitimacy of an election they are likely to lose is to sequence information in an advantageous way.

There is a reason that Democrat votes will be counted later. As the Guardian reported this morning:

Did you know that much of the stress, agitation and uncertainty about the election result in the United States over the past two days did not have to happen?

That the drawn-out ballot counts we saw and are seeing in Wisconsin, Michigan and Pennsylvania do not owe to the races being particularly close in those states, which they were not, but to artificially produced bottlenecks?

The long counts are another kind of voter suppression, the product of rules imposed in those states by Republican-controlled legislatures that in Wisconsin and Pennsylvania allowed for no early processing of the mail-in vote — despite the pandemic — and in Michigan allowed for only one day of early processing.

The sense of there being a dynamic in these races in which Biden “came from behind” is artificial, the result of vote tallies from densely and highly populated, disproportionately Democratic areas — ie, cities — taking longer.

Everyone saw this problem coming. They also saw how Trump would attempt to take advantage of the uncertainty by stealing the election, which he is, although the effort, as historically dangerous and destructive as it is, does not look particularly brilliant.

Trump knew that to pull off this strategy he’d have to create a sense that he was winning. What made that sense so strong was that over-reliance on polls meant many thought it would be done and dusted on election night, which was incredibly unrealistic given the high volume of mail-in votes. The majority of real and legitimate information, in the form of results for battleground states, was always going to come later than election night conjecture on how Trump was doing.

Back to the FT chart of polls. What we know now is that Biden is on course to win Arizona and every state where he was polling better than Arizona, except Florida.

Biden’s lead in Florida at the time this chart was produced was just over 2 percentage points. The current result, with 96% of votes counted, is Trump ahead by 3.4 points, so the polls were wrong by about 6 points at this stage.

The last 4% of ballots will favour Biden, which means the polls will end up 5–6 percentage points out. That is an example of bad polling; it’s beyond a reasonable margin of error.

Every other poll has called the likely result correctly, apart from North Carolina, which is still too close to call.

That’s not the same as saying all the polls were accurate. The Trump vote was pretty significantly underestimated in Wisconsin, where Biden won a closer race than predicted. Trump’s vote was also underestimated in Iowa and Ohio, which were projected to be narrow wins but were comfortably held. Montana, Kansas, Indiana and Utah were outside shots for Biden, but again the Trump vote was underestimated. He held all those states, but the final results are still some way off, so we don’t know how wrong the polls were there.

That said, results from the states that the Democrats need to get to victory look like they’ll land within the polling’s margin of error.

As I write, Biden’s path to the White House will be secured with Nevada and Arizona; both look like they are going his way. It looks like he’ll also add Pennsylvania later, and possibly Georgia too, which would take him to 306 electoral college votes (although maybe not in that order).

Taken in total, this outcome is well within the bounds of possibilities reported by models based on polls, such as that of the Economist, which predicted Trump had a 5% chance of winning.

Trump came closer than most thought, which reflects a reasonably poor night for the pollsters. There was an under-representation of the Trump vote, and in quite a few key states that under-representation was beyond the margin of error. However, more than half of the key states shown in the FT chart above have ended up within the margin of error predicted by the polls.

A bad night, but not a terrible night. A lot was called wrong by the polls, but interestingly those wrong calls haven’t translated into actual electoral college results that anyone reading the polls properly should be surprised by.

The problem is that Biden getting to 270 won’t be the end of the story. Trump will try and force a result by nakedly undermining the democratic process.

It’s a long road for Trump, and a lot has to go in his favour with the remaining votes and in the courts. But if he’s successful it will be because he can paint a picture of an election being stolen. The idea that he’d had a brilliant night beyond expectations doesn’t stack up if we understand the bounds of possibilities explained to us by the polls.

Essentially, Trump’s strategy wouldn’t work if polls weren’t misreported.

If polling can play a pretty central role in a demagogue trying to undermine an election then we need to question its role in our political reporting.

The issue we face is there’s a media arms race to the best information and predictions. Polling is quite an easy thing to jump on. Newspapers and media brands can create visualisations of data, interactive online tools and sophisticated modelling to see the future. It’s a race to the best prediction.

The problem with that race is that it doesn’t encourage responsible reporting. Presenting a chart with lots of states blanked out as ‘too close to call’ encourages eyeballs to a competitor who’s happy to present a flimsy prediction — if it’s wrong the pollsters get the blame instead of the respected media titles in any case. The pollsters can take the hit because they will endlessly be commissioned for work by the media in their race for predictions and coverage next time. Naturally they’ll promise to fix their methods and not make the same mistakes again.

The problem isn’t really the methods pollsters use, though. This time they didn’t get it that wrong. The problem is the centrality of polling in pre-election reporting. In two-party democracies polls only need to be a bit wrong to be pretty useless. Fighting a political campaign for president in the US requires each candidate to compile a compelling case that builds a large enough electoral coalition to win. These are competing priorities: the coalition needed is enormous, and every group a candidate adds to their base risks alienating another. That means only in really exceptional circumstances will a candidate win by more than a small margin; a candidate can only afford to compromise their position to the point where they have enough to just get over the line. These elections won by small margins are precisely the things that polls will never be able to predict with pinpoint accuracy.

*I’ve picked on the FT because it’s such an excellent paper; if they’re responsible for shoddy reporting, it shows how big and systematic the problem is.

**it’s not an election strategy really, because elections are about winning votes as opposed to power-grabbing
