There have been a huge number of polls of close states in the US election that have been within a point, typically 48-47 with the rest going to small parties or undecided. In the last couple of days there have been a few articles about how unlikely these polls are.[1]
I just wanted to make a quick note about why the result might not be at all surprising, and why it might tell us something important about the limits of polling.
First, start with the toy story about why you might think it is surprising. Imagine that in some particular election, the actual population is split 50/50. And also imagine that every poll is run by simply finding 500 people at random and asking them how they’ll vote. Circumstances are ideal; the people are selected at random, and they all tell the truth. Then you’d expect poll results to follow a binomial distribution, with a good number of polls coming out 52/48, 53/47, and even the odd poll 54/46 or beyond.
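A quick simulation makes the point concrete. This is a sketch of the idealised model just described, not anyone’s actual polling model; the 500-voter polls and the exact 50/50 population are the stylised assumptions from above, and the 1,000 polls are just a convenient number of repetitions:

```python
import random
from collections import Counter

random.seed(0)

# Idealised polling: 1,000 polls of 500 voters each, sampled at
# random from a population that is split exactly 50/50.
shares = Counter()
for _ in range(1000):
    a_votes = sum(random.random() < 0.5 for _ in range(500))
    shares[round(100 * a_votes / 500)] += 1  # party A's rounded percentage

# Count polls at least as lopsided as 52/48, in either direction.
lopsided = sum(n for share, n in shares.items() if share >= 52 or share <= 48)
print(f"{lopsided} of 1000 ideal polls read 52/48 or wider")
```

Under these ideal assumptions, a large fraction of the polls come out 52/48 or wider, nothing like the uniform 48/47 results being reported.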
What has been widely noted is that we’re not seeing anything like that, especially in Wisconsin. But also, that’s not at all how polling works in 2024.
So take a case that’s also very stylised, but in one respect more realistic. The election is split because the populace is split on religious grounds. About half the population are from religion A, and they all support party A. The other half, roughly, are from religion B, and they all support party B. There was a big survey done a little while ago, which said that As and Bs were each exactly 50% of the population.
The pollsters in this election weight their sample by religion. So if they get 275 religious As, and 225 religious Bs in the sample, they count each religious A for 10/11 of a voter, and they count each B for 10/9 of a voter, so the samples by religion match their belief that the electorate is, by religion, 50/50.
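The arithmetic, using the hypothetical 275/225 sample from the toy example:

```python
# The worked numbers from the toy example: a 500-person sample with
# 275 religious As and 225 religious Bs, weighted to an assumed 50/50 split.
sample_a, sample_b = 275, 225
n = sample_a + sample_b

weight_a = (0.5 * n) / sample_a  # 250/275 = 10/11 of a voter per A
weight_b = (0.5 * n) / sample_b  # 250/225 = 10/9 of a voter per B

weighted_a = sample_a * weight_a  # back to 250, matching the assumed split
weighted_b = sample_b * weight_b  # likewise 250
print(f"weighted sample: {weighted_a:.0f} As, {weighted_b:.0f} Bs")
```

Whatever the raw counts, the weights push the sample back to the assumed 50/50 benchmark by construction.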
What happens? Well, every poll comes out precisely 50/50. And it does so even if every pollster samples completely at random, everyone tells the truth, and so on.
Now this example is unrealistic in two respects: there are no demographic characteristics that map 100% onto political support, and not every pollster uses the same weights. But these are exaggerations, not completely made-up facts. There are demographic characteristics that map 90% or more onto political support, e.g., who one voted for last time, and which party one ‘identifies’ with. And while pollsters do not use identical weights, the reputable ones (with the notable exception of Quinnipiac) use similar ones. So while we shouldn’t expect every poll to come in exactly the same way, to the extent that the real world resembles my toy example, we should expect them to come in pretty close.
Note that I have not said that the pollsters are engaged in any funny business of choosing the weights after seeing the results, or simply suppressing the results that look surprising. In the story I gave, they could preregister their weighting scheme, and publish every poll, and you’ll still get everyone saying 50/50.
This does not mean the election will be 50/50. Everything turns on the accuracy of that initial survey of the population by religion. If it’s wrong, all the polls will be wrong, and in the same direction.
That, I suspect, is what is happening in the real world. The polls are using similar weighting schemes. Those weighting schemes are reasonable but fallible. If they are wrong, they are wrong in ways that will lead to correlated errors.
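The correlated-error story can be sketched in the toy model. The 52% figure below is an invented number for illustration; everything else follows the weighting scheme described above:

```python
import random

random.seed(1)

# Suppose the benchmark survey said religion A is 50% of the electorate,
# but the true figure is 52% (a made-up number for illustration).
TRUE_A, ASSUMED_A = 0.52, 0.50

def weighted_poll(n=500):
    """A random sample, honestly answered, then weighted to the benchmark."""
    sample_a = sum(random.random() < TRUE_A for _ in range(n))
    sample_b = n - sample_a
    w_a = ASSUMED_A * n / sample_a        # weight per religion-A respondent
    w_b = (1 - ASSUMED_A) * n / sample_b  # weight per religion-B respondent
    return sample_a * w_a / n             # weighted vote share for party A

polls = [weighted_poll() for _ in range(20)]
errors = [round(TRUE_A - p, 3) for p in polls]
print("poll readings:", {round(p, 3) for p in polls})
print("errors:", set(errors))
```

Every poll reads exactly 50/50, and every poll misses the true 52/48 split by the same two points in the same direction: the sampling noise is entirely absorbed by the weights, and the benchmark error passes straight through to every poll at once.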
My extreme view is that given how much work is being done by the weights, and how big the error bars are in the measurements that go into those weights, there isn’t a lot of evidential value in any of the polls. Just saying “It will be the same result as last time” will probably get you just as close to the final score as carefully analysing the polls. Saying “It will be the same as last time except in Nevada, where the early vote looks bad” might do even better.
[1] See, for example, Nate Silver, who treats the polls as being like the first model I describe, and calculates a 1 in 9.5 trillion chance of getting results this close; Robert Tait in The Guardian; and Josh Clinton and John Lapinski at NBC. Clinton and Lapinski do note that weights could be important here, but they highlight the possibility of changing the weights after seeing the raw data. The Washington Post had a very good article on how weights work. I’m sure there was some discussion on ex-Twitter about the point I’m making here, but I can’t find the links and Twitter search is useless. So I’m not claiming much originality for this, except that my toy model involving religion is, I think, a bit easier to understand than any version of the point I saw there.