The NouvelObs Ifop-France-Soir poll published Friday, July 22: A classic case of bad methodology
NouvelObs Ifop-France-Soir poll: Lack of accuracy and rigor due to bad methodology
By: La Septieme Wilaya
I was reading the NouvelObs this morning when an article handicapping the French presidential race caught my eye. It reported a poll conducted and published by Ifop-France-Soir on Friday, July 22nd. What triggered my curiosity was not so much the results of the poll, or the absence of a stated sampling error (you have to go to the Ifop website and download the entire poll to find it), but the methodology: it was an online poll conducted by quota sampling. When I see those two things together, quota sampling and online polling, I am usually very suspicious of the results.
So what is the problem with quota sampling or, as it is more accurately known, non-probability quota sampling? (Henceforth, I will refer to it simply as non-probability sampling.) Before we focus on its problems, we first need to understand what it is. Unlike probability sampling, non-probability sampling is not based on random selection. With simple random sampling, each member of the population has the same probability of being included in a given sample; more generally, probability sampling gives every member a known, nonzero chance of selection. Non-probability sampling offers no such guarantee, so it cannot rely on the rationale of probability theory. Because the probability that each member of the population is selected is neither equal nor known, a sample drawn by a non-probability technique may or may not be representative of the entire population. With this in mind, the accuracy and rigor of a study based on non-probability sampling (in this case the Ifop-France-Soir poll) are at best shaky and at worst total nonsense.
The built-in assumption in non-probability sampling is that the assembled sample has the same proportions and the same characteristics and traits as the entire population. Though this is feasible, it is extremely hard to make sure, without random sampling, that the individuals gathered to answer a set of questions about a number of candidates truly have all the proportions and characteristics of the entire population. This is by no means to say that non-probability sampling can never be used. There are instances, of course, where non-probability sampling is preferable (we leave that conversation for another day), but certainly not when assessing public opinion in an election year.
To understand why non-probability sampling lacks accuracy, we need to understand how it is usually done. First, the pollster divides the population into mutually exclusive subgroups. Second, the pollster identifies the proportion of the population in each subgroup. Third, s/he selects individuals from the various subgroups in keeping with those proportions. The final step is to make sure that the sample is representative of the entire population. Without boring the reader with statistical mumbo jumbo, a set of standard controls is used to ensure that the proportions of the subgroups and the individuals selected are representative of the entire population. In most cases three controls are used: sex or gender, age, and social status. Each control is usually divided into sub-categories such as age brackets or income levels. However, it is up to the interviewers or the pollster to determine which individual belongs to which subgroup. So there is wide room for interpretation here, and room for interpretation means room for measurement error and systematic bias as well. And so we go from having a representative sample to having a skewed and biased one.
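The quota procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not Ifop's actual procedure: the function name quota_sample, the toy population, and the 52/48 target shares are all assumptions invented for the example, and only one control (sex) is used to keep it short.

```python
import random
from collections import Counter

def quota_sample(respondents, group_of, target_shares, n):
    """Accept respondents in arrival order until every subgroup quota is full."""
    # Steps 1-2: turn the subgroup proportions into head counts.
    remaining = {g: round(share * n) for g, share in target_shares.items()}
    sample = []
    for person in respondents:
        g = group_of(person)
        # Step 3: take whoever shows up, as long as their cell is still open.
        if remaining.get(g, 0) > 0:
            sample.append(person)
            remaining[g] -= 1
        if not any(remaining.values()):
            break
    return sample

rng = random.Random(0)
# Hypothetical population of (sex, age bracket) pairs, invented for illustration.
population = [
    (rng.choice(["F", "M"]), rng.choice(["18-34", "35-64", "65+"]))
    for _ in range(10_000)
]
target_shares = {"F": 0.52, "M": 0.48}  # assumed population proportions

sample = quota_sample(population, lambda p: p[0], target_shares, n=1000)
counts = Counter(p[0] for p in sample)
print(counts)
```

Note that nothing in the selection loop is random: whoever arrives first fills the cell. The quotas guarantee the sex split, and nothing else.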
Although non-probability sampling might appear to be representative of the population, it is not, especially in public opinion surveys. We always need to remember that only the selected characteristics of the population are taken into consideration in forming the subgroups. What does that mean? It means that the characteristics forming our subgroups are most likely overrepresented in relation to the entire population. If, as in the case of the Ifop-France-Soir poll, we select our subgroups only on gender, age, and professional status, there are plenty of other socioeconomic characteristics that we are not controlling for in our sample. By not controlling for these characteristics, our sample most likely has a skewed representation: overrepresentation of the characteristics chosen, and under- or overrepresentation of the characteristics not chosen (we really don’t know which). Briefly stated, we do not know that the sample we have is truly representative of the entire population, so the numbers derived from such a sample are statistically meaningless. Add to this serious problem internet polling, where subjects self-select to answer a set of questions, and you have a poll that lacks scientific basis. In short, a poll that cannot be trusted.
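A small simulation can make the point about uncontrolled characteristics and self-selection concrete. It is only a sketch under made-up assumptions: the trait heavy_user, the support rates (70% versus 30%), and the response rates (50% versus 5%) are hypothetical numbers chosen for illustration, not figures from the Ifop-France-Soir poll, and the quota controls for sex alone.

```python
import random

rng = random.Random(42)

# Hypothetical population: every number below is invented for illustration.
population = []
for _ in range(100_000):
    sex = rng.choice(["F", "M"])
    heavy_user = rng.random() < 0.30  # trait the quotas do NOT control for
    supports = rng.random() < (0.70 if heavy_user else 0.30)
    population.append((sex, heavy_user, supports))

def support_share(people):
    """Fraction of the given people who support the candidate."""
    return sum(p[2] for p in people) / len(people)

# Self-selection: heavy internet users are assumed far likelier to answer.
volunteers = [p for p in population if rng.random() < (0.50 if p[1] else 0.05)]

# Quota sample: fill the sex quotas from volunteers in arrival order.
n = 2000
quotas = {"F": n // 2, "M": n // 2}
quota_sample = []
for p in volunteers:
    if quotas[p[0]] > 0:
        quota_sample.append(p)
        quotas[p[0]] -= 1

# Simple random sample: every member has the same chance of selection.
random_sample = rng.sample(population, n)

truth = support_share(population)
quota_estimate = support_share(quota_sample)
random_estimate = support_share(random_sample)
print(f"population {truth:.3f}  quota {quota_estimate:.3f}  random {random_estimate:.3f}")
```

In this toy setup the quota sample hits its sex targets exactly, yet its estimate of support drifts well away from the population figure because the uncontrolled trait drives both who answers and how they answer. The simple random sample of the same size stays close to the population figure.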
In sum, serious pollsters and public opinion researchers tend to prefer probabilistic, random sampling methodologies over non-probabilistic quota sampling because we consider random sampling to be more accurate and rigorous. It is backed by probability theory and years of scientific research.