Question the Data: Go Beyond Fake News

Illustration by Devon Manney

One of my favorite television shows was Numbers, which ran from 2005 to 2010. It was about a prodigy who helped solve crimes using math. The tagline was, “We all use numbers every day.” It’s true, whether we’re checking the time, running our credit cards, or counting change. We use numbers even when we aren’t looking; after all, computers and the Internet are built on numbers: ones and zeros, to be exact.

But do we understand what’s going on with those numbers? Do we understand the data and how it’s driving policy? Most of the time, we don’t. Look at the pseudoscience around vaccines. According to the World Health Organization, 140,000 people died from measles in 2018, down from about 2.6 million per year before the measles vaccine was introduced. No deaths from the vaccine have been reported. However, the anti-vaccine movement, based on a fraudulent study by Andrew Wakefield, continues to propagandize. We have only to check the daily newsfeed to see the effects of pseudoscience in our current coronavirus crisis.

Innumeracy around polling may not be as deadly as that around disease, but it can be quite harmful. For example, a Monmouth University poll in Pennsylvania shortly before the 2020 election showed Joe Biden leading Donald Trump by 7 percentage points. This poll was based on just over 500 likely voters. The final count came in at a 1.2-point Biden lead. Because the actual margin was so much smaller than the predicted one, had more people stayed home rather than going to vote, the result could have been disastrous. Such misses can come from not factoring in the margin of error (typically around 3-4 percentage points) or from a skewed poll sample. For example, if polling firms call only voters with landlines, they might reach a higher percentage of older voters than are in the population as a whole.
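To see how much room a 500-person poll leaves, here is a rough sketch of the textbook margin-of-error calculation. It assumes a simple random sample and a 95% confidence level, which real polls only approximate, and it uses 500 as a round stand-in for "just over 500." The function name and the doubling rule for the lead are illustrative simplifications, not the pollster's actual method.

```python
import math

def margin_of_error(sample_size, share=0.5, z=1.96):
    """Approximate margin of error for one candidate's share of the vote.

    Assumes a simple random sample and a 95% confidence level (z = 1.96).
    """
    return z * math.sqrt(share * (1 - share) / sample_size)

# Illustrative inputs: the poll had "just over 500" respondents; 500 is a stand-in.
sample_size = 500
predicted_lead = 0.07   # Biden +7 points in the poll
actual_lead = 0.012     # final count, Biden +1.2 points

moe_share = margin_of_error(sample_size)
# Rule of thumb for a close two-way race: the uncertainty on the lead
# (the gap between the candidates) is about twice that on either share.
moe_lead = 2 * moe_share

lead_low = predicted_lead - moe_lead
lead_high = predicted_lead + moe_lead

print(f"Margin of error on each share: +/- {moe_share * 100:.1f} points")
print(f"Margin of error on the lead:   +/- {moe_lead * 100:.1f} points")
print(f"Plausible range for the lead: {lead_low * 100:.1f} to {lead_high * 100:.1f} points")
print("Actual result inside that range:", lead_low <= actual_lead <= lead_high)
```

Under these assumptions, each candidate's share is uncertain by a bit more than 4 points and the lead by nearly 9, so a 7-point poll lead and a 1.2-point final result are not actually in conflict. The headline number alone hides that.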

This isn’t an academic question. Polls in the presidential races of 1936 and 1948 incorrectly predicted that Republicans Alf Landon and Thomas Dewey would win their respective races in landslides, based on early skewed polling and early voting numbers.

In 2016, pollsters told us that Hillary Clinton would take both the Electoral College and the popular vote. In 2020, despite today’s more sophisticated techniques that supposedly adjust for class, gender, and race, polling indicated a big win for Democrats. Instead, Democrats eked out narrow margins in the House and Senate. We on the Left want to win, and polls can be designed to help us understand how to advance our ideas and boost our candidates. That is, they can if we design polls to tell us what we need to know from people who have the information.

Beyond just looking at flawed polls, however, we must understand how math (and science) affect policy. Consider climate change. We know the average temperature on Earth is higher than it was only decades ago. And so, we support a Green New Deal and oppose the Dakota Access Pipeline. But math, science, and, in particular, data science have an impact on other climate-related issues, and until recently, few were questioning assumptions about race, sex, and class. For instance, we know that climate change means hotter summers and more hurricanes, but do we think about how those hotter summers affect people without access to air conditioning or how crop failures could lead to food scarcity, especially in poorer communities? Do we notice how vast migrations from drought-stricken areas fuel anti-immigrant sentiments and regional warfare?

Leading up to the financial crisis of 2008-2009, deregulation in the financial industry had allowed banks to engage in risky hedge fund trading, often involving mortgage-backed securities. The profitability of these financial products led to pressure on mortgage lenders to offer mortgages with lower and lower initial interest rates in order to reel in more home buyers. The availability of low-interest mortgages drove up demand for houses and thus home prices. Many of these low-interest loans, however, started reverting to much higher interest rates around the same time the “housing bubble” burst, owing to rising interest rates in general and a stalling housing market. House values plummeted, leaving homeowners “under water,” that is, owing more money than the house was now worth. Only those who had some grounding in mathematics would have understood what a mess they were walking into when they signed the contract.
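The arithmetic behind that mess can be sketched with the standard loan payment formula. Every figure below is hypothetical (a $200,000 loan over 30 years, an introductory rate of 4% that resets to 8%), and the comparison is simplified: it ignores the small amount of principal paid off before the reset and the shorter remaining term.

```python
def monthly_payment(principal, annual_rate, years):
    """Fixed monthly payment from the standard amortization formula."""
    monthly_rate = annual_rate / 12
    months = years * 12
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -months)

# Hypothetical loan terms, chosen only for illustration.
loan = 200_000
teaser_rate = 0.04   # introductory annual rate
reset_rate = 0.08    # annual rate after the loan resets
years = 30

payment_before = monthly_payment(loan, teaser_rate, years)
payment_after = monthly_payment(loan, reset_rate, years)
increase = payment_after / payment_before - 1

print(f"Monthly payment at the introductory rate: ${payment_before:,.2f}")
print(f"Monthly payment after the reset:          ${payment_after:,.2f}")
print(f"Increase: {increase * 100:.0f}%")
```

With these made-up numbers, the monthly payment rises by roughly half when the rate doubles. A household budgeted around the first figure can find the second one out of reach, and that is before the house loses value.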

School testing, redlining, and the use of algorithms in hiring as well as advertising are all data-based activities that affect our lives. We must examine how the biases of data scientists are incorporated into models that predict educational achievement or health care needs. In the current pandemic crisis, we learned only after it was on the market that one vaccine had not been tested on enough people over the age of 65 to determine its usefulness with that population.

Do you know how the digits in your ZIP code affect your auto insurance rates or mortgage rates? Do you know how much your smartphone can reveal to everyone from advertisers to law enforcement?

Data can be a resource for the Left as well. In January, the Washington Post reported that a consumer protection group had shown that factors such as education level and occupation are used by auto insurance companies when setting rates. Although this may not be a surprise, it is helpful to have this kind of data as part of our argument for change.

A left slogan should be “Question the Data.” Ask, “What’s the date on the information in this email I’m being asked to forward?” “What’s the source for this statement?” “Who was involved in this supposedly scientific study?” Lives depend on our looking at, and behind, the numbers.