How Data Analysis Drives Election Day

Data and analytics are driving more than e-discovery. We’ve talked about this before: smart analytics tools and the exploding availability of data are reaching every corner of modern life. At home, they suggest what music to listen to next, what books to read, and what movies to watch. At work, they tell us whether an employee may be committing a data breach, how closely related one document is to another, and how likely it is that a custodian’s text message is responsive to an e-discovery request.

In public life, they also give us insight into diverse cultural phenomena. Where are most retirees choosing to settle down, and why? Which demographic is most inclined to pursue a degree in sociology? Who is going to win the next presidential election?

In the legal industry, we’re just starting to build data analytics into our daily work. Analytics usage is on the rise among corporate legal teams, and they’re finally seeing the benefit that other fields, including election analysis, have seen for years: data yields far more insightful lessons when it is compared with, categorized among, and contrasted against more data.

The History of Election Polling

Some of the most famous early election polling dates back to 1920. During that election, as well as those in 1924, 1928, and 1932, the Literary Digest conducted a straw poll to predict the nation’s next president. Readers mailed in cards with their choice, and the magazine tallied them to forecast a winner. In each of these four elections, the Literary Digest predicted correctly.

Come 1936, though, a new polling name came to town. Leading up to that election, the Literary Digest predicted, based on another readership poll, that Alfred Landon would win the presidency. George Gallup (yes, that Gallup) suspected the magazine would get it wrong. Based on his own poll, Gallup predicted that Franklin Delano Roosevelt would be the big winner, and he even went so far as to predict just how wrong the Literary Digest would be. He was right about the winner, and he came within about 1 percentage point of the Literary Digest’s own result.

The problem for the Literary Digest came from its methodology. Although a large sample of 2.4 million people participated, the magazine had reached them through some rather skewed mailing lists: first, its own readers; second, registered car owners; and third, telephone users. These lists were readily available, but the names on them skewed heavily toward Americans with above-average incomes; after all, magazine subscriptions, automobiles, and telephones were luxuries in the middle of the Great Depression. As a result, the poll lacked a holistic view of Americans’ opinions about the election, much as you’d lack a holistic view of your e-discovery data if you ran analytics on only one type of document. For example, what might you miss in clusters that include emails but not their attachments?

Gallup, meanwhile, used scientific sampling to gather his data from a much smaller, but far more representative, group of about 50,000 people. (Statistical sampling also plays a role in workflows like technology-assisted review, in which the software learns which concepts in a data set are responsive as expert reviewers give it examples of responsive and non-responsive documents.)
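To make the Literary Digest’s problem concrete, here is a minimal Python simulation under purely hypothetical assumptions: 30 percent of voters are reachable through the skewed mailing lists and back candidate A at 60 percent, while everyone else backs candidate A at 35 percent. These numbers are invented for illustration, not 1936 figures, and the model ignores real-world complications such as non-response. It compares a huge poll drawn only from the skewed lists with a much smaller simple random sample of the whole electorate, used here as an idealized stand-in for a representative sample.

```python
import random

# Hypothetical population parameters (illustrative assumptions, not 1936 data).
SHARE_HIGH_INCOME = 0.3      # fraction of voters reachable via the skewed lists
SUPPORT_HIGH_INCOME = 0.6    # support for candidate A among those voters
SUPPORT_OTHERS = 0.35        # support for candidate A among everyone else


def true_support() -> float:
    """Population-wide support for candidate A implied by the assumptions."""
    return (SHARE_HIGH_INCOME * SUPPORT_HIGH_INCOME
            + (1 - SHARE_HIGH_INCOME) * SUPPORT_OTHERS)


def skewed_list_poll(n: int, rng: random.Random) -> float:
    """Poll n people drawn only from the high-income frame (the skewed lists)."""
    yes = sum(rng.random() < SUPPORT_HIGH_INCOME for _ in range(n))
    return yes / n


def random_sample_poll(n: int, rng: random.Random) -> float:
    """Poll n people drawn uniformly at random from the whole population."""
    yes = 0
    for _ in range(n):
        # Each respondent is high-income with the population's true probability.
        high_income = rng.random() < SHARE_HIGH_INCOME
        p = SUPPORT_HIGH_INCOME if high_income else SUPPORT_OTHERS
        yes += rng.random() < p
    return yes / n


if __name__ == "__main__":
    rng = random.Random(1936)  # fixed seed so runs are repeatable
    print(f"true support:         {true_support():.3f}")
    print(f"skewed, n=2,400,000:  {skewed_list_poll(2_400_000, rng):.3f}")
    print(f"random, n=50,000:     {random_sample_poll(50_000, rng):.3f}")
```

Because the skewed poll only ever reaches one slice of the electorate, adding respondents just makes it more precisely wrong: its estimate settles near 60 percent no matter how large it grows, while the random sample’s error shrinks toward zero as its size increases.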

In hindsight, the 1936 Literary Digest poll has been called “the mother of all botched political polls.” It also marked the beginning of the modern science of analyzing public opinion.

The organization that bears Gallup’s name has since become one of the nation’s go-to sources for accurate, sound election polling, as well as many other kinds of public opinion polls. Statistical random sampling has been the methodology of record for decades.

Today’s Polling Methodologies

While this historical approach has long been a reliable and fairly accurate way of predicting elections and gauging public opinion, times are changing.

Today’s polling efforts are often complicated by the split between cell phones and landlines, as well as by generally lower response rates. As The New York Times put it last year, “We are less sure how to conduct good survey research now than we were four years ago, and much less than eight years ago.”

Hurdles like these are also making high-quality polls far more expensive to commission, which limits how many polls are conducted and opens the door to unfortunate compromises on quality.

So, how is the world compensating? First, by trusting the methodology and supporting polling efforts. And second, by comparing polling data with yet more data so that, together, they can shed new light on public opinion.

Nate Silver, founder and editor in chief of FiveThirtyEight, is a well-known public opinion analyst with an influential following. He has pointed out the vulnerabilities of polling, but he has also argued strongly that polls remain necessary and expressed hope for their continued success.

FiveThirtyEight is a powerful example of how complex data analytics can offer fascinating insights into any subject—whether you’re analyzing Denzel Washington’s highly successful film archetypes or evaluating the accuracy of scientific research.

The election predictions showcased on FiveThirtyEight go beyond polling data. In his overall forecasts, Silver combines polling data with historical data and current economic data to estimate, state by state, where each candidate stands on a given day.

The publication also regularly digs deeper, with articles that examine demographics, pop culture data, and personal sentiment to shed light on how one candidate’s famous supporters, or another candidate’s average supporters, are thinking. Other pieces draw on broader historical data that shapes the issues dominating the campaign.

This is the future of election forecasting: moving beyond simple polls to richer, data-driven predictions.

In a similar vein, analytics tools for e-discovery help case teams gain deeper insight into their data, which is often made up of increasingly diverse types of information. Rather than thumbing through documents one at a time, in isolation, teams can use analytics to spot trends from a bird’s-eye view of the data set. Tools such as clustering and customized dashboards display trends across the data set as a whole. Teams can now see how search terms interact across an entire data set, or how a certain custodian’s data relates to a key concept in the case, and the pictures these views paint can lead to more robust review strategies.
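To illustrate what “how search terms interact” can mean in practice, here is a minimal Python sketch of a search-term overlap report: for each pair of terms, it counts the documents that hit both, and for each term, it counts the documents that only that term hits. The document IDs, text, and terms are hypothetical, and the whole-word, single-term matching is a deliberate simplification; this is not how Relativity or any particular platform implements its reports.

```python
import re
from itertools import combinations


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens (a simplification)."""
    return set(re.findall(r"\w+", text.lower()))


def term_hits(docs: dict[str, str], terms: list[str]) -> dict[str, set[str]]:
    """Map each single-word search term to the IDs of documents containing it."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    return {
        term: {doc_id for doc_id, words in tokens.items() if term.lower() in words}
        for term in terms
    }


def pairwise_overlap(hits: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Count documents hit by both terms, for every pair of terms."""
    return {(a, b): len(hits[a] & hits[b]) for a, b in combinations(sorted(hits), 2)}


def unique_hits(hits: dict[str, set[str]]) -> dict[str, int]:
    """Count documents that a term hits and no other term does."""
    result = {}
    for term, ids in hits.items():
        others = set().union(*(v for t, v in hits.items() if t != term))
        result[term] = len(ids - others)
    return result


if __name__ == "__main__":
    # Hypothetical documents and search terms, for illustration only.
    docs = {
        "DOC-001": "Please review the pricing memo before the merger call.",
        "DOC-002": "Merger timeline attached; legal will follow up.",
        "DOC-003": "Lunch order for Friday.",
        "DOC-004": "Pricing spreadsheet for Q3 and merger synergies.",
        "DOC-005": "Legal hold notice for all custodians.",
    }
    hits = term_hits(docs, ["pricing", "merger", "legal"])
    print(pairwise_overlap(hits))
    print(unique_hits(hits))
```

Counts like these hint at which terms are largely redundant and which bring in documents no other term would find, the kind of pattern a dashboard can surface across an entire data set.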

All of that means weaving stronger narratives from large bodies of information with complex stories to tell, not unlike the way diverse data sources can tell pollsters far more about public opinion than any single variable could.

Why Polled Public Opinions Matter

Especially in a country as large and diverse as the United States, polls provide invaluable insight into what the national community is thinking, and these insights shape our elections in no small way. Whether by influencing campaign strategies or by creating a bandwagon effect, election predictions can change the elections themselves and, in turn, the world.

In the modern era of big data, analytical capabilities, and just plain smart people, we can use connectivity and creativity to have a real impact on our work as well as our world.

Sam Bock is a member of the marketing team at Relativity, and serves as editor of The Relativity Blog.