Big Data meets politics: Election Fingerprinting

Democratic societies are built around the principle of free and fair elections, in which each citizen’s vote counts equally. In practice, however, electoral processes are more complicated than that simple statement suggests. At 7Puentes we wanted to showcase current Big Data analysis techniques developed by interdisciplinary data scientists by applying them to Argentina’s 2015 Presidential Pre-elections, held last August. Our goal was to identify election irregularities, if there were any. The results may or may not surprise you.

The social sciences have traditionally been a controversial area in which few patterns and laws have been established. Besides the intrinsic difficulty of making predictions in the social sciences, a lack of available data has long blocked progress. In the past, even when data was available, as in the case of politics and elections, few organizations had the computational resources and know-how to analyze and visualize it.

We have replicated, for Argentina’s 2015 Presidential Pre-elections, a visual and numeric analysis originally published in 2012 [1]. That study covers elections in a number of countries, many of them European, as well as the elections in Russia and Uganda, which were suspected of fraud.

A common mistake is to rely on Benford’s Law or similar digit-based tests to detect election fraud. This kind of quantitative method is neither widely accepted nor useful unless you can establish that the digits in the data are expected to follow a specific probability distribution. We ran the same Benford’s Law study on our data, with no conclusive results. As the authors of [1] explain:

“[..], Benford’s law [..] experienced a renaissance as a potential election fraud detection tool [..]. In its original and naive formulation, Benford’s law is the observation that, for many real world processes, the logarithm of the first significant digit is uniformly distributed. Deviations from this law may indicate that other, possibly fraudulent mechanisms are at work. For instance, suppose a significant number of reported vote counts in districts is completely made up and invented by someone preferring to pick numbers, which are multiples of 10. The digit 0 would then occur much more often as the last digit in the vote counts compared with uncorrupted numbers. Voting results from Russia [..], Germany [..], Argentina [..], and Nigeria [..] have been tested for the presence of election fraud using variations of this idea of digit-based analysis. However, the validity of Benford’s law as a fraud detection method is subject to controversy [..]. The problem is that one needs to firmly establish a baseline of the expected distribution of digit occurrences for fair elections. Only then it can be asserted if actual numbers are over or underrepresented and thus, suspicious. What is missing in this context is a theory that links specific fraud mechanisms to statistical anomalies [..].” [1]
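To make the digit-based idea concrete, here is a minimal Python sketch. It is not the procedure used in [1] or in our own study. It assumes the usual first-digit statement of Benford’s law, P(d) = log10(1 + 1/d), and also tallies last digits, as in the multiples-of-10 example. All function names are hypothetical, and the input is assumed to be a plain list of non-negative integer vote counts.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities under Benford's law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_frequencies(counts):
    """Observed relative frequency of each leading digit (zero counts are skipped)."""
    digits = [int(str(c)[0]) for c in counts if c > 0]
    if not digits:
        raise ValueError("need at least one positive count")
    tally = Counter(digits)
    return {d: tally.get(d, 0) / len(digits) for d in range(1, 10)}


def last_digit_frequencies(counts):
    """Observed relative frequency of each final digit, 0 through 9."""
    if not counts:
        raise ValueError("need at least one count")
    tally = Counter(c % 10 for c in counts)
    return {d: tally.get(d, 0) / len(counts) for d in range(10)}


def chi_square_statistic(observed_freq, expected_prob, n):
    """Pearson chi-square statistic between observed and expected digit counts."""
    return sum(
        (observed_freq[d] * n - expected_prob[d] * n) ** 2 / (expected_prob[d] * n)
        for d in expected_prob
    )
```

A large statistic only says that the digits deviate from the assumed baseline. As the quoted passage stresses, that is not evidence of fraud unless the baseline for a fair election has been established first.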

The analysis of voter turnout is more informative and has several attractive properties:

  • It is independent of the aggregation scale. Similar results are obtained for election units, districts, departments, provinces, and so on; changing the geopolitical level of aggregation does not produce contradictory results.
  • It relies on very basic statistics, such as frequencies and normal distributions; no elaborate digit-distribution laws such as Benford’s Law or the Second Digit Law are needed.
  • Many patterns are visible at a glance, and any hypothesis can then be easily tested against the data. Both extreme fraud (units reporting 100% of votes for one party) and incremental fraud (units where a suspicious percentage of fraudulent ballots has been added) can be detected.
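The points above can be illustrated with a short Python sketch. It rests on several assumptions that are ours, not taken from [1]: each unit records registered voters, ballots cast, and votes for one party; turnout is ballots cast over registered voters; vote share is party votes over ballots cast; and a unit is treated as a candidate for extreme fraud when both values reach a hypothetical threshold of 95%. The `Unit` class and both function names are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Unit:
    district: str
    registered: int    # registered voters in the unit
    ballots_cast: int  # ballots actually cast
    party_votes: int   # votes for the party under study

    @property
    def turnout(self):
        return self.ballots_cast / self.registered

    @property
    def vote_share(self):
        return self.party_votes / self.ballots_cast


def flag_extreme(units, threshold=0.95):
    """Units where both turnout and vote share reach the (assumed) threshold."""
    return [
        u for u in units
        if u.registered > 0 and u.ballots_cast > 0
        and u.turnout >= threshold and u.vote_share >= threshold
    ]


def aggregate_by_district(units):
    """Sum raw counts per district, so ratios can be recomputed at a coarser scale."""
    totals = defaultdict(lambda: [0, 0, 0])
    for u in units:
        t = totals[u.district]
        t[0] += u.registered
        t[1] += u.ballots_cast
        t[2] += u.party_votes
    return [Unit(district, *t) for district, t in totals.items()]
```

Aggregating the raw counts, rather than averaging the ratios, keeps turnout and vote share well defined at every level, which is what allows the same analysis to be repeated per unit, district, or province.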

One visualization is plotted per party, or at least for the winning party. These plots can be considered a fingerprint of the election, with each country showing a different pattern. For example, countries with mandatory voting can be distinguished from countries where voting is optional.
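In [1] the fingerprint is a two-dimensional histogram of voter turnout against the winning party’s vote share, with one data point per electoral unit. The following Python sketch builds such a grid of counts; the function name, the default of 10 bins per axis, and the input format (pairs of fractions between 0 and 1) are assumptions made for illustration. Rendering the grid as a heat map is left to whichever plotting tool you prefer.

```python
def election_fingerprint(points, bins=10):
    """Count units per (vote share, turnout) cell.

    points: iterable of (turnout, vote_share) pairs, each value in [0, 1].
    Returns a bins x bins list of lists, indexed as grid[share_bin][turnout_bin].
    """
    grid = [[0] * bins for _ in range(bins)]
    for turnout, share in points:
        # Skip malformed units instead of guessing what they meant.
        if not (0 <= turnout <= 1 and 0 <= share <= 1):
            continue
        # A value of exactly 1.0 belongs to the last bin.
        col = min(int(turnout * bins), bins - 1)
        row = min(int(share * bins), bins - 1)
        grid[row][col] += 1
    return grid
```

In such a grid, a cluster of units in the cell where both turnout and vote share are highest is the kind of pattern that would call for a closer look.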

For Argentina in 2015, some conclusions can be drawn:

  • There is significant polarization between the two strongest, opposing political forces (Cambiemos and Frente para la Victoria), not only in terms of votes but also geographically. That is, there are several regions of Argentina where one of these two political forces is hegemonic in terms of votes.
  • We could not identify statistical irregularities. There are no suspicious concentrations or distributions of votes. A slight dispersion is observed, but at a glance there are no apparent signs of extreme fraud.

[1] Statistical detection of systematic election irregularities. PNAS, October 9, 2012, vol. 109, no. 41, 16469–16473. http://www.pnas.org/content/109/41/16469.full.pdf