As discussed in previous posts, our simulated elections were failing the chi-square hypothesis test described in “Election Forensics: Vote Counts and Benford’s Law” (Mebane 2006) for the distribution of the second digit of the precinct-level vote totals inside each county. Figure 1 below shows the formula used to generate the second-digit chi-square statistic, where “qB2i” denotes the expected relative frequency with which the second significant digit is “i”; “d2i” denotes the number of times the second digit is “i” among the J precincts (inside a chosen county); and “d2” denotes the total number of second digits (0 through 9) counted, that is, the sum of the d2i.
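For readers without the figure at hand, the statistic can be written out from the definitions above. This is a reconstruction in LaTeX of the second-digit statistic as we understand it from Mebane (2006), together with the standard Benford second-digit frequencies; the symbol names simply mirror the ones used in the text.

```latex
% Second-digit chi-square statistic (9 degrees of freedom)
X^2_{B_2} = \sum_{i=0}^{9} \frac{\left(d_{2i} - d_2\, q_{B_2 i}\right)^2}{d_2\, q_{B_2 i}},
\qquad d_2 = \sum_{i=0}^{9} d_{2i}

% Expected relative frequency of second digit i under Benford's Law
q_{B_2 i} = \sum_{k=1}^{9} \log_{10}\!\left(1 + \frac{1}{10k + i}\right),
\qquad i = 0, 1, \dots, 9
```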
Initially, we converted this chi-square statistic to its corresponding p-value and compared it with alpha=0.05. If the p-value was below alpha, we rejected the null hypothesis (H0: the second digits of the observed set follow Benford’s distribution).
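The single-test check described above can be sketched in Python as follows. This is an illustration and not our production code: the function names and the sample vote totals are hypothetical, totals below 10 are skipped on the assumption that they have no second digit, and the p-value uses a closed form for the chi-square upper tail that holds for odd degrees of freedom (here 9), so that only the standard library is needed.

```python
import math

# Expected relative frequency of each second digit (0-9) under Benford's Law.
Q_B2 = [sum(math.log10(1 + 1 / (10 * k + i)) for k in range(1, 10))
        for i in range(10)]


def second_digit(n):
    """Second significant digit of a vote total, or None if it has only one digit."""
    s = str(abs(int(n)))
    return int(s[1]) if len(s) >= 2 else None


def second_digit_chi_square(vote_totals):
    """Chi-square statistic comparing observed second digits with Benford's Law."""
    counts = [0] * 10
    for v in vote_totals:
        d = second_digit(v)
        if d is not None:  # assumption: totals below 10 are left out
            counts[d] += 1
    d2 = sum(counts)
    if d2 == 0:
        raise ValueError("no vote totals with a second digit")
    return sum((counts[i] - d2 * Q_B2[i]) ** 2 / (d2 * Q_B2[i]) for i in range(10))


def chi2_sf_9df(x):
    """Upper-tail p-value of a chi-square statistic with 9 degrees of freedom."""
    if x <= 0:
        return 1.0
    chi = math.sqrt(x)
    series = chi + chi ** 3 / 3 + chi ** 5 / 15 + chi ** 7 / 105
    return math.erfc(chi / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * series


# Hypothetical precinct totals for one candidate in one county.
totals = [132, 208, 451, 97, 310, 1204, 88, 765, 543, 219, 176, 402]
stat = second_digit_chi_square(totals)
p_value = chi2_sf_9df(stat)
reject_h0 = p_value < 0.05  # the original single-test rule
print(stat, p_value, reject_h0)
```

As a sanity check, `chi2_sf_9df(16.92)` comes out at roughly 0.05, which matches the critical value quoted in this post.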
Our hypothesis tests were rejecting so often because alpha=0.05 (critical chi-square statistic=16.92) is a lenient threshold when many tests are run at once: even when H0 is true for every test, about 5% of them are expected to reject by chance. A way to fix this problem is to decrease the alpha value, which widens the window for not rejecting the null hypothesis. Mebane (2006) accounts for this by using the False Discovery Rate (FDR) procedure of Benjamini and Hochberg (1995). We implement FDR control as follows:
1. For every county
– For every candidate’s precinct-level totals inside the county
– Calculate the chi-square statistic for the second digit
Note: Step 1 produces three chi-square statistics for every county (one for each candidate).
2. Index the statistics by t = 1,…,T, where T is the total number of chi-square statistics in the entire state.
– e.g. Alaska has 442 counties. If there are three candidates, T = 3 × 442 = 1,326.
3. Put all the chi-square statistics into a list “St” and convert each one to its corresponding p-value.
– Use the chi-square distribution with 9 degrees of freedom and take the upper tail rather than the lower one (by assigning p-value = 1-chisquarePvalue)
4. Sort the p-values in “St” from smallest to greatest to produce “S(t)” (the sorted list).
5. Choosing test level alpha=0.05, find the smallest number d such that S(d+1) > (d + 1)α/T. This d is the number of tests rejected by the FDR criterion.
Figure 2 shows the smallest new alpha=0.00034 (chi-square statistic=30.65) that needs to be crossed in order to trigger the rejection of H0.
With this procedure in place, we have been able to bring the false discovery rate close to 0%.
Future reports will cover how we inject fraud into an election and how Benford’s Law helps detect the county, or even the candidate, that was favored by the manipulation.