Archive for the ‘Uncategorized’ Category

False Positive Rate over Alpha (Policy Based Simulated Elections)

Wednesday, November 28th, 2012

In the previous post, and in step 5 of the FDR-controlling procedure used to test whether the second digits resemble Benford’s distribution, we used a fixed alpha equal to 0.05. If alpha is increased, the chi-square statistics from the vote counts are less likely to pass the test; on the other hand, if alpha is decreased, the chi-square statistics are more likely to pass the test. Therefore, in order to measure the effect of alpha, we generated 200 elections using our Policy Based procedure (explained in previous posts), ran the FDR procedure on each of them, and recorded whether each election passes the test at the specific alpha currently in use. Then, for that alpha, we calculate the percentage of elections that did not pass the Benford test. We call this percentage the FPR (false positive rate) because every tested election is a genuine election containing no manipulation.

We have calculated the FPR for different alphas in the range 5*10^(-7) up to 0.05, in logarithmic steps of 0.2. The following figure shows the alphas on a logarithmic scale vs. the FPR.
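A minimal sketch of how such a sweep could be computed, assuming a hypothetical helper run_fdr_test(election, alpha) that returns True when a simulated election passes the Benford/FDR test (the helper name and election representation are not from the original code):

import numpy as np

def false_positive_rate(elections, alpha, run_fdr_test):
    # Fraction of genuine elections that fail the test at this alpha.
    failed = sum(not run_fdr_test(e, alpha) for e in elections)
    return failed / len(elections)

# Alphas from 5e-7 up to 0.05 in log10 steps of 0.2.
alphas = 10 ** np.arange(np.log10(5e-7), np.log10(0.05) + 1e-9, 0.2)
# fprs = [false_positive_rate(simulated_elections, a, run_fdr_test) for a in alphas]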

The next step will be to generate another 200 new elections using the same Policy Based procedure, inject a type of fraud, and calculate the FNR (false negative rate) in order to see the effect of alpha on the number of manipulated elections that Benford’s Law misses.

Improvement of Controlling False Discovery Rate (FDR)

Saturday, November 24th, 2012

As discussed in previous posts, we had encountered the anomaly that our simulated elections were failing the chi-squared hypothesis test described in “Election Forensics: Vote Counts and Benford’s Law” (Mebane 2006) for the distribution of the second digit in the set of vote totals at the precinct level inside each county. Figure 1 below shows the formula used to compute the second-digit chi-square statistic, where “qB2i” denotes the expected relative frequency with which the second significant digit is “i”; “d2i” denotes the number of times the second digit is “i” among the J precincts (inside a chosen county); and “d2” denotes the total number of second digits 0 through 9 observed.

Figure 1. 2BL Chi Square Statistic
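The figure image is not reproduced here; based on the definitions above (and on Mebane 2006), the statistic presumably takes the standard Pearson form

$$
X^2_{2B} \;=\; \sum_{i=0}^{9} \frac{\bigl(d_{2i} - d_2\, q^{B}_{2i}\bigr)^2}{d_2\, q^{B}_{2i}},
\qquad
q^{B}_{2i} \;=\; \sum_{k=1}^{9} \log_{10}\!\left(1 + \frac{1}{10k + i}\right),
$$

where $d_2 = \sum_i d_{2i}$ and $q^{B}_{2i}$ is Benford’s expected frequency for second digit $i$ (approximately 0.120 for $i = 0$ down to 0.085 for $i = 9$).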

Initially, we converted this chi-square statistic into the corresponding p-value in order to compare it with alpha = 0.05. If the observed p-value was below alpha, we rejected the null hypothesis (H0: the second digits of the observed set follow Benford’s distribution).

Figure 2. Chi-square distribution. Red: 16.919. Orange: 30.6537.

The reason our hypothesis testing was failing so often was that our alpha = 0.05 (chi-square statistic = 16.92) was too high, allowing many statistics to fail the test. One way to fix this problem is to decrease the alpha value, which widens the region in which the null hypothesis is accepted. Mebane (2006) accounts for this by using the False Discovery Rate procedure of Benjamini and Hochberg (1995). We therefore implement the FDR control as follows:

1. For every county:
– For each candidate’s totals at the precinct level inside the county:
– Calculate the chi-square statistic for the second digits.
Note: Step 1 produces three chi-square statistics for every county (one per candidate).

2. Let t = 1, …, T, where T is the total number of chi-square statistics in the entire state.
– e.g., with 442 counties in our Alaska data set and three candidates, T = 3 * 442 = 1,326.

3. Put all the chi-square statistics into a list “St” and convert each to its corresponding p-value.
– Using the chi-square distribution with 9 degrees of freedom and taking the upper tail (p-value = 1 − the chi-square CDF evaluated at the statistic).

4. Sort the p-values in “St” from smallest to greatest to produce “S(t)” (the sorted list).

5. Choosing test level alpha = 0.05, find the number d such that S(d+1) > (d + 1)*alpha/T (while S(j) ≤ j*alpha/T for j ≤ d); d denotes the number of tests rejected by the FDR criterion.

Figure 2 shows the smallest new alpha = 0.00034 (chi-square statistic = 30.65) that needs to be crossed in order to trigger the rejection of H0.
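A minimal sketch of steps 2–5, assuming the per-county, per-candidate chi-square statistics have already been collected into one list for the state (the function and variable names are ours, not from the original code):

from scipy.stats import chi2

def fdr_rejections(chi_square_stats, alpha=0.05, dof=9):
    # Step 3: convert each statistic to an upper-tail p-value (9 degrees of freedom).
    p_values = [1.0 - chi2.cdf(stat, dof) for stat in chi_square_stats]
    # Step 4: sort from smallest to greatest.
    s = sorted(p_values)
    T = len(s)
    # Step 5: d is the length of the prefix of sorted p-values that stays below
    # the stepwise threshold (d + 1) * alpha / T.
    d = 0
    while d < T and s[d] <= (d + 1) * alpha / T:
        d += 1
    return d, s

# d, _ = fdr_rejections(all_state_stats)   # e.g., three statistics per county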

With this procedure implemented, we have been able to bring the false discovery rate close to 0%.

Future reports will cover how we inject fraud into an election and how Benford’s Law is useful in detecting the county, or even the candidate, favored by the manipulation.

Implementation of FDR

Thursday, November 22nd, 2012

The false positive rate of 10% mentioned in the previous blog post has now been reduced to 0.04%. This was made possible with the help of the False Discovery Rate (FDR), a concept used to control Type 1 errors (false positives); more details are given in the paper “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” FDR control modulates the critical value used as the criterion for rejecting the hypothesis (which in our case is deviation from 2BL). By modulating the critical values properly we were able to reduce the number of rejections significantly: whereas in previous versions the critical value was fixed at 0.05, it now changes depending on the number of counties in the state.
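A minimal sketch of such a per-state critical value, assuming it scales inversely with the number of counties; this is our reading of the thresholds in the sample output later in this post, not a statement of the exact rule used:

ALPHA = 0.05

def state_threshold(n_counties):
    # Per-state critical value; smaller for states with more counties.
    return ALPHA / n_counties

# e.g., Hawaii (5 counties) -> 0.01, Delaware (3 counties) -> 0.016666...
print(state_threshold(5), state_threshold(3))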

Along with that, the other gain we made was scaling our system to run the election and verification for multiple states instead of just one. Right now we have 8 states involved in our experiment: Alaska, Delaware, Washington D.C., Hawaii, North Dakota, South Dakota, Vermont and Wyoming. We faced some difficulties in collecting the census data for other states, but we are trying to collect more data samples.

After this step, we are now modulating and extending the work done in Module 2, Phase 1 by improving the quality of the frauds so that they would go unnoticed by the 2BL test.

Some samples of our earlier results:

Pmanu : 0.2062639  Pchelsea : 0.9531511   Parsenal : 0.1787494  threshold was: 0.01  for state : Hawaii
Pmanu : 0.6252274  Pchelsea : 0.1884166   Parsenal : 0.8031289  threshold was: 0.001724138  for state : Alaska
Pmanu : 0.4862842  Pchelsea : 0.8163978   Parsenal : 0.5799487  threshold was: 0.0009433962  for state : North Dakota
Pmanu : 0.9049317  Pchelsea : 0.5891186   Parsenal : 0.1960104  threshold was: 0.0007575758  for state : South Dakota
Pmanu : 0.08608333  Pchelsea : 0.8747973   Parsenal : 0.4532776  threshold was: 0.003571429  for state : Vermont
Pmanu : 0.498102  Pchelsea : 0.04824834   Parsenal : 0.09058552  threshold was: 0.002173913  for state : Wyoming
Pmanu : 0.9393464  Pchelsea : 0.3613848   Parsenal : 0.08792899  threshold was: 0.01666667  for state : Delaware
Pmanu : 0.05444741  Pchelsea : 0.005691999   Parsenal : 0.2015466  threshold was: 0.001282051  for state : Washington, D.C.
Tries Passes MANU-FAILED CHELSEA-FAILED ARSENAL-FAILED
1     1      1           0              0              0

where Pxxxx is the p-value computed for party xxxx in the listed state; if it goes below the threshold, the 2BL flag is triggered. Right now every value is far higher than the threshold, so no false positives are observed.

 

 

Reducing the Number of False Positives

Sunday, November 18th, 2012

To reduce the number of false positives mentioned in our previous blog post, and to obtain more realistic voting behavior, instead of just randomly distributing the precinct votes among the candidates we tried a different approach: simulating the voting based on policies. In this mechanism we selected a set of 8 policies:

1. Should abortion remain a legal option in America?
2. Should law enforcement be allowed to use racial profiling?
3. Should the federal deficit be reduced without raising any taxes?
4. Are the March 2010 federal health care reform laws “Obamacare” good for America?
5. Should state and local law enforcement be empowered to enforce federal immigration laws?
6. Should gay marriage be legal?
7. Should marijuana be a medical option?
8. Should the wealthiest 1% of Americans be taxed more heavily?

Source for policies: http://2012election.procon.org/view.source-summary-chart.php

After this, we assigned each voter a likeliness value for every policy, drawn from a uniform distribution; from these values we calculated the distance between the voter and each candidate, and the candidate at the minimum distance is the one that voter votes for.
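A minimal sketch of this voter model, assuming candidate positions on the 8 policies are also drawn at random and using a plain squared distance (a simplification of the weighted quadratic loss from Adams (1997) described in the “Policy Based Election” post; all names here are illustrative):

import numpy as np

rng = np.random.default_rng(0)
N_POLICIES = 8

def simulate_precinct(n_voters, candidate_positions):
    # Each voter's leaning on every policy is drawn from a uniform distribution.
    voter_positions = rng.uniform(size=(n_voters, N_POLICIES))
    # Squared distance from each voter to each candidate across all policies.
    diffs = voter_positions[:, None, :] - candidate_positions[None, :, :]
    distances = (diffs ** 2).sum(axis=2)
    # Each voter votes for the candidate at minimum distance.
    choices = distances.argmin(axis=1)
    return np.bincount(choices, minlength=len(candidate_positions))

# Example: three candidates ("MANU", "CHELSEA", "ARSENAL") in one precinct.
candidates = rng.uniform(size=(3, N_POLICIES))
print(simulate_precinct(1000, candidates))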

The results of this mechanism were better than those of our previous approach: the number of false positives dropped from 30% to 10%. The downside is that this approach is much slower than the previous one.

So the next steps are:

1) To reduce the false positives below 10%; for this we have started working on an approach called the False Discovery Rate (FDR).

2) To make this process faster and to try to simulate the whole USA; right now we are only simulating one state (Alaska), which is small and has a total of 442 precincts.

Policy Based Election

Friday, November 16th, 2012

We have been studying more closely how to generate vote counts for three candidates in each precinct, since there are commonly at least two dominant candidates in US presidential elections and the remaining candidates can be compacted into one. Therefore, in order to produce a more realistic election, we implement the behavioral model from the paper “Condorcet Efficiency and the Behavioral Model of the Vote” (Adams 1997), which works by calculating the distance (Figure 1) from each voter to every candidate based on “m” policies.

Figure 1. Quadratic Loss Function

In the previous equation, Pi(K) denotes the distance from voter i’s policies to candidate K’s policies; “xij” denotes the degree to which voter i leans toward favoring or opposing policy j, and “bij” is the weight that voter i assigns to policy j.
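The figure image is not reproduced here; based on these definitions, the quadratic loss function presumably has the form

$$
P_i(K) \;=\; \sum_{j=1}^{m} b_{ij}\,\bigl(x_{ij} - c_{Kj}\bigr)^2,
$$

where $c_{Kj}$ (our notation, not from the post) denotes candidate K’s position on policy j, and each voter i votes for the candidate K that minimizes $P_i(K)$.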

Modules 2 and 3, Phase 1 Done

Wednesday, October 31st, 2012

We have finished implementing the following modules:

Module 2, Phase 1: Introducing basic fraud techniques such as ballot stuffing and faulty voting machines.

Module 3, Phase 1: We have implemented the Benford test, which tells us whether the resulting counts aggregated to the county level resemble Benford’s probability distribution of the second digits. If the Benford statistic for a specific candidate produces a p-value lower than 5%, then the counts for that candidate are said to be untrusted.

We have encountered a problem: even though our simulated election in one state is not tampered with, the counts for one candidate do not pass the Benford test. The reason is that the kind of complexity that can produce counts with digits following Benford’s Law arises in processes that are statistical mixtures (e.g., Janvresse and de la Rue (2004)), which means that random portions of the data come from different statistical distributions. So the way we randomly assign votes to each candidate needs to be rethought. There are limits to the extent of the mixing, however: if the number of distinct distributions is large, then the result is likely to be well approximated by some simple random process that does not satisfy Benford’s Law. So if we are to believe that in general Benford’s Law should describe the digits in vote counts, we need a behaviorally realistic process that mixes a small number of distributions.
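As a toy illustration of the mixture idea (the distributions here are our own choice for demonstration, not the project’s simulation code), one can draw precinct totals from a small mixture of wide log-normal distributions and compare the empirical second-digit frequencies with the Benford expectation:

import numpy as np

rng = np.random.default_rng(1)

def second_digit(n):
    # Second significant digit of an integer with at least two digits.
    return int(str(n)[1])

# Benford expected frequencies for the second digit, i = 0..9.
benford_2nd = np.array([sum(np.log10(1 + 1 / (10 * k + i)) for k in range(1, 10))
                        for i in range(10)])

# A mixture of a small number of wide log-normal distributions for precinct totals.
components = [(5.0, 1.2), (6.0, 1.0), (4.5, 1.5)]
totals = np.concatenate([np.rint(rng.lognormal(mu, sigma, 5000)).astype(int)
                         for mu, sigma in components])
totals = totals[totals >= 10]  # need at least two digits

observed = np.bincount([second_digit(t) for t in totals], minlength=10) / len(totals)
print(np.round(benford_2nd, 3))
print(np.round(observed, 3))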

Module-1 Phase-1 Over

Sunday, October 7th, 2012

This phase has been implemented. In it, we collected information about various US states down to the precinct level, such as population, IDs, county relations, sex, and other attributes. For this phase we concentrated only on the population of the precincts and left the other aspects, such as sex, race, and economic background, for later phases. In this round we were able to generate random votes for each candidate in every precinct, collate those votes at the county level, and then aggregate them up to the state level. At the state level, for the sake of simplicity, we followed a “Winner Takes All” policy, allocating all of the state’s electoral votes to the winning candidate and 0 to the other candidates. For this round we played with 3 parties only: “MANU”, “CHELSEA” and “ARSENAL”.
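A minimal sketch of this aggregation and the “Winner Takes All” allocation (the intermediate county roll-up is folded into one step here; the names and numbers are illustrative):

from collections import Counter

def tally_state(precinct_votes, electoral_votes):
    # Sum precinct-level counts to a state total, then give all electoral
    # votes to the winner and 0 to everyone else.
    state_totals = Counter()
    for precinct in precinct_votes:
        state_totals.update(precinct)
    winner = max(state_totals, key=state_totals.get)
    return {c: (electoral_votes if c == winner else 0) for c in state_totals}

precincts = [{"MANU": 120, "CHELSEA": 95, "ARSENAL": 30},
             {"MANU": 80, "CHELSEA": 140, "ARSENAL": 25}]
print(tally_state(precincts, electoral_votes=3))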

Abstract

Friday, October 5th, 2012

After an election winner is declared, the question of whether the outcome truly reflects how people voted arises among skeptics of the democratic process. It is hard to assure that all the elements involved in the election process are infallible and that no fraudulent activity has taken place. Furthermore, if fraudulent activities were present, could their profusion make a significant impact on the selection of an undeserving candidate? In this project we aim to answer these questions by analyzing the impact of different fraudulent activities on the election system, simulating a mock election based on the United States presidential electoral system. Our target is to build a simulated election containing a mixture of fraudulent events to help us analyze fraud detection methods such as Benford’s Law and risk-limiting audits.