Proposal

 

1 Motivation
Elections have become a central mechanism by which governments and organizations reach consensus decisions. They are used not only for small decisions within an organization (such as whether a subsystem should include a particular feature) but also to select the president of a democracy. The importance of elections has stimulated the evolution of various vote-tampering mechanisms, including voter registration fraud, absentee ballot fraud, voter impersonation, and voting more than once. At the same time, analyses of many election fraud allegations have shown them to be baseless. For example, in the 2000 election in St. Louis, Missouri, several critics alleged vote tampering, but the charges could not be substantiated and were rejected; a similar case occurred during the 2004 presidential election in Wisconsin, where allegations were made but later dropped for lack of evidence. Such allegations often result in vote recounts that waste substantial amounts of money. The Pew Center on the States estimates that recounting costs on average 15 to 30 cents per ballot [1], and Kevin Kennedy, director of Wisconsin's Government Accountability Board, stated that a statewide recount would cost Wisconsin alone more than $1 million because of programming costs for voting machines and reimbursements to municipal clerks for their assistance.

2 Method
The project will consist of three main modules that will be developed in phases of incremental complexity. This development approach will allow us to build simple versions of each module that serve as the base structure for the entire system.

I. Simulation of Voting Machines
In this module we are planning to simulate the actual voting system for the United States presidential elections at the precinct level.

(a) Phase 1: This phase will consist of simulating a simple election, generating a fixed number of votes for each precinct, randomly distributed among candidates A, B, and C. The votes will be aggregated at the state level, and the winner will receive the state's total electoral votes. For simplicity, the states of Maine and Nebraska will also be modeled on a winner-take-all basis. A minimal sketch of this phase is given below.
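
To make the plan concrete, the following is a minimal Python sketch of the Phase 1 simulator. The function names, the fixed number of votes per precinct, and the example state data are placeholder assumptions rather than final design decisions.

    # Minimal sketch of the Phase 1 simulator (all names and numbers are placeholders).
    import random

    CANDIDATES = ["A", "B", "C"]

    def simulate_precinct(total_votes=1000):
        """Randomly split a fixed number of votes among the candidates."""
        counts = {c: 0 for c in CANDIDATES}
        for _ in range(total_votes):
            counts[random.choice(CANDIDATES)] += 1
        return counts

    def simulate_state(num_precincts, electoral_votes):
        """Aggregate precinct counts and name the plurality winner of the state."""
        totals = {c: 0 for c in CANDIDATES}
        for _ in range(num_precincts):
            for candidate, votes in simulate_precinct().items():
                totals[candidate] += votes
        return max(totals, key=totals.get)

    def simulate_election(states):
        """states maps a state name to (number of precincts, electoral votes)."""
        electoral = {c: 0 for c in CANDIDATES}
        for num_precincts, electoral_votes in states.values():
            winner = simulate_state(num_precincts, electoral_votes)
            electoral[winner] += electoral_votes
        return electoral

    # Example with made-up precinct counts:
    # print(simulate_election({"Texas": (9000, 38), "Rhode Island": (400, 4)}))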

(b) Phase 2: In this phase we will refine the election system to account for the special handling of electoral votes in Maine and Nebraska. In addition, we will map the census tract data to the precinct level; the focus in this phase will be on population data only, while other aspects of the census data, such as age, sex, and race, will be considered in the third phase. These data will be used to generate a total number of votes that reflects each precinct's population. Furthermore, we will look into the voter registration datasets for each state and find a way to account for the turnout percentage of each precinct, as sketched below.
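
A possible way to combine census population, registration rates, and turnout when sizing each precinct is sketched below; the argument names and the uniform turnout draw are assumptions that would be replaced once the real census and registration datasets are mapped.

    # Hedged sketch: estimate ballots cast in a precinct from population and registration data.
    import random

    def precinct_vote_total(census_population, registration_rate, turnout_range=(0.45, 0.75)):
        """Ballots cast = population x registration rate x a sampled turnout percentage."""
        registered = census_population * registration_rate
        turnout = random.uniform(*turnout_range)
        return int(registered * turnout)

    # Example: a precinct whose census tracts total 4,200 residents with 68% registration.
    # print(precinct_vote_total(4200, 0.68))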

(c) Phase 3: The purpose of this phase is to improve the random distribution of votes among the parties by adding the effect of vote-controlling factors such as economic reform, immigration policy, and war policy. We plan to do this by segmenting the population into groups with similar political views and assigning a probability of impact that each of these controlling factors has on each group. To produce these impact probabilities, we plan to use machine learning and data mining techniques to find clusters of patterns of electoral support in previous US elections using census demographic data; a simpler approach would be to search for previous work that has done a similar analysis. The final product of this phase will be an election simulation that receives a value for each controlling factor and, according to their impact, distributes the votes of each group in each precinct among the candidates (see the sketch below).
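
The clustering step could look roughly like the sketch below, which applies scikit-learn's KMeans to a toy demographic matrix; the features, the number of groups, and the impact weights are placeholders that we would have to learn from historical election and census data.

    # Illustrative sketch of the Phase 3 clustering and vote-distribution idea.
    import numpy as np
    from sklearn.cluster import KMeans

    # Rows: precincts; columns: toy demographic features (median age, % foreign born, median income).
    demographics = np.array([
        [34.0, 0.12, 48000.0],
        [52.0, 0.03, 61000.0],
        [29.0, 0.27, 39000.0],
        [47.0, 0.05, 72000.0],
    ])

    # Assign each precinct to a group of (assumed) similar political views.
    groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(demographics)

    def distribute_votes(total_votes, group, factor_levels, impact_weights):
        """Shift a uniform three-way split toward candidate A according to how strongly
        this group (per impact_weights[group]) reacts to each controlling factor."""
        shift = sum(level * impact_weights[group][name]
                    for name, level in factor_levels.items())
        share_a = min(max(1.0 / 3.0 + shift, 0.0), 1.0)
        votes_a = int(total_votes * share_a)
        remainder = total_votes - votes_a
        return {"A": votes_a, "B": remainder // 2, "C": remainder - remainder // 2}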

II. Introduction of Fraud in Voting
In this module we plan to measure the impact of the most common fraud techniques by introducing them into the election simulations. By applying Monte Carlo simulation [6], we will test diverse ways of applying each fraud technique and study the degree to which such techniques influence the whole election system.

(a) Phase 1: In this phase we will study the most common fraud techniques and devise how to implement them in our election simulator. At the same time we will implement a simple function that modifies precinct-level results by adding votes to the favored candidate while subtracting the same combined amount from the other candidates, as in the sketch below. We could randomly select a number of precincts that will exhibit this type of fraud.
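
A minimal sketch of such a fraud-injection function follows; the candidate names and the number of shifted votes are purely illustrative.

    # Sketch of the simple Phase 1 fraud: move ballots from the other candidates to the favored one.
    import random

    def inject_vote_shift(precinct_counts, favored, shifted_votes):
        """Move up to `shifted_votes` ballots to `favored`, one at a time, taking them
        from randomly chosen opponents so that no count drops below zero."""
        counts = dict(precinct_counts)
        others = [c for c in counts if c != favored]
        moved = 0
        while moved < shifted_votes:
            if all(counts[c] == 0 for c in others):
                break  # nothing left to take
            donor = random.choice(others)
            if counts[donor] > 0:
                counts[donor] -= 1
                counts[favored] += 1
                moved += 1
        return counts

    # Example: inject_vote_shift({"A": 480, "B": 510, "C": 60}, favored="A", shifted_votes=50)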

(b) Phase 2: This phase will be devoted to the implementation of the Monte Carlo method, which will be used to measure the impact of the simple fraud technique implemented in Phase 1 (see the sketch below). We will then devise a method for finding the level at which such a fraud technique would hand the election to the undeserving candidate.
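
The Monte Carlo loop could be organized roughly as follows; the helper callables and the number of trials are placeholders, with the precinct simulation and the fraud function coming from the earlier sketches.

    # Hedged sketch of the Phase 2 Monte Carlo measurement: how often does the fraud flip the winner?
    def fraud_flip_rate(simulate_precincts, apply_fraud, decide_winner, trials=1000):
        """simulate_precincts() returns a list of precinct count dicts, apply_fraud()
        perturbs one precinct, and decide_winner() aggregates counts into a winner."""
        flips = 0
        for _ in range(trials):
            clean_results = simulate_precincts()
            fraud_results = [apply_fraud(p) for p in clean_results]
            if decide_winner(clean_results) != decide_winner(fraud_results):
                flips += 1
        return flips / trials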

(c) Phase 3: In this phase we will implement at least two other fraud techniques, such as voter registration fraud and voter impersonation, and then analyze them in the same way as described in Phase 2.

III. Detection of Anomalies in Voting
In this module we will analyze two fraud detection methods by applying them to our election simulator under a mixture of fraud techniques. Since we will know exactly what the impact of the fraud is, we will be able to interpret the results of the detection methods.

(a) Phase 1: In this phase we will implement the Benford's Law [3] test based on the second digit, as implemented by Mebane; a sketch is given below.
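
The sketch below is a hedged approximation of that test: it compares the observed second digits of precinct-level vote counts against the Benford expectation with a chi-square goodness-of-fit test, which captures the spirit of Mebane's procedure without claiming to reproduce it exactly.

    # Second-digit Benford's Law (2BL) sketch: chi-square test against the Benford expectation.
    import math
    from scipy.stats import chisquare

    # Probability of each second digit d = 0..9 under Benford's Law.
    BENFORD_SECOND_DIGIT = [
        sum(math.log10(1 + 1 / (10 * d1 + d)) for d1 in range(1, 10))
        for d in range(10)
    ]

    def second_digit_test(vote_counts):
        """Return (chi-square statistic, p-value) for counts with at least two digits."""
        digits = [int(str(count)[1]) for count in vote_counts if count >= 10]
        observed = [digits.count(d) for d in range(10)]
        expected = [p * len(digits) for p in BENFORD_SECOND_DIGIT]
        return chisquare(observed, f_exp=expected)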

(b) Phase 2: In this phase we will apply the simplest risk-limiting audit [5] method for recounting votes to our election simulator, as Stark [4] implemented it; a simplified sketch follows.
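
The sketch below gives a simplified ballot-polling audit in the spirit of the method described in [4]; it is not the exact procedure, and the two-candidate setup, reported vote share, and risk limit are illustrative assumptions.

    # Simplified ballot-polling audit sketch (two candidates, reported winner share > 50%).
    import random

    def ballot_polling_audit(ballots, reported_winner, reported_share, risk_limit=0.05):
        """Sample ballots in random order until the evidence for the reported winner
        exceeds 1/risk_limit, or the sample is exhausted (escalate to a full hand count)."""
        test_stat = 1.0
        for drawn, ballot in enumerate(random.sample(ballots, len(ballots)), start=1):
            if ballot == reported_winner:
                test_stat *= reported_share / 0.5
            else:
                test_stat *= (1 - reported_share) / 0.5
            if test_stat >= 1 / risk_limit:
                return "outcome confirmed", drawn
        return "escalate to full hand count", len(ballots)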

(c) Phase 3: Once we have implemented the more complex phases of modules 1 and 2, we will come back to these two detection methods and produce a more complete analysis of how they perform.

3 Expected Results
We expect to showcase the distribution of vote counts for a realistic model of the US presidential electoral system. We will then show how the election results vary when different types of fraud are introduced in various combinations. On these election results we will apply Benford's Law and risk-limiting audits to evaluate their efficacy, and we will try to modulate the fraud so that it bypasses these tests [7]. As a preliminary result, we have applied the second-digit Benford's Law test to the vote counts of each presidential candidate in the 2012 Mexican general election [2]. We found that the second-digit distribution of the winning candidate had the lowest p-value and was therefore strongly rejected, whereas the second-digit distribution of the vote counts for the runner-up was strongly accepted. According to Mexican coverage of the presidential process, millions of people protested, claiming that the winning candidate's party used fraudulent techniques such as vote buying, ballot stuffing, and precinct-level manipulation of results in order to win the election.

References
[1] Kirsten Adshead. Vote recount may cost taxpayers $1 million-plus. http://wisconsinreporter.com/vote-recount-may-cost-taxpayers-1-million-plus, 2011.
[2] Instituto Federal Electoral. 2011-2012 federal electoral process. http://www.ife.org.mx/portal/site/ifev2/Estadisticas_y_Resultados_Electorales/, 2011.
[3] Theodore P. Hill. A statistical derivation of the significant-digit law. Statistical Science, 10:354–363, 1995.
[4] P. Stark and M. Lindeman. A gentle introduction to risk-limiting audits. IEEE Security & Privacy, PP(99):1, 2012.
[5] Philip Stark. Risk-limiting vote-tabulation audits: The importance of cluster size. CHANCE, 23:9–12, 2010.
[6] Osnat Stramer. Monte Carlo statistical methods. Journal of the American Statistical Association, 96(453):339–355, 2001.
[7] Wendy K. Tam Cho and Brian J. Gaines. Breaking the (Benford) law. The American Statistician, 61(3):218–223, 2007.