Convened by the American Association for the
Advancement of Science
My recent research that is most relevant for this conference goes under the
heading “Election Forensics.” I am writing a book for which that is the
working title. The book brings together work I have done since the 2000
American presidential election to develop statistical tools for diagnosing
election anomalies and possibly detecting election fraud. The tools I’ve
developed include methods for robust estimation and outlier detection
and methods that use the second-digit Benford’s Law. Probably the most
efficient way to communicate the scope of this work is to include the
abstracts from a few of the papers I’ve produced or am working on.
The Butterfly Did It: The
Aberrant Vote for Buchanan in Palm Beach County, Florida (with Jonathan
N. Wand, Kenneth Shotts, Jasjeet S. Sekhon, Michael Herron and Henry E.
Brady). 2001. American Political Science Review 95 (December): 793–810.
We show
that the butterfly ballot used in Palm Beach County (PBC), Florida, in
the 2000 presidential election caused more than 2,000 Democratic voters
to vote by mistake for Reform candidate Pat Buchanan, a number larger
than George W. Bush’s certified margin of victory in Florida. We use multiple
methods and several kinds of data to rule out alternative explanations
for the votes Buchanan received in PBC. Among 3,053 U.S. counties where
Buchanan was on the ballot, PBC has the most anomalous excess of votes
for Buchanan. In PBC Buchanan’s proportion of the vote on election-day
ballots is four times larger than his proportion on absentee (non-butterfly)
ballots, but Buchanan’s proportion does not differ significantly between
election-day and absentee ballots in any other Florida county. Unlike
other Reform candidates in PBC, Buchanan tended to receive election-day
votes in Democratic precincts and from individuals who voted for the Democratic
U.S. Senate candidate. Robust estimation of overdispersed binomial regression
models underpins much of the analysis.
Robust Estimation and Outlier
Detection for Overdispersed Multinomial Models of Count Data (with Jasjeet
Sekhon). 2004. American Journal of Political Science 48 (April): 392–411.
We develop
a robust estimator—the hyperbolic tangent (tanh) estimator—for overdispersed
multinomial regression models of count data. The tanh estimator provides
accurate estimates and reliable inferences even when the specified model
is not good for as much as half of the data. Seriously ill-fitted counts—outliers—are
identified as part of the estimation. A Monte Carlo sampling experiment
shows that the tanh estimator produces good results at practical sample
sizes even when ten percent of the data are generated by a significantly
different process. The experiment shows that, with contaminated data,
estimation fails using four other estimators: the nonrobust maximum likelihood
estimator, the additive logistic model and two SUR models. Using the tanh
estimator to analyze data from Florida for the 2000 presidential election
matches well-known features of the election that the other four estimators
fail to capture. In an analysis of data from the 1993 Polish parliamentary
election, the tanh estimator gives sharper inferences than does a previously
proposed heteroscedastic SUR model.
The Wrong Man is President!
Overvotes in the 2000 Presidential Election in Florida. 2004. Perspectives
on Politics 2 (September): 525–535.
Using ballot-level
data from the NORC Florida ballots project and ballot-image files, I argue
that overvoted ballots in the 2000 presidential election in Florida included
more than 50,000 votes that were intended to go to either Bush or Gore
but instead were discarded. The primary reason for this was defective
election administration in the state, especially the failure to use systems
that warn the voter when there are too many marks on the ballot and allow
the voter to make corrections. If the best type of vote tabulation system
used in the state in 2000—precinct-tabulated optical scan ballots—had
been used everywhere in Florida, Gore would have won by more than 30,000
votes. The experience in Florida points to the need to gather ballot-level
data to evaluate the success of election reform efforts now underway in
much of the United States.
Inferences from the DNC Provisional
Ballot Voter Survey. 2005. Section V of Democracy at Risk: The 2004 Election
in Ohio. Democratic National Committee, Voting Rights Institute.
A survey
conducted in Cuyahoga County, Ohio, shows that the single most important
cause of voters casting a provisional ballot in the county in the November
2004 election was residential mobility. About 60 percent of the provisional
ballots were cast by those who either were voting in Ohio for the first
time or who had previously voted in Ohio but had since moved. Among those
who had previously voted in Ohio and not moved since doing so, voters
younger than 55 years of age were much more likely to cast a provisional
ballot than older voters were. Among those who had previously voted in
Ohio but since moved, African American voters were more likely than white
voters were to cast a provisional ballot.
Ohio 2004 Election: Turnout,
Residual Votes and Votes in Precincts and Wards (with Michael Herron).
2005. Section VI of Democracy at Risk: The 2004 Election in Ohio. Democratic
National Committee, Voting Rights Institute.
During the
first five months of 2005, the DNC Ohio 2004 Investigative Project collected
extensive data from precincts throughout Ohio. Problems with election
administration seriously affected the 2004 election. Not providing a sufficient
number of voting machines in each precinct was associated with roughly
a two to three percent reduction in voter turnout presumably due to delays
that deterred many people from voting. Strong similarities at the precinct
level between the vote for Kerry (instead of Bush) in 2004 and the vote
for the Democratic candidate for governor in 2002 (Hagan) present strong
evidence against the claim that widespread fraud systematically misallocated
votes from Kerry to Bush. In most counties we also observe the pattern
we expect in the relationship between Kerry’s support and other precinct-level
factors: Kerry’s support across precincts increases with the support for
the Democratic candidate for Senator in 2004 (Fingerhut), decreases with
the support for Issue 1 and increases with the proportion African American.
Only in Cuyahoga County is the relationship between Kerry’s vote and the
support for Issue 1 significantly unusual. Over all precincts and wards
in the analysis, the proportion voting for Kerry decreases as turnout
in 2004 increases, even when turnout in the 2002 election is taken into
account. This suggests that voter mobilization efforts focused on turnout
on balance hurt Kerry, at least if one takes 2002 as the baseline. The
presidential residual vote rate (here defined as the fraction of ballots
without a vote for either Bush, Kerry, Badnarik or Peroutka) is inversely
related to the number of voting machines per registered voter in both
DRE precincts and precincts using precinct-tabulated optical scan machines:
more machines meant a lower residual vote rate. The mechanism that most
likely produces this effect is easy to understand: with fewer machines
per voter, polling places become more crowded and voters are less likely
to take the time to check or correct their ballots.
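The residual vote rate defined in this abstract can be written out as a short calculation. The following is only a minimal sketch: the function name and the precinct numbers are hypothetical and are not taken from the Ohio data.

```python
def residual_vote_rate(ballots_cast, candidate_votes):
    """Fraction of ballots with no counted vote for any listed candidate.

    ballots_cast: total ballots cast in the precinct.
    candidate_votes: dict mapping each listed presidential candidate
    to the votes counted for that candidate.
    """
    if ballots_cast <= 0:
        raise ValueError("ballots_cast must be positive")
    counted = sum(candidate_votes.values())
    if counted > ballots_cast:
        raise ValueError("counted votes exceed ballots cast")
    return (ballots_cast - counted) / ballots_cast


# Hypothetical precinct, for illustration only (not real data).
example_votes = {"Bush": 480, "Kerry": 500, "Badnarik": 3, "Peroutka": 2}
example_rate = residual_vote_rate(1000, example_votes)
print(f"{example_rate:.1%}")
```

In this made-up precinct, 15 of 1,000 ballots carry no counted vote for a listed candidate, so the rate is 1.5 percent.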
Voting Machine Allocation
in Franklin County, Ohio, 2004: Response to U.S. Department of Justice
Letter of June 29, 2005. 2005. Working paper.
The allocation
of voting machines in Franklin County was clearly biased against voters
in precincts with high proportions of African Americans when measured
using the standard of the November, 2004, electorate. In precincts with
high proportions of African American voters there were 13.6 percent more
active voters per voting machine than in precincts having low proportions
of African American voters. While shortages of voting machines caused
long delays in voting throughout the county, the allocation of voting
machines among the county’s precincts affected different voters differently.
The most severe effects in terms of reduced voter turnout were incident
on voters in precincts that had high proportions of African Americans.
The most conservative estimate—based on the reported size of the active
electorate in November—is that typically the shortages of machines reduced
voter turnout by slightly more than four percent in precincts in which
high proportions of the voters were African American, while shortages
in precincts where very few voters were African American reduced voter
turnout by slightly less than 1.5 percent.
If the allocation
of voting machines is compared to information about the size of the active
electorate that was available to Franklin County election officials at
the end of April, 2004, then the allocation of machines is not biased
against voters who were active at that time in precincts having high proportions
of African Americans. But if election officials did use that information
to make their allocation plans, then they made plans that involved using
a total number of machines that was nearly 45 percent too small. Even
using the April measure of the size of the active electorate, 5,023 working
voting machines were needed, not 2,800 machines as data supplied by the
county indicate were actually deployed on election day.
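The "nearly 45 percent too small" figure follows directly from the two machine counts quoted above. A minimal check of that arithmetic, with variable names chosen here for illustration:

```python
# Figures quoted in the abstract above.
machines_needed = 5023    # working machines needed under the April measure
machines_deployed = 2800  # machines reported as deployed on election day

# Shortfall as a fraction of the number of machines needed.
shortfall = 1 - machines_deployed / machines_needed
print(f"{shortfall:.1%}")  # 44.3%
```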
Election Forensics: Vote
Counts and Benford’s Law. 2006. Presented at the 2006 Summer Meeting of
the Political Methodology Society, UC-Davis, July 20–22.
How can
we be sure that the declared election winner actually got the most votes?
Was the election stolen? This paper considers a statistical method based
on the pattern of digits in vote counts (the second-digit Benford’s Law,
or 2BL) that may be useful for detecting fraud or other anomalies. The
method seems to be useful for vote counts at the precinct level but not
for counts at the level of individual voting machines, at least not when
the way voters are assigned to machines induces a pattern I call “roughly
equal division with leftovers” (REDWL). I demonstrate two mechanisms that
can cause precinct vote counts in general to satisfy 2BL. I use simulations
to illustrate that the 2BL test can be very sensitive when vote counts
are subjected to various kinds of manipulation. I use data from the 2004
election in Florida and the 2006 election in Mexico to illustrate use
of the 2BL tests.
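To make the idea of a second-digit test concrete, the sketch below computes the second-digit frequencies implied by Benford's Law and a Pearson chi-square statistic comparing them with the second digits of a list of vote counts. It is an illustrative sketch, not necessarily the exact procedure used in the paper: the function names and the example counts are hypothetical, and counts below 10 are simply dropped because they have no second digit.

```python
import math
from collections import Counter


def second_digit_benford_probs():
    """Expected frequency of each second digit 0-9 under Benford's Law."""
    return [
        sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    ]


def second_digit(count):
    """Second significant digit of a positive integer, or None if below 10."""
    return int(str(count)[1]) if count >= 10 else None


def chi_square_2bl(counts):
    """Pearson chi-square statistic of observed second digits against 2BL."""
    digits = [second_digit(c) for c in counts if c >= 10]
    n = len(digits)
    if n == 0:
        raise ValueError("no counts with at least two digits")
    observed = Counter(digits)
    probs = second_digit_benford_probs()
    return sum(
        (observed.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in enumerate(probs)
    )


# Hypothetical precinct vote counts, for illustration only.
example_counts = [112, 245, 308, 97, 431, 156, 220, 389, 174, 263]
statistic = chi_square_2bl(example_counts)
print(round(statistic, 2))
```

With ten digit categories the statistic has nine degrees of freedom, so under the usual chi-square approximation a value above about 16.9 would be significant at the .05 level. A list as short as the example here is far too small for that approximation to be reliable.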
Election Forensics: The Second-digit
Benford’s Law Test and Recent American Presidential Elections. 2006. Presented
at the Election Fraud Conference, Salt Lake City, Utah, September 29–30.
While the
technology to conduct elections continues to be imperfect, it is useful
to investigate methods for detecting problems that may occur. A method
that seems to have many good properties is to test whether the second
digits of reported vote counts occur with the frequencies specified by
Benford’s Law. I illustrate use of this test by applying it to precinct-level
votes reported in recent American presidential elections. The test is
significant for votes reported from some notorious places. But the test
is not sensitive to distortions we know significantly affected many votes.
In particular, the test does not indicate problems for Florida in 2000.
Regarding Ohio in 2004, the test does not overturn previous judgments
that manipulation of reported vote totals did not determine the election
outcome, but it does suggest there were significant problems in the state.
The test is worth taking seriously as a statistical test for election
fraud.
Election Forensics: Statistics,
Recounts and Fraud. 2007. To be presented at the Annual Meeting of the
Midwest Political Science Association.
Often recounts
are cited as an essential tool for detecting election fraud. Several states
have laws that mandate random recounts of a fraction of the ballots cast.
But recounts can detect only some kinds of fraud, and some plans for doing
recounts are not very efficient. Supplementing a recount plan with auxiliary
statistical analysis, or using statistical analysis to guide the recount
plan, can do better. Statistical methods for outlier detection and tests based
on the second-digit Benford’s Law can even detect fraud that a recount will overlook.
I review the relevant statistical results and look at data from several
cases from American, Mexican and other elections.
Election Forensics: Statistical
Interventions in Election Controversies. 2007. To be presented at the
Annual Meeting of the American Political Science Association.
Controversies
about elections are not exactly commonplace, but neither are they rare.
In recent years a suite of statistical tools has been developed for diagnosing
election anomalies and possibly detecting election fraud. The tools include
methods for outlier detection and methods that use the second-digit Benford’s
Law. These methods interact in complex ways with post-election recounts
and audits. I review how such methods have been used (or almost used)
in recent election controversies in various places including the United
States, Mexico and Bangladesh.