CIIDH Data: Dictionary

AAAS/CIIDH database data dictionary
Version date: 2000.01.29
Current version: ATV20.1
Patrick Ball & Herbert F. Spirer

The unit of analysis for each record in this structure is VIOLATION.

Each violation was of a particular type, happened at a particular time and place, and was committed by zero, one, or several organizational perpetrators. The violation was committed against zero or one named (individually identified) victim, and zero or more anonymous (unidentified) additional victims. The violation was reported one or more times in one, two, or three source types.

To count the number of times individuals suffered particular violations, sum either c_nmd (to count NAMED individuals only) or c_tot (to count all individuals, named and anonymous). In Stata, this can be done with frequency weights; other statistics programs have similar features. To repeat: the number of records is not the same as the number of violations suffered by individuals, because a single record can cover many victims.
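For users working outside Stata, the weighting idea can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the three sample rows are invented for the example and are not real data, and the helper name count_by_type is hypothetical. Only the column names n_type, c_nmd, and c_tot come from the variable list.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows, invented for illustration (not real data).
# Column names follow the data dictionary.
SAMPLE_CSV = """n_type,c_nmd,c_tot
26,1,1
26,0,12
24,1,3
"""


def count_by_type(csv_text, weight="c_tot"):
    """Sum the chosen weight column (c_tot or c_nmd) within each n_type code."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[int(row["n_type"])] += int(row[weight])
    return dict(totals)


# Number of records: one per violation row.
records = sum(1 for _ in csv.DictReader(io.StringIO(SAMPLE_CSV)))
# Number of individuals, named and anonymous, per violation type.
total_victims = count_by_type(SAMPLE_CSV, "c_tot")
# Number of named individuals per violation type.
named_victims = count_by_type(SAMPLE_CSV, "c_nmd")

print(records, total_victims, named_victims)
```

In this toy input there are three records but sixteen victims in total, which is the distinction the paragraph above draws between counting records and counting individuals.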

The dataset is available in several formats: Stata version 6 (recommended), delimited ASCII (csv), dBase III (dbf), SPSS portable file, and SPSS for Windows. Note that for the Stata and SPSS (Windows and portable) versions of the dataset, the variable labels and value labels are already applied to the data. However, for the ASCII and dbf versions, you will have to handle the labeling on your own. Note that there are 17,423 records in this dataset, which is too large to be imported into most spreadsheets.
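For the ASCII and dbf versions, labeling might be handled with a small lookup table. The sketch below covers only n_type, because its integer codes (23 to 28) and category meanings both appear in Note 1 of this document; labels for the other categorical variables would have to be taken from the value label list, which is not reproduced here. The function name label_n_type is a hypothetical helper, not part of any distributed file.

```python
# Codes and abbreviations as shown in the Stata tabulation in Note 1;
# meanings as given in the Note 1 category table.
N_TYPE_LABELS = {
    23: ("DM", "Disappeared, later found killed"),
    24: ("Ds", "Disappeared"),
    25: ("Hr", "Injured (in Army attack)"),
    26: ("Mu", "Killed"),
    27: ("Se", "Kidnapped"),
    28: ("To", "Tortured"),
}


def label_n_type(code):
    """Return 'code abbreviation' for a known n_type code, else the raw value as text."""
    try:
        abbreviation, _meaning = N_TYPE_LABELS[int(code)]
    except (KeyError, ValueError):
        # Unknown or non-numeric codes are passed through unchanged.
        return str(code)
    return f"{int(code)} {abbreviation}"


print(label_n_type("26"))
print(N_TYPE_LABELS[23][1])
```

Unknown codes are returned unchanged rather than raising an error, so unexpected values stay visible in later tabulations.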

The categorical variables are coded as integers. Although this is convenient for statistical packages, it can be difficult for human beings to interpret data coded in this way. The value labels for the integer codes are here. The value label list includes the number of times each category appears in the data. Note: these are frequencies of records, not of violations. To count violations, you must use the weights in c_tot and c_nmd.

Variable list

Victim variables
Variable name  Variable type  Value labels  Variable label
v_num          str9                         Victim ID
v_sur1         str13                        Victim First surname
v_sur2         str15                        Victim Second surname
v_nam1         str13                        Victim First names
v_age          byte                         Victim Age
v_dob          int                          Victim date of birth
v_p94          long                         Population of v_must (1994 census)
v_occ          byte           Yes           Victim Occupation
v_ind          byte           Yes           Victim Ethnic category
v_sex          byte           Yes           Victim Sex
v_eth          byte           Yes           Victim Maternal language (proxy for eth.)
v_must         int            Yes           Victim Municipio of birth
Violation variables
Variable name  Variable type  Value labels  Variable label
n_grp          int                          Number in group (killings and disappearances)
n_ovkl         byte                         Whether the killing was “overkill” (see Note 2 below)
n_mon          byte                         Month of violation
n_year         int                          Year of violation
n_dtcd         byte           Yes           Date precision (violation)
n_rgim         byte           Yes           Regime code (for date of violation)
n_p94          long                         Population of m_mucd (1994 census)
n_type         byte           Yes           Type of violation (see Note 1 below)
n_ur           byte           Yes           Violation location: Rural or urban
n_must         int            Yes           Municipio of the violation
n_dpst         int            Yes           Departamento of the violation
Perpetrator variables
Variable name  Variable type  Value labels  Variable label
p_civ          byte                         1=participation of civilians
p_arm          byte                         1=participation of army
p_pac          byte                         1=participation of PACs
p_pol          byte                         1=participation of police
p_par          byte                         1=participation of paramilitary groups
p_urn          byte                         1=participation of URNG
Reporting variables
Variable name  Variable type  Value labels  Variable label
r_per          byte                         Number of times this violation was reported in the press
r_doc          byte                         Number of times this violation was reported in documentary sources
r_ent          byte                         Number of times this violation was reported in interviews with witnesses
r_date         int                          If r_per>0, r_date is the date of the first press report of the violation (in the ASCII version, this is formatted as mm/dd/yyyy)
Case (multiplier) variables
Variable name  Variable type  Value labels  Variable label
c_nmd          byte                         1=this violation includes a named victim
c_tot          int                          The total number of victims (named and anonymous) who suffered this violation
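The r_date entry in the reporting variables is only meaningful when r_per is greater than zero, and in the ASCII version it is stored as mm/dd/yyyy text. A minimal Python sketch of reading it under those two conditions is shown here. The function name parse_r_date and the example values are hypothetical, and the assumption that an absent date appears as an empty field in the csv file has not been checked against the actual file.

```python
from datetime import date, datetime


def parse_r_date(r_per, r_date):
    """Return the first press-report date, or None when no press report exists.

    r_per and r_date are the raw text fields read from the ASCII (csv) file.
    Assumption: a missing value appears as an empty string.
    """
    if int(r_per or 0) <= 0 or not r_date.strip():
        return None
    # The data dictionary gives the ASCII format as mm/dd/yyyy.
    return datetime.strptime(r_date.strip(), "%m/%d/%Y").date()


# Hypothetical field values, for illustration only.
example = parse_r_date("2", "03/15/1982")
no_press = parse_r_date("0", "")
print(example, no_press)
```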

Note 1: the violation type codes are the following:

Category  Meaning                          Record count
DM        Disappeared, later found killed           218
Ds        Disappeared                              1546
Hr        Injured (in Army attack)                  411
Mu        Killed                                  11862
Se        Kidnapped                                2903
To        Tortured                                  483
Total                                             17423

The important part of Note 1 is how the categories combine. To count disappeared people, sum c_nmd or c_tot over Ds + DM; to count killed people, sum over DM + Mu; to count killed and disappeared people together, sum over Ds + DM + Mu. DM is a compound category covering people who were disappeared and whose bodies later appeared. In Stata, you could create a new variable identifying people who were killed or disappeared with the following commands. Note the difference between the record counts in the table above and the frequency counts using c_tot in the examples below.

/* this creates a variable with the value and the label in one field */ 
. vallab n_type, g(sn_type)

/* now show the tabulation, counting anonymous victims */ 
. ta sn_type [fw=c_tot] 

    Type of |
  violation |      Freq.     Percent        Cum.
      23 DM |        272        0.63        0.63
      24 Ds |       2759        6.41        7.04
      25 Hr |       1085        2.52        9.56
      26 Mu |      34210       79.43       88.99
      27 Se |       3466        8.05       97.03
      28 To |       1278        2.97      100.00
      Total |      43070      100.00

We are interested in violations with n_type = 23, 24, or 26.
The new variable is created below.

. ge killdis=1 if n_type==23 | n_type==24 | n_type==26
(3797 missing values generated)

. replace killdis=0 if killdis==.
(3797 real changes made)

. ta killdis [fw=c_tot]

    killdis |      Freq.     Percent        Cum.
          0 |       5829       13.53       13.53
          1 |      37241       86.47      100.00
      Total |      43070      100.00
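The arithmetic behind these tabulations can be checked from the published totals alone. The Python sketch below copies the c_tot-weighted frequencies from the Stata output above and the record counts from the Note 1 table, then applies the category combinations described in Note 1. The dictionary and function names are hypothetical; no record-level data is used.

```python
# Victim totals by n_type code, copied from the c_tot-weighted tabulation above.
VICTIMS_BY_TYPE = {23: 272, 24: 2759, 25: 1085, 26: 34210, 27: 3466, 28: 1278}
# Record counts by n_type code, copied from the Note 1 category table.
RECORDS_BY_TYPE = {23: 218, 24: 1546, 25: 411, 26: 11862, 27: 2903, 28: 483}

DISAPPEARED = {23, 24}             # DM + Ds
KILLED = {23, 26}                  # DM + Mu
KILLED_OR_DISAPPEARED = {23, 24, 26}


def total_for(counts, codes):
    """Sum the counts for the given set of n_type codes."""
    return sum(n for code, n in counts.items() if code in codes)


disappeared = total_for(VICTIMS_BY_TYPE, DISAPPEARED)
killed = total_for(VICTIMS_BY_TYPE, KILLED)
killdis_victims = total_for(VICTIMS_BY_TYPE, KILLED_OR_DISAPPEARED)
other_victims = sum(VICTIMS_BY_TYPE.values()) - killdis_victims
killdis_records = total_for(RECORDS_BY_TYPE, KILLED_OR_DISAPPEARED)
other_records = sum(RECORDS_BY_TYPE.values()) - killdis_records

print(disappeared, killed, killdis_victims, other_victims)
print(killdis_records, other_records)
```

The victim totals match the killdis tabulation (37241 and 5829), and the number of records outside the three categories (3797) matches the count of missing values that Stata reports when killdis is first generated.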

Note 2: “overkill” refers to killings carried out by methods beyond what was necessary to cause death, including torturing to death or burning, as well as cases in which bodies were mutilated after death.

Notes on the original data

The original data from which this dataset was generated include 19 tables linked in a relational database collected and systematized by the International Center for Human Rights Research in Guatemala. That full dataset, including narrative summaries, occupies approximately 50 megabytes.

There are many variables that were not included in this output, from antemortem information about victims of forced disappearance (color of pants when last seen, dental or bone conditions), to specific types of torture, to data about the perpetrators (vehicle type, weapon caliber).

It would be very complicated to put most of the excluded variables in the dataset. For example, since each violation may have been committed by several perpetrators, several weapons may have been used. If we attempted to put the weapons data into the flat structure we are using for this published data, we would need dozens of fields to represent each perpetrator’s possible weapon.

Most of the variables not included in this dataset are sparse. For example, there is data on the type of weapons used in particular violations for approximately one-third of the violations originally coded. Other variables have non-missing data for only a few dozen records. If researchers have particular questions about variables they would like to have included in future versions of this dataset, we are willing to discuss their needs. If there are sufficient requests for new variables, we may issue a new version of this dataset. A review of the dataset’s full variables is here.

Error checking

We have devoted hundreds of hours to checking the dataset to control for multiple reports of the same incidents. Many of the victims in this dataset have the same names and may appear to be the same person. We have reviewed every pair of victims with the same or similar names against the narrative information that was stored with the original data. The narrative information includes portions of the original testimony, quotations of original newspaper or documentary accounts, and the coders’ commentary on what they found in the source materials; this narrative information cannot be published because it includes too much data on the original witnesses to be securely released. Whenever victims appeared to be the same person, based on an overall analysis of the names, places and dates of birth, types, times and places of the violations, and qualitative data in the narrative, we combined the records. Note that we did not delete the original records; instead we created meta-records that linked all the data pertaining to the same person. This way we are able to report the r_* series variables, analyzing how frequently some violations are reported relative to other violations.