A Guide to Geocoding International Locations
In the process of conducting projects under the Geospatial Technologies and Human Rights Project at the American Association for the Advancement of Science, researchers have frequently needed to locate obscure or remote locations that do not appear on conventional mapping sources. Information concerning these locations may come from many sources, including field workers, news media, or reports from organizations that do not incorporate geographic information in their data. To help other researchers with this difficulty, this Guide to Geocoding has been created to demonstrate the data sources and best practices established by AAAS. Because there are many existing guides for geocoding locations in areas such as the United States, this guide will focus on areas where location information can be extremely difficult to determine.
What is Geocoding?
Geocoding is the process of assigning spatial reference information to a location. This reference data can take the form of latitude and longitude coordinates, street addresses, or other spatial units ranging from the local level (e.g., census tracts) up through the district and provincial levels. Once the spatial location of the area of interest has been determined, its geographic coordinates can be used to map it in GIS software or other mapping applications.
Step 1: Create a Spreadsheet
Before geocoding, it is important to structure your data in a spreadsheet and save it as an XLS, CSV, or TXT (text) file to keep the information organized and most amenable to use in mapping applications. Create a new column for each piece of geographic information you have. This may result in separate columns labeled “City,” “Postal_Code,” “Address,” etc. Also be sure to include a column labeled “Latitude” and another called “Longitude.”
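For readers who prefer to build the file with a script, the layout described above can be sketched in Python. This is only an illustration: the "Place_Name" column and the sample row are assumptions added for the example, and the other column names simply follow the labels suggested in this step.

```python
import csv
import io

# Column names follow the labels suggested in the text; "Place_Name" is an
# assumed extra column, and the sample row below is a made-up placeholder.
FIELDS = ["Place_Name", "City", "Postal_Code", "Address", "Latitude", "Longitude"]

def write_template(rows, out):
    """Write rows (dicts) as CSV, leaving any missing fields blank."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)

# Write to an in-memory buffer here; open("locations.csv", "w", newline="")
# could be passed instead to produce a file on disk.
buffer = io.StringIO()
write_template([{"Place_Name": "Example Village", "City": "Example City"}], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

The Latitude and Longitude columns are left empty at this stage and are filled in during Step 2.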
Step 2: Geocoding
A number of resources are available for determining the spatial location of an area of interest. After researching many options, AAAS has found several data sources and methods to be the most useful. Foremost among these are the GEOnet Names Server and fuzzy matching systems. Other useful options include open source data projects such as Wikimapia and OpenStreetMap, as well as batch geocoding services.
A. Using the GEOnet Names Server
Geocoding can be done in several ways. However, when trying to locate small towns in relatively undeveloped countries, you will likely have to make use of the GEOnet Names Server (GNS), which is provided by the US National Geospatial-Intelligence Agency. The server provides access to extensive lists of place names for each country, along with their corresponding latitudes and longitudes. Using these lists, it is possible to match most of your place names to known locations in the GNS database.
A country file can be downloaded and searched using either a spreadsheet or GIS program by following these steps:
- Go to http://earth-info.nga.mil/gns/html
- Click on “Country Files” under the “Research and Reference” menu to the left
- Find your country of interest and download the zipped file
- Unzip the file
After unzipping the file, you can open the text file in Excel. Sorting the spreadsheet by the “FULL_NAME_ND_RO” column lets you scroll alphabetically until you find a potential match between your locations and the list’s entries. When you find a match, scroll sideways to the “LAT” and “LONG” columns and record the corresponding values in the “Latitude” and “Longitude” columns of the spreadsheet you created in Step 1.
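The same lookup can also be scripted instead of scrolling by hand. The sketch below is a minimal illustration that assumes the country file is tab-delimited with a header row containing the “LAT,” “LONG,” and “FULL_NAME_ND_RO” columns named above. The sample rows and coordinates are invented stand-ins, and the function name is hypothetical.

```python
import csv
import io

# Tiny stand-in for a GNS country file. The column names come from the text;
# the rows and coordinates are invented, and the tab delimiter is an assumption.
SAMPLE = (
    "LAT\tLONG\tFULL_NAME_ND_RO\n"
    "12.5\t24.25\tExample Town\n"
    "13.0\t25.0\tSample Village\n"
)

def find_place(handle, name):
    """Return (lat, long) pairs for rows whose name matches exactly, ignoring case."""
    target = name.strip().lower()
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        (float(row["LAT"]), float(row["LONG"]))
        for row in reader
        if row["FULL_NAME_ND_RO"].strip().lower() == target
    ]

# With a real download, an opened country file would replace io.StringIO(SAMPLE).
matches = find_place(io.StringIO(SAMPLE), "example town")
print(matches)
```

An exact match like this only helps when the spelling in your source agrees with the spelling in the list. A name that returns no rows may still be present under a different spelling.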
If you have a GIS program, the GEOnet data can also be displayed visually. The following steps are specific to ESRI ArcMap, but should be similar for most other GIS software:
- After unzipping the GEOnet file, add the TXT file to ArcMap using the ‘Add Data’ button
- Right-click the layer, and choose ‘Display XY Data’
- Make sure ‘LONG’ is chosen for the X Field and ‘LAT’ is chosen for the Y Field
- After hitting OK, the locations will appear as points
- To save the points as a shapefile, right-click the new layer and select Data > Export Data
- When you have successfully imported the locations, right-click the layer and select ‘Open Attribute Table’
- Sort by column or use the “Find” feature to search for a particular location
B. Fuzzy Matchers
Because many languages and dialects are spoken in certain areas, spellings of place names often vary substantially from one source to the next. If you suspect such variation, another geocoding option to consider is a fuzzy matcher, which frequently yields good results even when the exact spelling of a location is uncertain. A fuzzy matcher uses the standard Levenshtein distance algorithm, which counts the single-character insertions, deletions, and substitutions needed to turn one string into another, to find the closest matches between the place name you enter and the place names recorded for the country of interest.
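To make the idea concrete, the following is a minimal sketch of Levenshtein-based matching. It is not the code behind any particular fuzzy matcher. The function names are hypothetical, and the candidate list is a short illustrative sample rather than a real gazetteer extract.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]

def closest_matches(query, candidates, limit=3):
    """Return up to `limit` (distance, name) pairs, smallest distance first."""
    q = query.lower()
    scored = sorted((levenshtein(q, name.lower()), name) for name in candidates)
    return scored[:limit]

# Illustrative candidate names only; a real run would use a full country list.
candidates = ["Al Fashir", "Nyala", "Kassala"]
best = closest_matches("El Fasher", candidates, limit=1)
print(best)
```

A small distance suggests a likely spelling variant, but the result should still be checked against other information about the location, such as its district or province.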
GeoNames is one such fuzzy matcher: it searches a user’s entry against its databases and outputs similarly spelled names with their geographic coordinates. You can then choose which coordinates to use based on how closely the matched results resemble your entry. In addition to GeoNames, AAAS has created fuzzy matchers covering Pakistan, Burma, Darfur and Ethiopia, available here.
C. Automated Geocoding
Batchgeo.com is an online tool that geocodes hundreds of entries at a time. To use it, simply save your spreadsheet as a text file and copy the data onto the site. The tool can geocode up to 900 entries at a time in a matter of minutes, and works best with certain countries listed on the site. While Batchgeo.com will process most of your entries, there will likely be some that cannot be resolved, which will require the use of the manual geocoding methods described above.
D. Wikimapia and OpenStreetMap
As a last resort, openly editable mapping sites like Wikimapia and OpenStreetMap are sometimes helpful in identifying very obscure locations that might not be covered by the GEOnet Names Server. Because anyone with local knowledge can contribute to these open source sites, a match found there may rest on just one person’s knowledge and should be treated with caution.
Step 3: Displaying the Data
A. Displaying Locations in ArcMap
Once the latitude and longitude have been found for each place, ESRI ArcMap can be used to display them on a map.
- Add the spreadsheet (XLS, TXT, or CSV) to ArcMap
- Right-click the layer, and choose ‘Display XY Data’
- Choose ‘Longitude’ for the X Field and ‘Latitude’ for the Y Field
- After hitting ‘OK’, the locations will appear as points
- To save as a shapefile, right-click the new layer and select Data > Export Data
B. Displaying Locations in Google Earth
You can also display the geocoded locations in Google Earth using the Earth Point tool, which converts an XLS, TXT, or CSV file to a Google Earth-compatible KML file. To convert a file to KML:
- Make sure the columns with the geographic coordinates are labeled “Latitude” and “Longitude”
- Go to http://www.earthpoint.us/ExcelToKml.aspx
- Click “Browse” and select the file you wish to convert
- Click “View on Google Earth” to see the locations
- To save, right-click the new layer and choose “Save Place As”
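As an alternative to the online converter, a basic KML file can be written directly from the coordinates. The sketch below does not use Earth Point and is not how that tool works internally. It simply builds a minimal KML document by hand, using a hypothetical function name and invented sample points.

```python
from xml.sax.saxutils import escape

KML_HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
    "<Document>\n"
)
KML_FOOTER = "</Document>\n</kml>\n"

def to_kml(points):
    """Build a minimal KML document from (name, latitude, longitude) tuples."""
    placemarks = []
    for name, lat, lon in points:
        # KML coordinates are written longitude first, then latitude.
        placemarks.append(
            "  <Placemark>\n"
            f"    <name>{escape(name)}</name>\n"
            f"    <Point><coordinates>{lon},{lat}</coordinates></Point>\n"
            "  </Placemark>\n"
        )
    return KML_HEADER + "".join(placemarks) + KML_FOOTER

# Invented sample points; real values would come from the Step 1 spreadsheet.
kml_text = to_kml([("Example Town", 12.5, 24.25), ("A & B Camp", 13.0, 25.0)])
print(kml_text)
```

Saving the returned text to a file with a .kml extension should give a file that Google Earth can open. Note the coordinate order: swapping latitude and longitude is a common cause of points appearing in the wrong place.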
A PDF version of this geocoding guide is available here.