What to Do with the Results

When Easy Georeferencer has completed its work, it produces a shapefile* that can be opened immediately in any GIS software or mapping website. If you want to check the quality of the georeferenced data, however, several fields in the output can help. Below is an overview of some common ways to use Easy Georeferencer’s output, starting with the easiest and ending with the most accurate. These strategies are:

  1. Hit-and-Run
  2. Only-the-Best
  3. Rescue-Mission
  4. All-in

1: Hit-and-Run
Simply use the shapefile in your map and consider yourself finished! The shapefile is a spatial representation of the cases in your dataset that could be georeferenced, and is marked with “_GIS” in the filename. Accepting this shapefile without scrutiny is the most naïve approach, but it may be appropriate when time efficiency and overall patterns matter most and when potentially large errors are acceptable (for instance, when georeferencing very large cross-national datasets).

2: Only-the-Best
Use the shapefile, but take a more quality-oriented approach and exclude cases whose accuracy is doubtful. Doubtful cases can be identified in several ways.

  • One way is to sort on the “GEOINCONS” field, which measures the inconsistency that arises when multiple locations share the geographic name that was matched. In city-mode the field holds the maximum distance in kilometers between the identically named locations (calculated using the Haversine distance equation)**, so higher values mean a greater risk of geographic misplacement. Small distances should not be a concern, because gazetteers often hold duplicate records for the same place with slightly different coordinates. It is recommended here that values higher than 100 km (roughly a 1½-hour drive) be treated as a significant risk that the identically named locations are in fact different places. In province-mode the value is the number of locations with identical names at any administrative level; for instance, the name “New York” could refer to the level-2 county as well as the level-1 state.
  • Another way is to sort on the “GEOMRATIO” field, which measures the percentage similarity between the original name and the name of the matched location. Although the match similarity threshold was set in the input settings screen, you can compare the similarity values with your own qualitative assessment of how well the names match by quickly eyeballing the low-value cases. If the original threshold seems too lenient, decide on a new threshold at which the name matches start to become acceptable and exclude the cases whose values fall below it.
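
The two checks above can be sketched in a few lines of Python. This is only an illustration, not Easy Georeferencer's own code: the function names, the mean Earth radius of 6371 km, the 0-100 scale for “GEOMRATIO”, the 90 percent cutoff and the representation of each case as a dictionary are all assumptions made for the example. Only the 100 km limit and the field names come from the text.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def max_pairwise_km(points):
    """Largest distance between any two (lat, lon) points; 0.0 if fewer than two."""
    return max(
        (haversine_km(*p, *q) for p, q in combinations(points, 2)),
        default=0.0,
    )


def keep_only_the_best(records, max_incons_km=100.0, min_ratio=90.0):
    """Keep cases within the distance limit and at or above the similarity cutoff.

    Each record is a hypothetical dict with GEOINCONS (km, city-mode) and
    GEOMRATIO (assumed to be on a 0-100 scale).
    """
    return [
        r for r in records
        if r["GEOINCONS"] <= max_incons_km and r["GEOMRATIO"] >= min_ratio
    ]
```

For example, two places called Springfield at roughly (39.80, -89.64) and (42.10, -72.59) lie far more than 100 km apart, so a case matched to that name would be dropped under the Only-the-Best strategy. Two gazetteer entries for the same town that differ by a fraction of a degree would not.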

3: Rescue-Mission
Pay attention to doubtful cases as above, but instead of excluding them outright, georeference them manually. The “GEOALTERN” field contains the names of the alternative locations (and, in city-mode, the coordinates of each). These alternatives indicate which places need to be compared and thus where to start your manual georeferencing.

4: All-in
If you require total or near-total completeness, you can, in addition to the above steps, manually georeference all the cases that could not be georeferenced automatically; these are conveniently collected in the “_remainders” file. The manually georeferenced cases can then be merged with the shapefile from the automatic process to produce a complete georeferenced version of the original dataset. HINT: The “_remainders” file can also be put through another round of Easy Georeferencer, this time with a lower “match similarity” threshold to return more matches.
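
The merge step can be sketched as follows. This is a minimal illustration under stated assumptions, not part of Easy Georeferencer: it assumes each case has already been read into a dictionary with a unique identifier, and the “ID”, “LAT” and “LON” field names and the `merge_cases` function are hypothetical. Reading and writing the actual shapefile is left out.

```python
def merge_cases(automatic, manual, key="ID"):
    """Combine automatic and manual results into one list, sorted by key.

    When the same key appears in both inputs the manual entry wins, so
    corrections made during a Rescue-Mission replace the automatic match.
    """
    merged = {case[key]: case for case in automatic}
    for case in manual:
        merged[case[key]] = case  # manual georeferencing overrides automatic
    return [merged[k] for k in sorted(merged)]


# Hypothetical cases: 1 and 2 were matched automatically, 2 was corrected
# by hand, and 3 came from the "_remainders" file.
automatic = [
    {"ID": 1, "LAT": 59.91, "LON": 10.75},
    {"ID": 2, "LAT": 39.80, "LON": -89.64},
]
manual = [
    {"ID": 2, "LAT": 42.10, "LON": -72.59},
    {"ID": 3, "LAT": 60.39, "LON": 5.32},
]
complete = merge_cases(automatic, manual)
```

Letting the manual entry win is a design choice for this sketch; if you would rather keep both versions for comparison, store the manual coordinates in separate fields instead.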

*The output shapefile is created using the PyShp Python module by Joel Lawhead. Code available from http://code.google.com/p/pyshp/, and hints for how to use it at http://geospatialpython.com/

**The Python code for the Haversine algorithm was adapted from Chris Veness’ online JavaScript guide for calculating distances: http://www.movable-type.co.uk/scripts/latlong.html

