Monday, April 10, 2017

Post 4: Geocoding


Goals and Objectives
The goal of this lab is to geocode the locations of sand mines in Wisconsin. The mines were divided among the class so that the results could be compared in terms of potential error. The objectives include normalizing the mine data in Excel, geocoding the mines using the geocoding service from ESRI, using the Public Land Survey System (PLSS) to assist with geocoding, and comparing the results with mines geocoded by other colleagues.

Methods
Each of the 19 assigned mines had to be geocoded. In the data from the WI DNR, an address and a PLSS description were attached to each mine record. However, some mines had only an address and some had only a PLSS description. The first step was to run the geocoding service from ESRI in ArcMap. Each resulting location was then checked to make sure that the point was placed at the actual mine rather than at a general reference location for the city. Points that were not located in front of a mine were rematched manually. Once the address points were relocated, the locations that were unclear or had only a PLSS description were checked against the PLSS shapefile provided by the WI DNR to correct the remaining location errors.

Results
The mine data needed to be normalized for the geocoding service in ArcMap to operate correctly. Table 1 shows a portion of the assigned data before it was normalized. In this table, a few attributes were missing.
Table 1. Part of the data and its attributes before it was normalized.
Table 2 shows how the data table was then normalized by adding a few fields: separate PLSS, Street Address, Street, State, City, and Zip code fields.
Table 2. Part of the data and its attributes after it was normalized.
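The normalization itself was done by hand in Excel, but the idea of splitting one address string into separate fields can be sketched in Python. This is only an illustration under stated assumptions: the raw layout "street, city, state zip", the function name normalize_address, and the sample address are all hypothetical and are not taken from the WI DNR table.

```python
import re

# Assumed (hypothetical) raw layout: "<street address>, <city>, <state> <zip>".
# The real WI DNR records may be laid out differently.
ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<street>[^,]+?)\s*,"        # street address up to the first comma
    r"\s*(?P<city>[^,]+?)\s*,"           # city up to the second comma
    r"\s*(?P<state>[A-Za-z]{2})\s+"      # two-letter state abbreviation
    r"(?P<zip>\d{5})(?:-\d{4})?\s*$"     # 5-digit zip, optional +4 suffix ignored
)


def normalize_address(raw):
    """Split one raw address string into separate fields.

    Returns None when the string does not fit the assumed layout, so the
    record can be reviewed manually (for example, a mine with only a PLSS
    description).
    """
    match = ADDRESS_PATTERN.match(raw or "")
    if match is None:
        return None
    return {
        "Street Address": match.group("street"),
        "City": match.group("city"),
        "State": match.group("state").upper(),
        "Zip": match.group("zip"),
    }


# Made-up example address, not one of the assigned mines.
print(normalize_address("123 Main St, Eau Claire, wi 54701"))
```

Records that return None are the ones that would still need manual attention, which matches the mines in this lab that had no usable address.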
After geocoding and rematching the locations of the mines, the locations were created in a new feature class (Figure 1).
Figure 1. Geocoded and PLSS identified Mine Locations.
The mines were then compared to two different sets of data. The first was another colleague's geocoded mines. One colleague had 14 of the same mines, so 14 of the 19 mines were compared (Figure 2). Using the Generate Near Table tool, a table was created from the two sets of data to compare the distance (in meters) between the geocoded mines. This made it possible to identify error in the two sets of data.
Figure 2. Comparison of the geocoded mines with the mines that were geocoded by another colleague.
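The comparison in this lab was run with the Generate Near Table tool in ArcMap. As a rough illustration of what such a table contains, the sketch below pairs each point in one set with its closest point in another set and reports the distance in meters. It is not the ArcMap implementation: it assumes coordinates in decimal degrees, uses a spherical Earth approximation (haversine), and the names haversine_m and near_table and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def near_table(in_points, near_points):
    """For each input point, find the closest near point.

    Both arguments are dicts of {id: (lat, lon)}; near_points must not be
    empty. Returns a list of (in_id, near_id, distance_m) rows.
    """
    rows = []
    for in_id, (lat, lon) in in_points.items():
        near_id, dist = min(
            ((nid, haversine_m(lat, lon, nlat, nlon))
             for nid, (nlat, nlon) in near_points.items()),
            key=lambda pair: pair[1],
        )
        rows.append((in_id, near_id, dist))
    return rows


# Made-up coordinates, not the actual mine locations.
mine_set_a = {"mine_1": (44.80, -91.50), "mine_2": (44.95, -91.30)}
mine_set_b = {"mine_1": (44.80, -91.51), "mine_2": (44.95, -91.30)}
for row in near_table(mine_set_a, mine_set_b):
    print(row)
```

A row with a large distance means the two sets disagree on where that mine is, which is the kind of disagreement the near table was used to find here.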
The Generate Near Table tool was used again to compare the geocoded mine locations with the actual locations of the selected mines (Figure 3).
Figure 3. Geocoded locations in comparison to the Actual Locations of the mines.


Discussion 
The distance tables were used to detect error in the geocoding process and to classify its type. According to Lo (2003), there are two types of error: inherent and operational. Inherent error is defined as error that occurs as a result of the real-world data (Lo 2003). For example, there could be changes in scale or variations when the data are projected. Operational error is classified as error that happens during data collection and management (Lo 2003). It can also be referred to simply as user or processing error. Looking at Table 3, the majority of the error types are operational. The wrong mine could have been chosen near the reference point, or the PLSS descriptions or addresses could have been misread. The error in the geocoded locations can be recognized in Figure 3 because not every point lines up.
Table 3. Error of geocoded mines in comparison to the actual mines.
The error types that were identified as inherent were most likely influenced by where the mine location is defined. The points that were geocoded were placed at the entrances of the mines, closer to the streets. This differs from the actual locations, which placed the points in the middle of the mines and not necessarily close to the streets or entrances.

Conclusion
At the beginning of the geocoding process, the ESRI service stated that 19 of the 19 locations had been matched. Based on the information in the data tables, a match in ArcMap does not necessarily mean that the point is at the exact location of the mine. The data points need to be checked to make sure they are actually at their intended locations. Even then, operational errors can be made by referencing the wrong PLSS locations or misreading the addresses. Cities often have more than one mine, so a point could have been placed in front of a mine, but not the right one. Looking at the error table, some points are a great distance from the actual locations. This is why it is important to look at the error in the data. Editing could be done at a later time to move the points to the correct locations and reduce the error.

References
CP Lo, AKW Yeung - 2003 - Pearson Prentice Hall
