Sunday, February 24, 2019

Data Classification

In this week's lab, we learned about four common data classification methods: Equal Interval, Quantile, Standard Deviation, and Natural Breaks. We compiled two maps using the Miami-Dade County 2010 Census tract data to display each classification method. The first map showed the percentage of the population made up of senior citizens aged 65 and above. The second map showed the senior citizen population normalized by area.

Equal Interval is a classification method where the data is represented by classes that each span an equal range of values. The range of the data is divided by the number of classes you want to have. This method is the easiest for the reader to interpret, and it is also the easiest to prepare. However, the data values can be unevenly distributed among the classes, which can leave entire classes empty (so their fill color never appears on the map) or let one class dominate the map.

Quantile is a classification method where the data is sorted into a set number of classes, with each class containing the same number of values. The total number of observations is divided by the number of classes. While you will never have empty classes, you may have to manually adjust your break values to compensate for tied values. Similar features can be placed in adjacent classes, or features with grossly different values can be placed in the same class. This distortion can be decreased by adding classes.

Standard Deviation is a classification method where the class breaks are set by adding or subtracting the standard deviation from the mean of the data. The data needs to be normally distributed to give your classes clear dividing points. The target audience should be considered, as this statistical representation might not be easily understood.

Natural Breaks is a classification method where the natural groups in the dataset are considered. This minimizes the differences between data values in the same class. It does consider outliers and places them in their own classes, but clusters are placed in one or two classes. It can be difficult to compare two or more maps that use the Natural Breaks classification because each map's class ranges are specific to its data.
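To make the four methods concrete, here is a minimal Python sketch that computes class breaks for a small, made-up list of values (not the census data). The function names, the sample values, and the standard deviation multiples are my own assumptions for illustration. The natural breaks function is a simple dynamic-programming search that minimizes within-class squared deviations, which is the idea behind natural breaks but not necessarily the exact algorithm the GIS software runs.

```python
from statistics import mean, pstdev


def equal_interval_breaks(values, k):
    """Upper bounds of k classes of equal width across the data range."""
    lo, hi = min(values), max(values)
    return [lo + (hi - lo) * i / k for i in range(1, k + 1)]


def quantile_breaks(values, k):
    """Upper bounds of k classes holding as close to equal counts as possible."""
    ordered = sorted(values)
    n = len(ordered)
    # Class i ends at the ceil(i * n / k)-th smallest value (integer ceiling).
    return [ordered[(i * n + k - 1) // k - 1] for i in range(1, k + 1)]


def std_dev_breaks(values, multiples=(-1.5, -0.5, 0.5, 1.5)):
    """Breaks at the mean plus the given (assumed) multiples of the std dev."""
    mu, sd = mean(values), pstdev(values)
    return [mu + m * sd for m in multiples]


def natural_breaks(values, k):
    """Upper bounds of k classes minimizing within-class squared deviations."""
    ordered = sorted(values)
    n = len(ordered)
    # Prefix sums give the squared deviation of ordered[i:j] in constant time.
    s, s2 = [0.0], [0.0]
    for v in ordered:
        s.append(s[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best total ssd when the first j values form c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j], split[c][j] = candidate, i
    # Walk the stored split points backwards to recover each class's top value.
    breaks, j = [], n
    for c in range(k, 0, -1):
        breaks.append(ordered[j - 1])
        j = split[c][j]
    return breaks[::-1]


def class_counts(values, breaks):
    """Count how many values fall at or below each successive upper bound."""
    counts = [0] * len(breaks)
    for v in values:
        for idx, upper in enumerate(breaks):
            if v <= upper:
                counts[idx] += 1
                break
    return counts


# Made-up sample: two clusters plus one outlier.
sample = [1, 2, 3, 10, 11, 12, 50]
print(class_counts(sample, equal_interval_breaks(sample, 3)))  # [6, 0, 1]
print(class_counts(sample, quantile_breaks(sample, 3)))        # [3, 2, 2]
print(class_counts(sample, natural_breaks(sample, 3)))         # [3, 3, 1]
print(std_dev_breaks(sample))
```

With this sample, Equal Interval leaves its middle class empty, Quantile puts 12 and 50 in the same class while separating 11 from 12, and the natural breaks search gives the outlier its own class.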


For the first map, under the Symbology tab, I selected graduated colors (green hue, light to dark) with the field PCT_65ABV for all of the data classification methods except Standard Deviation. For the Standard Deviation method, I used the dark brown-tan to light blue-navy blue color ramp. The map contained all map essentials, with four data frames that each contained a legend. I used the same layout for the second map, except that I used the field AGE_65_UP normalized by area in square miles. The population count normalized by area more accurately depicts the distribution of senior citizens in Miami-Dade County. The percent above 65 presentation can be misleading in that a large tract can have a high percentage of senior citizens residing there but contain a low population count. Since the percent above 65 presentation does not factor in area, it cannot show that a small tract is densely populated while a large tract is sparsely populated. When the data is normalized by area, the reader can focus on the areas that are densely populated.
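The difference between the two presentations can be sketched with a quick calculation. The two tracts below are invented for illustration and are not from the census data. AGE_65_UP is the field used in the lab, while total_pop and SQ_MILES are hypothetical field names I am assuming here.

```python
# Two hypothetical tracts (values invented for illustration only).
tracts = [
    {"name": "large sparse tract", "total_pop": 400, "AGE_65_UP": 200, "SQ_MILES": 50.0},
    {"name": "small dense tract", "total_pop": 6000, "AGE_65_UP": 900, "SQ_MILES": 0.5},
]


def percent_65_up(tract):
    """Seniors as a percentage of the tract's total population."""
    return 100.0 * tract["AGE_65_UP"] / tract["total_pop"]


def density_65_up(tract):
    """Seniors per square mile (the count normalized by area)."""
    return tract["AGE_65_UP"] / tract["SQ_MILES"]


for tract in tracts:
    print(tract["name"], percent_65_up(tract), density_65_up(tract))
# large sparse tract 50.0 4.0
# small dense tract 15.0 1800.0
```

The large tract looks like the senior hot spot by percentage, but once the count is normalized by area, the small tract clearly holds far more seniors per square mile.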


