Ways of Seeing (Data)

Data is data. Period?  

Both of the maps above visualize the 2008 US presidential election results by coloring each county: red for the Republican candidate, John McCain, and blue for the Democratic candidate, Barack Obama. Which one is more “distorted”, and which is closer to the truth?

The difference between the two maps is the size at which each county is drawn. On the left, the area of a county follows the usual expectation of a geographic map: it represents the actual land area. On the right, however, the area of a county is adjusted (or distorted) to represent the number of votes cast there.

The map on the left may give the wrong impression that the Republican candidate (red) won the election. In contrast, the map on the right gives a better indication that the Democratic candidate (blue) won the popular vote.

Thus, while the map on the left distorts little in terms of geography, the map on the right is the better representation in this case, because the number of votes matters more than the land area of a county when comparing election results.
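To make the contrast concrete, here is a minimal sketch in Python. The three counties and all of their numbers are made up for illustration and are not real election data, and the function name `share_by` is my own. It compares how much of the map each color occupies when counties are sized by land area versus by number of votes, treating each county as a single solid color, as the maps do.

```python
# Hypothetical toy counties (invented numbers, NOT real election data).
counties = [
    {"name": "A", "area_km2": 9000, "votes": 20_000, "winner": "red"},
    {"name": "B", "area_km2": 7000, "votes": 15_000, "winner": "red"},
    {"name": "C", "area_km2": 500, "votes": 400_000, "winner": "blue"},
]


def share_by(counties, weight_key):
    """Return each color's share of the map when counties are sized by weight_key."""
    total = sum(c[weight_key] for c in counties)
    shares = {}
    for c in counties:
        # Each county is drawn entirely in the color of its winner.
        shares[c["winner"]] = shares.get(c["winner"], 0.0) + c[weight_key] / total
    return shares


area_shares = share_by(counties, "area_km2")  # like the map on the left
vote_shares = share_by(counties, "votes")     # like the map on the right

print("Sized by land area:", area_shares)
print("Sized by votes:    ", vote_shares)
```

With these invented numbers, red covers most of the map when counties are sized by land area, while blue covers most of it when they are sized by votes. The underlying data are the same; only the choice of what a county's size stands for has changed.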

The above is one of an increasing number of cases that raise several issues in data visualization, including persuasion vs. manipulation. Some argue that “good charts merely present data, and leave the analysis (obvious though it may be) to the viewer”, while others argue that additional visualization effort can help viewers comprehend and remember the data.

Other OII people and collaborators, including Mark Graham, Scott A. Hale, Taylor Shelton, Matthew Zook and Monica Stephens, have made an impressive effort to visualise important Internet indicators in the “Visualizing Data” project. Thanks to the environment provided by Academia Sinica and the Oxford Internet Institute, I have used both network analysis software and Geographic Information Systems (GIS) to visualise data for my research projects: a comparative study of Baidu Baike and Chinese Wikipedia, and a linguistic networked readiness index for selected languages in India and China. However, such visualizations may receive a mixed reception from audiences with different expectations of graphs, maps and charts. It seems to me that we need to revisit the old philosophical issues of social science regarding “evidence”, “explanations”, “presentations” and “interpretations” in light of the exciting new open data and internet data. Data visualizations are just too powerful to be ignored or to be taken at face value.
