How to even out uneven analytics of user-generated content: geolinguistic normalization as one possibility

In response to the “Uneven Geographies of User-Generated Information”, I argue that researchers and analysts must be explicit and transparent about the parameters of ideal “evenness”. Ideal evenness could be as simple as proportionality to the number of speakers, the number of Internet users, or the number of offline publications. Such a sense of “evenness” provides essential and concrete baselines for online/offline, cross-country, and/or cross-language comparison. In a poster for the upcoming OpenSym 2014, my coauthor and I propose one such possibility: using the number of language speakers across different countries (e.g., the Arabic-speaking population in Egypt, Israel, Kuwait, etc.) as the baseline to show how viewing and editing traffic is unevenly distributed.

In a nutshell, it is not surprising that Egypt and Saudi Arabia are the major contributing countries for reading and editing Arabic Wikipedia, but it is interesting to see that, *per capita* (defined here as per Arabic-language speaker in each region), smaller yet more Internet-ready countries such as Israel, Kuwait, UAE, Bahrain, Qatar, and Jordan contribute significantly more.
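
To make the *per capita* idea concrete, here is a minimal sketch of the normalization step, assuming traffic counts and speaker counts are already available as simple per-country tables. The function name `per_speaker_rate`, the country labels, and all numbers are hypothetical placeholders for illustration; they are not figures from the poster or from CLDR.

```python
def per_speaker_rate(traffic, speakers):
    """Divide each country's traffic count by its number of speakers.

    Countries missing from `speakers`, or with a non-positive speaker
    count, are skipped rather than guessed.
    """
    rates = {}
    for country, count in traffic.items():
        population = speakers.get(country)
        if population is None or population <= 0:
            continue
        rates[country] = count / population
    return rates


# Placeholder numbers for illustration only; NOT real measurements.
raw_views = {"Country A": 1_000_000, "Country B": 200_000, "Country C": 50_000}
speaker_counts = {"Country A": 50_000_000, "Country B": 2_000_000}

rates = per_speaker_rate(raw_views, speaker_counts)

# Rank countries by normalized traffic, highest first.
ranking = sorted(rates, key=rates.get, reverse=True)
print(ranking)
```

In this toy input, the country with the larger raw count ranks lower once traffic is divided by the number of speakers, which is the kind of reversal described above. A country without a speaker count is dropped from the result rather than assigned a guessed baseline.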

Normalized viewing traffic trend lines: Arabic Wikipedia

Normalized editing traffic trend lines: Arabic Wikipedia

It is particularly interesting that Israel contributes the most *per capita* viewing traffic to Arabic Wikipedia. These pilot findings, however, rely on the accuracy of the speaker counts listed in the “Language-Territory Information” compiled by the Unicode Consortium in CLDR version 25. Regardless, the proposed *per capita* metric opens the door not only to a better and more detailed understanding of the “unevenness” of analytics of user-generated content, but also to more tangible baselines for our assumptions of “evenness”.

For pre-normalization results, please download the OpenSym paper here or try the interactive infographics here.
