Using the viewing and editing traffic analysis reports for Wikipedia projects, I have made a few more interactive infographics that show the historical changes since late 2011. (The historical numbers were scraped from past versions of the reports archived by the Internet Archive: viewing and editing.)
The results are listed below for each language version. (Both the charts and the tables list the countries from the highest share to the lowest. The tables also group the country codes by economy and region.)
- Chinese (中文) Wikipedia
- Arabic (العربية) Wikipedia
- Russian (Русский) Wikipedia
- English Wikipedia
- Spanish (Español) Wikipedia
- Portuguese (Português) Wikipedia
- Dutch (Nederlands) Wikipedia
- Hebrew (עברית) Wikipedia
In the English Wikipedia results, the contrast between viewing and editing traffic shows that editors from the United Kingdom (country code: GB) have been contributing a proportionally larger share of editing traffic than editors from the United States (country code: US). In short, the United Kingdom’s share of editing traffic is larger than its share of viewing traffic.
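One way to make this contrast concrete is to divide each country's share of editing traffic by its share of viewing traffic. The following is only a minimal sketch: the function name `edit_to_view_ratio` is my own, and the percentages are made-up placeholders, not the figures from the reports.

```python
def edit_to_view_ratio(view_share, edit_share):
    """Divide the editing share by the viewing share for each country.

    A ratio above 1 means the country takes up a larger share of editing
    traffic than of viewing traffic; below 1 means the opposite.
    """
    return {
        code: edit_share[code] / view_share[code]
        for code in view_share
        if code in edit_share and view_share[code] > 0
    }


# Hypothetical percentage shares for illustration only (NOT real data).
view_share = {"US": 40.0, "GB": 10.0}
edit_share = {"US": 36.0, "GB": 15.0}

ratios = edit_to_view_ratio(view_share, edit_share)
for code, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(code, round(ratio, 2))
```

With these placeholder values, GB ends up above 1 and US below 1, which is the kind of pattern described above. The ratio compares shares within one language version only; it says nothing about absolute volumes.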
(Readers may want to ask further: what are the equivalent “per capita” or “per size” comparisons? I argue that Wikipedia activities depend mainly on two factors, existing editors and existing content, and thus “per capita” or “per size” results can be computed from at least two perspectives: editor size and content size. “Linguistic normalization” is thus needed, a topic to which I will come back in another blog post.)
In the Chinese (中文) Wikipedia results, the contrast between viewing and editing traffic shows that mainland Chinese users account for an increasing share of the viewing traffic, but this increase is not matched in the editing traffic. Also, Hong Kong users appear to take up a smaller share of the viewing traffic, probably because of the increase in viewing traffic from mainland China. (It would be nice if the Wikimedia Foundation released data in absolute numbers instead of just percentages, so that such questions could be answered.) Note that Hong Kong has consistently ranked second in editing traffic.
In the Arabic (العربية) Wikipedia results, it is no surprise that Saudi Arabia and Egypt are the top two contributing countries, given their sizable Arabic-speaking populations. However, there seems to be a sizable decrease in Saudi Arabia’s share of editing traffic (or, put the other way, a sizable increase in Egypt’s). Again, this is a limitation of percentage data points: the two explanations cannot be told apart without absolute numbers.
In the Spanish (Español) Wikipedia results, the viewing and editing traffic shows the existence of a digital divide. Users from Spain contributed the largest proportion of the editing traffic, whereas users from Mexico contributed the largest proportion of the viewing traffic. It would be interesting to further examine the geographic distribution of citations and external links in Spanish Wikipedia, so as to see the geographic proportions of Spanish-language sources. I expect that a sizable proportion of the Spanish-language sources come from Spain, but that this share should decrease as more Latin American sources come online. It would also be interesting to see the contribution of the Spanish-speaking diaspora and immigrants in North America.
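The limitation can be illustrated with a small worked example. The counts below are invented for illustration and simplified to two countries (real reports cover many more); the point is only that two different changes in absolute edits produce identical percentage shares.

```python
def shares(counts):
    """Convert absolute counts into percentage shares of the total."""
    total = sum(counts.values())
    return {code: 100.0 * n / total for code, n in counts.items()}


# Hypothetical absolute edit counts (NOT real data), two countries only.
before = {"SA": 600, "EG": 400}

# Scenario A: Saudi Arabia's edits fall while Egypt's stay flat.
scenario_a = {"SA": 400, "EG": 400}

# Scenario B: Saudi Arabia's edits stay flat while Egypt's rise.
scenario_b = {"SA": 600, "EG": 600}

print(shares(before))
print(shares(scenario_a))
print(shares(scenario_b))
```

Both scenarios move the shares from 60/40 to 50/50, so a reader who only sees percentages cannot tell a decline in one country from growth in the other.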
Note on the data
The original data is partial (some languages and regions are not included) and derived (percentage data for comparisons within a language). Previous research by Taha Yasseri, Robert Sumi, and János Kertész has indicated a notable gap between the actual number of edits and the measurement of “the percentage of requesting ip addresses”, which excludes duplicates of a single IP address within the same day. The size of this gap likely depends on the dynamic or static IP addressing arranged by different ISPs across different countries.
This is one of the many things that the Wikimedia Foundation needs to revise in its current traffic data analysis and curation.
Intended as a proof-of-concept prototype to provide insights into the geolinguistic dynamics of Wikipedia projects, this work would not have been possible without Jake VanderPlas’s mpld3 plugin, which provides a D3.js bridge that brings Matplotlib results to the Web. It also depends on Wikimedia’s partial release of the traffic data maintained by Erik Zachte and other Wikimedia staff and volunteers. My thanks to them.
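To show why such a gap could vary with IP addressing, here is a minimal sketch that counts the same toy log in two ways: by raw edits, and by distinct IP addresses per day. This is my own simplified reading of the measurement, not the actual pipeline behind the reports; the log entries, the country codes "XA" and "XB", and the function name are all hypothetical.

```python
from collections import defaultdict


def count_edits_and_daily_unique_ips(log):
    """Count, per country, raw edits and distinct (day, IP) pairs.

    `log` is an iterable of (day, country_code, ip_address) tuples,
    one tuple per edit.
    """
    edits = defaultdict(int)
    daily_ips = defaultdict(set)
    for day, country, ip in log:
        edits[country] += 1
        daily_ips[country].add((day, ip))  # same IP on same day counts once
    unique = {country: len(pairs) for country, pairs in daily_ips.items()}
    return dict(edits), unique


# Hypothetical log (NOT real data). In "XA" one editor keeps a static IP;
# in "XB" one editor is given a new dynamic IP for every edit.
log = [("2013-01-01", "XA", "192.0.2.1")] * 5
log += [("2013-01-01", "XB", "198.51.100.%d" % i) for i in range(1, 6)]

edits, unique_ips = count_edits_and_daily_unique_ips(log)
print(edits)
print(unique_ips)
```

In this toy case both countries make five edits, yet the deduplicated count is 1 for "XA" and 5 for "XB", so the same editing activity would yield very different percentage shares.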