Wikipedia traffic: selected language versions

Following the viewing and editing traffic analysis reports for Wikipedia projects, I made a few more interactive infographics that show the historical changes since late 2011. (The historical numbers were scraped from past versions of the reports archived by the Internet Archive: viewing and editing.)

The results are listed below for each language version. (Both the charts and the tables list the countries from highest to lowest. The tables additionally group the country codes by economy and region.)

In the results for English Wikipedia, the contrast between viewing and editing traffic shows that editors from the United Kingdom (country code: GB) have been contributing a proportionally larger share of editing traffic than editors from the United States (country code: US). In short, the United Kingdom contributes proportionally more editing traffic than viewing traffic.
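To make the comparison concrete, here is a minimal sketch of the ratio I have in mind: each country's share of editing traffic divided by its share of viewing traffic. The function name and the percentages are hypothetical placeholders for illustration only, not the figures from the reports.

```python
def edit_to_view_ratio(view_share, edit_share):
    """Return {country_code: editing share / viewing share}.

    Both inputs map country codes to percentage shares within one
    language version. Countries missing from either table, or with a
    zero viewing share, are skipped.
    """
    ratios = {}
    for code, view in view_share.items():
        edit = edit_share.get(code)
        if edit is None or view <= 0:
            continue
        ratios[code] = edit / view
    return ratios


# Hypothetical percentages, NOT the real report figures.
view_share = {"US": 40.0, "GB": 10.0}
edit_share = {"US": 36.0, "GB": 15.0}

ratios = edit_to_view_ratio(view_share, edit_share)
for code, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(code, round(ratio, 2))
```

A ratio above 1 means a country takes up a larger share of the editing traffic than of the viewing traffic. Because the inputs are percentages, the ratio says nothing about absolute volumes.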

(Readers may want to ask further: what are the equivalent “per capita” or “per size” comparisons? I argue that Wikipedia activity depends mainly on two factors, existing editors and existing content, and thus “per capita” or “per size” results can be computed from at least two perspectives: editor size and content size. “Linguistic normalization” is thus needed, a topic to which I will come back in another blog post.)

In the results for Chinese (中文) Wikipedia, the contrast between viewing and editing traffic shows that mainland Chinese users make up a growing proportion of Chinese Wikipedia's readers, but this increase is not matched in the editing traffic. Also, Hong Kong users may be taking up a smaller proportion of the viewing traffic, probably because of the increase in viewing traffic from mainland China. (It would be nice if the Wikimedia Foundation released data in absolute numbers instead of just percentages, so that such questions could be answered.) Note that Hong Kong has consistently ranked second in editing traffic.

In the results for Arabic (العربية) Wikipedia, it is no surprise that Saudi Arabia and Egypt are the top two contributing countries, given their sizable Arabic-speaking populations. However, there seems to be a sizable decrease in Saudi Arabia’s editing traffic (or a sizable increase in Egypt’s editing traffic). The percentages alone cannot tell these two cases apart; again, this is a limitation of using percentage data points.

In the results for Spanish (Español) Wikipedia, the viewing and editing traffic shows the existence of a digital divide. Users from Spain contributed the largest proportion of the editing traffic, whereas users from Mexico contributed the largest proportion of the viewing traffic. It would be interesting to further examine the geographic distribution of citations and external links in Spanish Wikipedia, so as to see the geographic proportions of its Spanish-language sources. I would expect a sizable proportion of the Spanish-language sources to come from Spain, but its share should be decreasing as more Latin American sources come online. It would also be interesting to see the contribution of the Spanish-speaking diaspora and immigrants in North America.

Note on the data

The original data is partial (some languages and regions are not included) and derived (percentage data for comparisons within a language). Previous research by Taha Yasseri, Robert Sumi and János Kertész has indicated a notable gap between the actual number of edits and the measurement of “the percentage of requesting ip addresses”, which excludes duplicate requests from a single IP address within the same day. The varying gap likely depends on the dynamic or static IP addressing arranged by different ISPs across different countries.

This is one of the many things that the Wikimedia Foundation needs to revise in its current traffic data analysis and curation.
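To illustrate why such a gap can arise, here is a minimal sketch that compares shares based on raw edit counts with shares based on distinct IP addresses per day. It reflects my reading of the measurement as quoted above, not the Foundation's actual pipeline; the record layout, the function names, the placeholder country codes "AA" and "BB", and the sample records are all assumptions made for illustration.

```python
from collections import Counter


def to_percentages(counts):
    """Convert absolute counts per country into percentage shares."""
    total = sum(counts.values())
    return {country: 100.0 * n / total for country, n in counts.items()}


def raw_and_deduplicated_shares(records):
    """Compare two ways of measuring editing traffic per country.

    records: iterable of (day, ip, country) tuples, one per edit request.
    Returns (raw_share, dedup_share), where the second counts each IP
    address at most once per day.
    """
    raw = Counter()
    dedup = Counter()
    seen = set()  # (day, ip) pairs already counted
    for day, ip, country in records:
        raw[country] += 1
        if (day, ip) not in seen:
            seen.add((day, ip))
            dedup[country] += 1
    return to_percentages(raw), to_percentages(dedup)


# Hypothetical records: one editor in "AA" makes three edits from the
# same IP address on one day; one editor in "BB" makes a single edit.
records = [
    ("2014-01-01", "ip-a", "AA"),
    ("2014-01-01", "ip-a", "AA"),
    ("2014-01-01", "ip-a", "AA"),
    ("2014-01-01", "ip-b", "BB"),
]

raw_share, dedup_share = raw_and_deduplicated_shares(records)
print(raw_share)
print(dedup_share)
```

In this toy case "AA" accounts for three quarters of the raw edits but only half of the de-duplicated requests. If the editor in "AA" had been handed a new dynamic IP address between edits, the two measures would agree, which is why the size of the gap can vary with how ISPs assign addresses.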

Acknowledgement

Intended as a proof-of-concept prototype to provide insights into the geolinguistic dynamics of Wikipedia projects, this work would not have been possible without Jake VanderPlas’s mpld3 plugin, which provides a D3.js bridge that brings Matplotlib results to the Web. It also depends on Wikimedia’s partial release of the traffic data maintained by Erik Zachte and other Wikimedia staff and volunteers. My thanks to them.

2 thoughts on “Wikipedia traffic: selected language versions”

  1. “In short, the United Kingdom contribute proportionally more editing traffic than the viewing traffic.”

    This comparison is not entirely fair. Have you tried accounting for the bigger share of inhabitants whose mother tongue is not English? A standard source like http://www.unicode.org/cldr/charts/latest/supplemental/territory_language_information.html#US shows there is quite a difference between the UK and the USA.

    Anecdotally, we know that Latinos (but even Afro-Americans and all “minorities”) are almost non-existent among editors, proportionally speaking. A less “dirty” comparison could be between Portugal and Brazil on the Portuguese Wikipedia; or you could analyse wikis with a less mixed situation (like German, Italian, Polish etc.).

  2. Hello, Federico. Thanks for your comments and questions here and on the mailing list.

    To compare data generally, we need references or bases for comparison. I am sure that you agree now that the percentage data points have their limits.

    The statement that “the United Kingdom contribute proportionally more editing traffic than the viewing traffic” here concerns the comparison of editing versus viewing. With percentage data, one can only compare ratios. Thus what I mean to say, in more precise terms, is that the editing/viewing percentage ratio is higher in the U.K. than in the U.S.

    Your suggestion of using the population data listed in the CLDR is exactly what I have done in ongoing research, for which I submitted an extended abstract to WikiSym 2014 a month ago. For that research, I have covered Spanish, Arabic and other language versions, using exactly the data points you suggested for what I call “geolinguistic normalization”. Since you seem to be interested in the English Wikipedia’s viewing/editing traffic data, I have made the charts for the English Wikipedia for your further review. Please visit the blog post here: http://wp.me/p2KfH2-SH
