“Wikipedias” (or their copycats) dominate “Chinese” search engine result pages (SERPs)

It has been reported (with speculation as to why) that Google, the global leader among search engines, consistently favours Wikipedia, the global leader among user-generated encyclopedias, by showing its pages frequently and prominently in search engine result pages (hereafter SERPs) (Čuhalev, 2006; Charlton, 2012; Gray, 2007; Silverwood-Cope, 2012). Based on SERP data for about 3000 search queries collected in 2012, I have found that “Wikipedias” also dominate “Chinese-language” SERPs as the most visible websites, but with a clear difference in which “Chinese” Wikipedia dominates which “Chinese” search engine’s result pages.

To account for the dominance of Wikipedia pages on SERPs, independent market research by Nielsen Online and Hitwise Intelligence has demonstrated that Wikipedia not only dominates online visits for encyclopedia content, but does so mainly because of traffic directed by major Web search engines (Hopkins, 2009; Nielsen Online, 2008). Even the Wikimedia Foundation acknowledges this, while arguing that half of its readers do look for Wikipedia content (Khanna, 2011). Thus, as major websites that dominate the world’s Web traffic and user attention, both play central roles in providing information to users online, as illustrated by incidents such as the Google query for “Jew” (Bar-Ilan, 2006), in which some users organized to help Wikipedia’s entry page for “Jew” rank higher in Google’s English-language SERPs. Major search engines (Google) and major user-generated encyclopedias (Wikipedia) seem to be the major examples of “network gatekeeping” that filters information for users (Barzilai-Nahon, 2008).

What about languages other than English?

Nonetheless, such empirical research has so far been limited to the English language. For the Chinese-language Internet, only one prior empirical study has analysed SERPs inside mainland China. Based on 316 search query phrases about “Internet events” collected in 2009, it indicates that Baidu Baike, Hudong and Chinese Wikipedia do rank high in the SERPs (Jiang & Akhtar, 2011). However, it focuses on (and thus is limited to) simplified Chinese users in mainland China, and its sample of search queries was based on Internet incidents that are politically controversial in mainland China. This paper contributes findings based on a 3000-search query dataset collected in 2011, covering not only more topics but also more Chinese-language localization variants, including regions such as Hong Kong, Taiwan and Singapore. I will share some of the findings that show the localization effects of search engines on the most visible websites presented, and how different Chinese-language encyclopedias dominate different Chinese localized search engine interfaces.

Search query selection

First, I selected about 3000 search queries that cover a wide range of Chinese topics. As summarized in the table below, the selection includes all 990 entries in “The Cambridge Encyclopedia of China”, the top 10 search terms in various categories provided respectively by Baidu and Google (including mainland China, Hong Kong and Taiwan variations) from 2007 to 2010, major popular cultural references, notable people’s names and some other culturally, potentially “sensitive” keywords. Although other selections are possible, this selection is arguably by far the most diverse to date for Chinese SERP research.
[Table image: search_queries_2012HK_300]

Search engine selection

I have selected nine search engine variants as below, which altogether cover over 97% of the market for major Chinese-speaking regions: 
• For mainland China (mostly simplified Chinese users):
zh-cn: Baidu, Google (simplified Chinese) and Yahoo China
• For Singapore (mostly simplified Chinese users):
zh-sg: Google Singapore and Yahoo Singapore
• For Hong Kong (mostly traditional Chinese users):
zh-hk: Google Hong Kong and Yahoo Hong Kong
• For Taiwan (mostly traditional Chinese users):
zh-tw: Google Taiwan and Yahoo Taiwan

These variants are hereafter abbreviated as Baidu_CN, Google_CN, Yahoo_CN, Google_SG, Yahoo_SG, Google_HK, Yahoo_HK, Google_TW and Yahoo_TW. Baidu continues to lead in mainland China, with Google in second place, after Google moved its mainland operations to Hong Kong. In Hong Kong and Taiwan, Google overtook Yahoo’s leading position around 2010 to 2011, while maintaining its top position in Singapore.
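
For analysis scripts, the nine variants can be kept in a small lookup table. The following is a minimal sketch: the abbreviations, locale codes and engines come from the list above, while the `Variant` class, its field names and the helper `variants_by_script` are illustrative assumptions rather than part of the original study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One localized search engine interface (illustrative structure)."""
    engine: str  # search engine brand
    locale: str  # locale code as used in the list above
    script: str  # script that most users in that region read


# Abbreviation -> variant, following the nine variants named in the text.
VARIANTS = {
    "Baidu_CN": Variant("Baidu", "zh-cn", "simplified"),
    "Google_CN": Variant("Google", "zh-cn", "simplified"),
    "Yahoo_CN": Variant("Yahoo", "zh-cn", "simplified"),
    "Google_SG": Variant("Google", "zh-sg", "simplified"),
    "Yahoo_SG": Variant("Yahoo", "zh-sg", "simplified"),
    "Google_HK": Variant("Google", "zh-hk", "traditional"),
    "Yahoo_HK": Variant("Yahoo", "zh-hk", "traditional"),
    "Google_TW": Variant("Google", "zh-tw", "traditional"),
    "Yahoo_TW": Variant("Yahoo", "zh-tw", "traditional"),
}


def variants_by_script(script):
    """Return the sorted abbreviations of variants serving the given script."""
    return sorted(name for name, v in VARIANTS.items() if v.script == script)
```

Grouping by script in this way makes it easy to compare, for example, the simplified Chinese variants against the traditional Chinese ones.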

Findings: Concentrated visibility scores

The table below shows the overall and cumulative distribution of visibility scores for the top-100 most visible websites. It is evident that nearly 80% of the visibility scores are concentrated in the top-100 websites.
[Table image: 2012HK_visibility_top]
Since the overall table shows the outcome of all nine search engine variants, the findings must be unpacked to see which websites are the most visible for which search engine variants for which categories of search queries.
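
The concentration figure above is a cumulative share: the sum of the scores of the top-N websites divided by the sum over all websites. A minimal sketch of that calculation follows. How each visibility score is computed is not specified in this post, so the function simply takes per-website scores as given; the function name and the example values are hypothetical and are not taken from the dataset.

```python
def cumulative_share(scores, top_n):
    """Return the fraction of the total score held by the top_n websites.

    `scores` maps a website name to its (already computed) visibility score.
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("total visibility score must be positive")
    ranked = sorted(scores.values(), reverse=True)
    return sum(ranked[:top_n]) / total


# Hypothetical scores for illustration only; not values from the study.
example_scores = {"site_a": 50.0, "site_b": 30.0, "site_c": 15.0, "site_d": 5.0}
top2_share = cumulative_share(example_scores, 2)
```

Applied to the real per-website scores with `top_n=100`, the same function would give the concentration figure reported for the top-100 websites.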

Findings: user-generated encyclopedias rule

By unpacking the outcome to see the top 5 websites and their top 5 domains, it becomes clear that user-generated encyclopedias rule! Chinese Wikipedia (zh.wikipedia.org), Baidu Baike (baike.baidu.com) and Hudong Baike (www.hudong.com) are the most visible domains for the most visible websites.
[Table image: host_domain_top5top5]

Findings: Baidu_CN

The table below shows the results only for Baidu_CN, which represents the biggest group of search engine users among all nine search engine variants: the Baidu users in mainland China. Baidu Baike is the most visible across all types of search queries. Note that the proportion of visibility scores that Baidu.com has garnered ranges from a weak 60% to 80%, leaving each of the other websites with less than 8% of the total visibility scores. Baidu_CN seems to favour the Chinese video website youku.com for popular search queries (see the categories of top 10 search terms and best film/popular music). Baidu_CN favours itself over MBAlib.com in the Fortune 500 category. For the remaining categories, Wikipedia.org is a distant second.

[Table image: tcat_host_Baidu_CN]

Findings: Google_CN

The table below shows the results of Google_CN. Although Baidu.com seems to take the top position in almost all categories, its proportion of visibility scores is much less dominant than under Baidu_CN, with Wikipedia.org a close second. Since Chinese Wikipedia on average makes up 95% of Wikipedia.org overall, whereas Baidu Baike makes up 87% of Baidu.com, the outcome shown in Table 6 could mean a tie between Chinese Wikipedia and Baidu Baike. Hence, if users from mainland China use Google Search instead of Baidu Search, Chinese Wikipedia becomes as visible as Baidu Baike for them.

[Table image: tcat_host_Google_CN]
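
The tie argument rests on simple arithmetic: host-level scores for Wikipedia.org and Baidu.com are scaled down by the share that each encyclopedia contributes to its host. The sketch below works this through under a simplifying assumption that the overall averages (95% and 87%) apply uniformly to every category. The function names are hypothetical.

```python
# Average shares stated in the text.
ZHWIKI_SHARE_OF_WIKIPEDIA_ORG = 0.95  # Chinese Wikipedia within Wikipedia.org
BAIKE_SHARE_OF_BAIDU_COM = 0.87       # Baidu Baike within Baidu.com


def encyclopedia_scores(wikipedia_org_score, baidu_com_score):
    """Scale host-level scores down to encyclopedia-level estimates.

    Assumes the overall average shares hold for the category in question.
    Returns (Chinese Wikipedia estimate, Baidu Baike estimate).
    """
    return (
        wikipedia_org_score * ZHWIKI_SHARE_OF_WIKIPEDIA_ORG,
        baidu_com_score * BAIKE_SHARE_OF_BAIDU_COM,
    )


def break_even_ratio():
    """Wikipedia.org / Baidu.com score ratio at which the encyclopedias tie."""
    return BAIKE_SHARE_OF_BAIDU_COM / ZHWIKI_SHARE_OF_WIKIPEDIA_ORG
```

Under that assumption the two encyclopedias tie once Wikipedia.org reaches roughly 92% of Baidu.com's host-level score (0.87 / 0.95), which is why a close second place for Wikipedia.org can amount to a tie at the encyclopedia level.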

Findings: Google_HK

The table below shows the results of Google_HK. Note that because of Google’s exit from mainland China, Google_HK and Google_CN are essentially the same website, with the former serving traditional Chinese users and the latter serving simplified Chinese users. Chinese Wikipedia becomes more visible than Baidu Baike for Google_HK, which is expected because Baidu Baike serves only simplified Chinese content whereas Chinese Wikipedia serves both simplified and traditional Chinese content.
[Table image: tcat_host_Google_HK]

Discussion

In particular, the outcome under the Fortune 500 category provides an important clue that the difference in outcome is shaped by linguistic factors. MBAlib.com is a website hosted in mainland China that serves both simplified and traditional Chinese content. Google does not favour Wikipedia by default, as evidenced by the top ranking of MBAlib.com for both Google_CN and Google_HK. In contrast, MBAlib.com is much less visible than Baidu’s own website in the Baidu_CN results, which legitimately raises concerns about fair competition.

In addition, the ranking positions of hudong.com in the respective results for Baidu_CN, Google_CN and Google_HK seem to confirm the unfair competition accusation made by Hudong’s CEO against Baidu (Yang, 2011). Depending on the type of search query, Hudong.com ranks from the 3rd to the 8th most visible website on Google_HK, and from the 3rd to the 9th on Google_CN. In contrast, on Baidu_CN Hudong is not even among the top 20 for many categories of the sampled queries. Indeed, if Google’s SERPs can serve as an independent third party, then although Google does not favour Hudong over Wikipedia or even Baidu Baike, Google does not make Hudong almost invisible as Baidu does.

Covering over 97% of the search engine market for four Chinese-speaking regions, the findings clearly indicate strong localization effects on the gatekeeping function of search engines. The findings also show that major user-generated encyclopedias such as Baidu Baike and Chinese Wikipedia do dominate the SERPs with high rankings and visibility scores. Different localization variants produce divergent outcomes for high-ranking encyclopedias and other websites, thereby indicating strong effects of “network gatekeeping” by search engines in exercising the gatekeeping bases of “display” and “localization” (Barzilai-Nahon, 2008).

References

Čuhalev, J. (2006). Ranking of Wikipedia articles on search engines for searches about its own articles (Seminar Task for Internet Search Techniques and Business Intelligence class) (p. 7). Retrieved from http://www.jurecuhalev.com/blog/2006/10/13/seeing-lots-of-wikipedia-in-your-google-searches/
Bar‐Ilan, J. (2006). Web links and search engine ranking: The case of Google and the query “jew”. Journal of the American Society for Information Science and Technology, 57(12), 1581–1589. doi:10.1002/asi.20404
Barzilai-Nahon, K. (2008). Toward a theory of network gatekeeping: A framework for exploring information control. Journal of the American Society for Information Science and Technology, 59(9), 1493–1512. doi:10.1002/asi.20857
Charlton, G. (2012, February 13). Why Wikipedia is top on Google: the SEO truth no-one wants to hear. Econsultancy: Digital Marketers United. Retrieved from http://econsultancy.com/blog/9009-why-wikipedia-is-top-on-google-the-seo-truth-no-one-wants-to-hear?utm_campaign=bloglikes&utm_medium=socialnetwork&utm_source=facebook
Gray, M. (2007, May). Google Love Affair with Wikipedia - Graywolf’s SEO Blog. Graywolf’s SEO Blog. Retrieved December 2, 2011, from http://www.wolf-howl.com/google/google-love-affair-with-wikipedia/
Hopkins, H. (2009, January 23). Britannica 2.0: Wikipedia Gets 97% of Encyclopedia Visits. Hitwise Intelligence: Analyst Weblog. Retrieved March 19, 2012, from http://weblogs.hitwise.com/us-heather-hopkins/2009/01/britannica_20_wikipedia_gets_9.html
Jiang, M., & Akhtar, A. (2011). Peer into the Black Box of Chinese Search Engines: A Comparative Study of Baidu, Google, and Goso. Presented at the 9th Chinese Internet Research Conference (CIRC 2011), Washington, D.C.: Institute for the Study of Diplomacy, Georgetown University.
Khanna, A. (2011, October 26). Google drives traffic to Wikipedia, but half of readers look for Wikipedia content — Wikimedia blog. Wikimedia Foundation: Global blog. Official blog. Retrieved April 29, 2012, from http://blog.wikimedia.org/2011/10/26/search-and-wikipedia/
Nielsen Online. (2008). Wikipedia U.S. Web Traffic Grows 8,000 Percent In Five Years, Driven By Search. New York: Nielsen Online. Retrieved from http://news.softpedia.com/news/Wikipedia-Traffic-Mostly-from-Google-85703.shtml
Silverwood-Cope, S. (2012, February 8). Wikipedia: Page one of Google UK for 99% of searches. Intelligent Positioning Blog. Retrieved February 14, 2012, from http://www.intelligentpositioning.com/blog/2012/02/wikipedia-page-one-of-google-uk-for-99-of-searches/
Yang, Y. (2011, February 25). China’s “Wikipedia” Submits Complaint about Baidu. Economic Observer News, 508, 28. Retrieved from http://www.eeo.com.cn/ens/Industry/2011/03/04/195125.shtml
