What was Google thinking when it spent $1.65bn on YouTube? What Google was after was probably not the idea, or the infrastructure, or the content, but the users. Users, in large numbers, are incredibly valuable sources of information, even when they do not know it.
Ricardo Baeza-Yates, Director of Web Mining Research for Yahoo Research in Barcelona, says that Yahoo agrees. In a talk at the OII today, he described some of Yahoo’s ideas on improving search by using the implicit information left by hundreds of millions of search queries.
The idea of using information from web users for search is as old as search itself. In the early days of the web, AltaVista simply used the text on a web page for indexing. The text was predominantly human written, and therefore AltaVista was using the collective efforts of everyone who wrote on a web page. Google’s innovation was the PageRank algorithm that made use of the links that web page owners decide to put on their sites. The next step, at least according to Baeza-Yates, is making use of the effort of millions of search users to determine the best answers to specific queries. Under the right conditions, Yahoo believes, the crowd will be wise.
With regard to search, Yahoo seems to have two strategies for making use of its massive user resources. The more mature is Yahoo Answers, where specific, question-type queries are answered by other Yahoo Answers users. The answers are also rated by users, and the best answer can be decided by the user who originally asked the question. The catchy slogan is “Better Search through people”. Whether enough people, with the right incentives, will take part to make this truly useful remains to be seen.
Yahoo’s other search effort is called Mindset (mindset.research.yahoo.com). Mindset is ‘Intent Driven Search’. The idea is that the same search query will have different ‘best’ results based on the intent of the searcher. For example, a search for ‘laptop’ may be for buying a laptop, or for finding out more about the construction and characteristics of a laptop. Mindset has an intent dial at the top of the page, which can be set for anything between ‘Shopping’ on one side and ‘Research’ on the other. The user then sets her intent for the query and the results are modified accordingly.
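One way to picture how such an intent dial could work is as a blending parameter between two per-result relevance scores. This is only a sketch of the general idea, not Yahoo’s actual implementation: the result URLs, the score values, and the two-score model are all invented for illustration.

```python
# Hypothetical sketch of an intent dial: each result carries separate
# "shopping" and "research" relevance scores, and the dial position
# (0.0 = Shopping, 1.0 = Research) blends them into a final ranking.
results = [
    {"url": "laptop-store.example.com", "shopping": 0.9, "research": 0.2},
    {"url": "how-laptops-work.example.org", "shopping": 0.1, "research": 0.8},
]

def rank(results, dial):
    """Re-rank results by a weighted mix of the two intent scores."""
    def score(r):
        return (1 - dial) * r["shopping"] + dial * r["research"]
    return sorted(results, key=score, reverse=True)

# With the dial at Shopping, the store comes first; at Research,
# the explanatory page does.
print(rank(results, 0.0)[0]["url"])
print(rank(results, 1.0)[0]["url"])
```

The same query thus produces different ‘best’ results depending only on where the user sets the dial.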
Yahoo is putting a lot of effort into mining the implicit information that the users of these search projects provide. By mining the search data for links between similar queries and the result clicked on, it is possible to establish links between queries without analyzing the content of the queries themselves. For example, misspelled queries ‘londn eye’ and ‘londen eye’ can be connected without analyzing the query text. The search engine, with automatic spelling correction, will present the London Eye website in both cases. If the user picks the same London Eye site for both queries, an implicit link is created. This implicit link is established mainly because of the automatic spelling correction, but with other, less obviously connected queries the same applies if a major site is often selected from different queries.
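The core of this idea can be sketched as a small bipartite graph between queries and clicked results: two queries become implicitly linked whenever users click the same result for both. The click log below is invented to match the London Eye example from the text; nothing here reflects Yahoo’s real data structures.

```python
from collections import defaultdict

# Hypothetical click log of (query, clicked_url) pairs. Two misspelled
# queries both lead, after spelling correction, to the same site.
click_log = [
    ("londn eye", "http://www.londoneye.com"),
    ("londen eye", "http://www.londoneye.com"),
    ("laptop", "http://laptops.example.com"),
]

def related_queries(log):
    """Link queries that share a clicked result, without ever
    analyzing the query text itself."""
    queries_by_url = defaultdict(set)
    for query, url in log:
        queries_by_url[url].add(query)
    links = defaultdict(set)
    for queries in queries_by_url.values():
        for q in queries:
            links[q] |= queries - {q}
    return links

links = related_queries(click_log)
print(links["londn eye"])  # the other spelling, found via the shared click
```

Note that the connection between ‘londn eye’ and ‘londen eye’ emerges purely from user behaviour; the same mechanism links queries that are not spelling variants at all, whenever a major site is often selected from different queries.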
Implicit information of this kind, if it can be properly analyzed, is clearly valuable at such a large scale. I find it an interesting example of collaborative, distributed knowledge production – a complex information production process is at work during our every query and click on the web.
Two upcoming conferences might be of interest to the millions of people who read this.
International Conference on Technology, Knowledge and Society
3rd International Conference on Communities and Technologies
Both are of a multidisciplinary nature. Or perhaps they are of an interdisciplinary nature. The two terms are sometimes used interchangeably and the difference is not always clear. And what is it we do here at the OII?
The OII website says it is a centre for “multidisciplinary study of the Internet”. Science magazine describes multidisciplinary research as “today’s hottest buzzword”. The many conferences and publications described as multi- or interdisciplinary seem to indicate growing attention to this type of work. Some suggest that the traditional departments of universities are outdated and hindering progress, and that most things today are, or should be, multidisciplinary.
Peter van den Besselaar and Gaston Heimeriks wrote a paper on the differences and usage of the terms. In “Disciplinary, Multidisciplinary, Interdisciplinary – Concepts and Indicators –” they say:
“the basic difference between these various manifestations of non-disciplinary is the level of integration of the different disciplinary approaches they are based on.
- In multidisciplinary research, the subject under study is approached from different angles, using different disciplinary perspectives. However, neither the theoretical perspectives nor the findings of the various disciplines are integrated in the end.
- An interdisciplinary approach, on the other hand, creates its own theoretical, conceptual and methodological identity. Consequently, the results of an interdisciplinary study of a certain problem are more coherent, and integrated.”
In studying the Internet, integrating the theoretical perspectives and findings of the various disciplines involved sounds desirable to me. And eventually Internet studies might also have its own theoretical, conceptual and methodological identity, but we aren’t there yet. Perhaps we should be aiming for interdisciplinary work, or at least for a clear definition of what we mean by multidisciplinary and interdisciplinary.
There is a lot of discussion about the social (MySpace, Facebook) and the collaborative (Wikipedia, Open Source) on the Internet and the future it is creating. Yochai Benkler has written 515 pages on this topic in ‘The Wealth of Networks‘.
His treatment of the question of why people contribute to these projects is especially interesting. Why do I bother clicking on ‘edit’ on a Wikipedia page to correct a link, fact or spelling error? In working on Open Source projects I have always thought it important that the code be made publicly available; yet I was never too keen on sharing my homework with others. Where does this general altruism come from? It may seem counterintuitive that something like Wikipedia actually works. Benkler argues that people participate for many different reasons, and that a site can draw on all these different motivations for sharing, often at the same time.
It is also possible, however, that they have tapped into a valuable insight, which is that people behave sociably and generously for all sorts of different reasons, and that at least in this domain, adding reasons to participate — some agonistic, some altruistic, some reciprocity-seeking — does not have a crowding-out effect.
On Wikipedia there is certainly the agonistic: some pages have to be locked to prevent repeated, biased editing. There is also the altruistic: specialists write and maintain whole pages. But perhaps clicking ‘edit’ is simply a small enough task that people like me think they might as well make a quick fix. Benkler says this modularity of the greater task is important in getting as many people as possible to contribute: the smaller the minimum contribution, the more people are likely to make one.
…the technical architectures, organizational models, and social dynamics of information production and exchange on the Internet have developed so that they allow us to structure the solution to problems – in particular to information production problems — in ways that are highly modular. This allows many diversely motivated people to act for a wide range of reasons that, in combination, cohere into new useful information, knowledge, and cultural goods.
Meanwhile, the debate about the accuracy of the socially produced Wikipedia still goes on. ‘America’s Finest News Source’ reports that ‘Wikipedia Celebrates 750 Years Of American Independence‘, while a more serious publication asks ‘Can Wikipedia conquer expertise?‘
Benkler’s book is available in its entirety on his website, linked above. He has also provided a Wiki for discussion…