Big Data and Public Policy Workshop

On 13 September 2013, I attended a great workshop at the Harvard Faculty Club organized by my colleagues Vicki Nash and Helen Margetts of the Oxford Internet Institute and the journal Policy and Internet on the topic of “Responsible Research Agendas for Public Policy in the Era of Big Data”. The assembled body included not just the usual complement of academics, but also a significant number of people working in, and closely with, U.S. governmental bodies. The participants from agencies such as the FCC, Federal Reserve Board, Bureau of Labor Statistics, and Census Bureau added immeasurably to our day of discussions on how to understand the relationship between big data and public policy.

I won’t recount the whole day, which ranged from discussions of how to move toward systems-level thinking using big data once we have a ‘map the size of the world’ at our disposal, to debates about how big data can contribute both to the public good in the best cases and, conversely, to the public ‘bad’ in the most troubling cases. Instead, let me draw out just a couple of themes that I found most interesting.

A key point that arose on several occasions concerns the disconnect between the data that governments at all levels (i.e. national, state/regional, local) could potentially access and the data they actually have access to in practice. While recent revelations about the NSA have many thinking of western governments as panopticons, at the agency level the situation is clearly far more complex. For example, government agencies making decisions on economic policy based on sales data and labor statistics have traditionally had access to rather broad-brush data about things like when raw materials leave the U.S. and when finished goods return. The global firms actually manufacturing and marketing those goods, however, hold extremely detailed records about the entire chain of production and sales, but face no regulatory requirement to turn these data over to the government, even though they could potentially inform policy decisions based on much more complete information. It was suggested during the workshop that one option would be to ask these firms to share their data with the appropriate agencies out of patriotic duty, but the flip side, of course, is that many firms have a self-interest in avoiding regulation, and may feel that staying a step ahead of governments gives them an advantage. Additionally, these data are understood to be key assets for companies, and sharing them could easily be seen as undermining their competitiveness. Several participants argued that the growing gap between leading industries and lagging governmental bodies results in regulations that are doomed to failure, as the regulators aren’t keeping up with the industries they are meant to regulate.

It was pointed out, however, that even though the private sector clearly has more big data available to it at the moment than either academics or (some sectors of) the government, we shouldn’t fall prey to thinking that any sector is surging ahead in the careful and innovative use of big data. All sectors offer ample evidence of naïve or downright poor uses of big data, even while the examples of powerful uses are the most hyped.

One point I noted throughout the day is that there are clearly different levels of policy that were blurring together in our discussion. While some issues overlap, the questions raised when financial regulators draw on data-driven approaches to monitoring and regulating the banking sector differ from those raised when local governments use data and algorithms to better inform public-facing case workers who face benefits claimants across the desk. Both settings have considerable potential for improving policy by using data and algorithms to reach more effective decisions. However, bankers are likely to have considerable expertise at their disposal to understand the algorithms upon which these decisions are based, and greater ability to push back against decisions they disagree with. For the benefits claimant, the potential for being faced with inflexible decisions, when neither side of the desk understands the complex underlying algorithms driving them, is much higher.

The risks, it was argued during the workshop, fall into several key categories: the risk that big data will lead to more unequal treatment of individuals; the risk of misuse of data; and the risk that big data insights will become so compelling that we can’t, at some level, avoid acting on them, yet then face a backlash from portions of the public unwilling to accept the power of the algorithms as justification for public policy.

The final portion of the day grappled with how best to train data scientists to work with these data. It was generally agreed that the data scientist of the future needs skills that go far beyond technical programming or narrow analytical skills. The ability to work on multidisciplinary teams is key, and the data scientists who want to lead these teams will need to speak multiple ‘languages’ (in the sense of the languages of engineering, of science, of management, etc.). They will need diplomatic skills, and enough understanding of the domains they work in to know what makes sense, rather than just what the data may appear to show, in order to avoid acting on false correlations. Data scientists need to become skilled at posing questions, at knowing which data points are most useful to have access to, and at understanding which uses of data are most beneficial for society and which are most harmful to democracy and society.

All these discussions will feed into my thinking as we continue our project, which aims to get beyond the hype around big data and discover the actual practices, research possibilities, and strategies for working with big data in the social sciences.

Tanner’s new impact model

Simon Tanner at King’s College London has released his new impact model at http://simon-tanner.blogspot.co.uk/2012/10/the-balanced-value-impact-model.html. This work provides a model for thinking about, measuring, and documenting the impact of digital resources. I’m especially pleased to see our own TIDSR toolkit (http://microsites.oii.ox.ac.uk/tidsr/) referenced strongly in the document as an example of an “essential point of reference” for those wishing to measure impact.

CFP: Small data in a Big Data world

The following call for papers comes from my friend Lois Scheidt. The panel will take a critical look at big data and what it means for the practice of research.

CFP: “Small Data” in a “Big Data” World, a panel at the International Congress of Qualitative Inquiry (ICQI) 2013, to be held May 15-18, 2013 on the campus of the University of Illinois, Urbana-Champaign, IL.

Recently the academic research world has been flooded with discussion of the uses and implications of “Big Data.” For those of us whose research focuses on digital environments this discussion includes conferences, grants, special publications, and job announcements that focus on Big Data and the computational turn in social science and humanities research.

‘Big Data’ is not necessarily defined by the size of the data set, for humanities scholars have long been interested in huge textual and image-based corpora. Instead, ‘Big Data’ refers to the increasing complexity of relationships between data objects in a given set, often requiring large-scale computational and algorithmic resources for analysis. ‘Small Data’ research, on the other hand, often begins with a theoretical (e.g., critical race theory) or methodological (e.g., case study or ethnography) approach, which is then applied to digital data drawn from less-popular websites, YouTube videos, or even individual blog posts and comments.

Unfortunately, the tools used to analyze Big Data seem to be influencing modes of thought about new media and digital research away from the theoretical and towards the scientistic. For example, in a recent article Bruns and Burgess (2012) argue that humanist, interpretive studies of social media are ‘idiosyncratic, non-repeatable, and non-verifiable’. Although Bruns and Burgess concede that there is space for ‘traditional qualitative methods’, their suggestion is that these methods need to be ‘integrated and innovated’ upon in a ‘big data’ context.

Given the increasing amounts of attention (e.g., external funding, public policy, or student interest) ‘big data’ is accruing, where does this leave Small Data research and researchers? This panel seeks to explore the position of Small Data in relation to the discussion and/or use of Big Data. As the definition of Big Data is still in flux, we are using Bruns & Burgess (2012) to ground our individual presentations. We are seeking presentations that will explore a variety of views on this turn toward Big Data and its impact on the researched, the researcher, and academia.

References:

Bruns, A., & Burgess, J. (2012). Notes towards the Scientific Study of Public Communication on Twitter. Conference on Science and the Internet. Düsseldorf. Retrieved Oct. 8, 2012 from http://snurb.info/node/1678.

Individual presenters should submit a 150-word abstract to each of the organizers by Nov. 15, 2012.

Organizers:

Andre Brock, Assistant Professor, School of Library and Information Science, University of Iowa, andre.brock@gmail.com

Lois Ann Scheidt, Doctoral Candidate, School of Library and Information Science, Indiana University, lscheidt@indiana.edu

Please forward this CFP to other potentially interested parties and groups.

AoIR Presentation

These are the slides from our presentation on the end(s) of e-research and the beginnings of big data at the Association of Internet Researchers (AoIR) Internet Research 13 (http://ir13.aoir.org/) meeting in Salford.

The presentation focuses on various possible ends to the e-Research programme, including the possibility that it was only ever about providing computational support for other disciplines, or that everyone will become ‘accidental e-researchers’ as computation becomes the norm and thus, like other infrastructures, disappears from notice. The third possibility, which is supported by our data, is that the various foci over the years (the Grid, clouds, big data) gain attention cumulatively (in other words, they don’t appear to replace each other, but add to the mix of computational approaches across the disciplines).

We also discuss styles of science, and suggest that an additional ‘algorithmic’ style may have emerged: not a separate style, but an overlay on many of the other styles, as algorithmic and computational approaches become part of the regular toolkit of science, social science, and the humanities. Several examples are presented in the slides.