We are looking for a part-time researcher for our Wikipedia project. It should be someone with quantitative skills, a history of writing for an academic audience and some postgraduate training. We can be somewhat flexible in terms of hours: if you are interested in working near full time over the summer, rather than half time for six months, we can negotiate.

Here is the post:

Part-time Research Assistant, Fell Fund Project

OXFORD INTERNET INSTITUTE

Grade 6: Salary £26,004 – £31,020 p.a. (Variable hours – up to 18.5 hours a week)

We are a leading international research Institute looking for a part-time Research Assistant to carry out research into the geography and social structure of Wikipedia in the Middle East and North Africa through large-scale data analysis. The position will involve the analysis of the corpus of Wikipedia text, user-pages and history files and the use of statistical techniques to explain spatial and social patterns. Our research question focuses on patterns of representation on Wikipedia as well as an articulation of patterns of conflict and barriers to participation.

Based at our OII North office at 66 Banbury Road, this position is available immediately for 6 months, with variable hours – up to 18.5 hours a week.

Applications for this vacancy are to be made online. To apply for this role and for further details, including a job description and selection criteria, please click here.

Only applications received before 12:00 midday on 1st June 2012 can be considered. Interviews for those short-listed are currently planned to take place in the week commencing 4th June 2012.

Recently on the social media collective blog, danah boyd set off a firestorm by suggesting that the imposition of real names on social media sites is an abuse of power…or even authoritarian. The obvious retort is “don’t like it, don’t use it”, or learn how to segment one’s network (i.e. bend to the system, because it’s your problem).

But I’m here to take another angle on this one: real name sites are necessarily inadequate for online audiences. Yes, necessarily inadequate.

I once had a dream that people could seamlessly manage their social networks on any site through some combination of visualization and clever user interfaces. It was based on the visualization of Facebook networks. People who see these networks, as I’ve discovered in many interviews (early work discussed here), readily identify the myriad social contexts in their networks. One cluster is clearly family, another is clearly coworkers, and so forth. As such, it seemed like the next step would be to use this information to create some sort of selective sharing interface. These are Google+’s social circles (or Diaspora’s Aspects), except determined semi-automatically. Then one could simply select a context and read from it or post to it.

Old Facebook personal network, drawn from namegenweb, showing many social contexts
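As a rough illustration of the “semi-automatic” step, one crude way to turn a personal network into candidate contexts is to drop the ego and take the connected components of what remains. This is a minimal sketch assuming a plain list of friendship pairs; the function name `social_contexts` and the toy names are hypothetical, and it is not meant to describe how Google+ or Diaspora build their groupings.

```python
from collections import defaultdict


def social_contexts(ego, edges):
    """Split an ego's alters into candidate social contexts.

    Assumption: a "context" is approximated by a connected component of
    the alter-alter network once the ego is removed.
    """
    adjacency = defaultdict(set)
    alters = set()
    for a, b in edges:
        if ego in (a, b):
            # remember the alter, but drop the tie: the ego touches everyone
            alters.add(b if a == ego else a)
            continue
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, contexts = set(), []
    for start in sorted(alters):
        if start in seen:
            continue
        # depth-first search collects everyone reachable without the ego
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        contexts.append(component)
    return contexts


edges = [("me", "mum"), ("me", "dad"), ("mum", "dad"),
         ("me", "ann"), ("me", "bob"), ("ann", "bob"), ("me", "zoe")]
print(social_contexts("me", edges))
```

Here the family pair (mum, dad) and the pair ann/bob come out as separate contexts, and zoe, who knows no one else, forms her own. Real networks have overlapping contexts and bridging ties, so components are at best a starting point.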

This is “the myth of selective sharing” (as Marc Smith calls it). It’s an engineer’s dream based on a misunderstanding of the key distinction between offline and online life. Offline, we assume by default that our conversations are not encoded and thereafter available to people outside of our immediate audience. Yes, some lucky people give talks to large audiences or get on the radio or TV. Most don’t. But everyone has some reason to share things with one person but not another. We don’t need to go as far as whistle-blowers, political dissidents or closet cases in religious areas. Lots of people have grievances with their bosses, or find someone else attractive, or have problem students or subordinates they need advice on. Lots of people need advice on their own issues, be it alcoholism, drug abuse or gambling. When people do this offline, they do it in situations: temporally and spatially bounded contexts for action. The pub after work; the patio over a cup of coffee; the closed-door meeting.

But what do we mean by offline anymore? Some assume it is when they are not searching and browsing the web, or when they aren’t streaming video, emailing someone or inside a virtual world. Being offline actually refers to a much more limited space than that. Being online is being encoded and having that which is encoded available to some party other than those immediately present. You are not online when you are in front of a computer – you are online when your actions are being digitized and networked. Online is on-the-record. Offline is off-the-record.

Offline, people say things appropriate to the group they are in. That doesn’t mean they are two-faced, insincere or liars. It means people are context-aware. People observe walls, clocks, furniture, fashion and music. These things guide us as to the appropriate way of acting. The guy writing his novel at the bar on Friday night is out of place. The guy who shows up to work drunk on Monday morning has a problem. Offline, people don’t have to worry about their real name, because their behavior is tied to the context and the impressions they foster in that context. In fact, I’ll say that even more strongly – if your speech is not confined to the context you are in, but is available to a potentially unknowable audience, you are online.

This is why real name sites are necessarily inadequate. They deny individuals the right to be context-specific. They turn the performance of impression management into the process of curation. Facebook curates through the top news feed, Twitter through lists and Google+ through a confusing (and, as far as I can tell, failing) social circles model. Impression management means selectively presenting an idealized version of one’s self specific to a given context. Curation means selecting objects for display. So if you don’t think that being context-specific is a right, consider what you think ‘free’ means in the right to free speech. When my speech is necessarily encumbered by a tether to a single all-encompassing key (the real name) that unlocks whatever I say, I am no longer free to address one specific context and not another. I am engaging in a trust relationship with the curator, but I am not free to say what I want. Sometimes that relationship fails, and sometimes it’s out of my control (when others post on my behalf, tag me, etc.).

Of course, this applies most strongly to non-addressed spaces. When I address someone in an email or on the phone, I am still online, but I’m not necessarily subject to curation. I send a message to a specific recipient and I expect that recipient to get the message, rather than have Gmail decide (though even then, spam is filtered out through some curation). On the other hand, when I submit content to social media sites, I do not have a clear view of who is seeing it, or who will, beyond some vague notion of friend lists.

Pseudonyms have long been a way out of this situation. Someone might have one name for an anonymous support group, another for a group of bi-curious and closeted individuals (or just for sex in general), another for a message board about programming, and one for politics or political action. If these were mine, then the choice to blend them or keep them separate would be mine. Real names and third-party curation take away that choice. In its place they offer many advantages, but freedom is not one of them. And that’s why the imposition of one name, one network for all is an abuse of power. It says not only that the curator is better than you at deciding who should read your content, but that the curator won’t even give you the choice to begin with.

(Cross posted at the social media collective blog)

Correction!!! My NodeXL workshop is Tuesday

Just a small correction to note that my NodeXL social network analysis workshop is on Tuesday, July 5th!

Still on the fence? Here’s a testimonial from Dr Jo Hamilton, ClimateXChange Coordinator at the Environmental Change Institute, Oxford University Centre for the Environment.

“Bernie Hogan’s workshop was an excellent introduction to the topic and themes of Social Network Analysis. He conveyed the key points succinctly, incorporating questions and discussion as he went along. The exercises using NODEXL were a great way to embed our knowledge, and to enable us to ask more practical questions about using the software. An excellent introduction which has made me more confident to approach and use the tools.”

In a recent New Scientist article, I was quoted in a story about gender-detection technology. The hook was that this technology might have exposed the author of the moderately popular blog “A Gay Girl in Damascus” as a fraud. Turns out the author was a straight man from the US studying at the University of Edinburgh. The quote reads:

Bernie Hogan, a specialist in social network technology at the Oxford Internet Institute in the UK, thinks there is a useful role for such technology. “Being able to provide some extra cues as to the gender of a writer is a good thing – it can only help.” The Independent, June 17, 2011

It’s a classic taken-out-of-context quote by a good journalist for a sensible story. But the problem is that it suggests I’m wholeheartedly validating the notion of detecting gender, almost as a precondition for interpreting content…“it can only help”. For the record, the leading question was “would gender-identifying software be useful for curious end users?” To which I replied “sure” (almost tautologically). But what if someone doesn’t want to be tied down by their existing identity, or what if they want to speak for a population that is reluctant to speak for itself? What if they want a certain level of gender ambiguity?

Syria is currently engaging in significant human rights abuses, political strife and rampant state-endorsed homophobia. These significant issues were covered in the blog and telegraphed by the media. GGiD was addressing these issues, albeit with borrowed legitimacy and some significant factual inaccuracy. If technology can assess your race, class, gender, occupation, education level and so forth simply from your texts, then it also politicizes your texts. Even if sex and race are biological (whereas gender and ethnicity are cultural), the importance we give them in evaluating legitimacy is a political act. This is especially true in Syria, where an exposed identity could lead to very serious consequences.

I like my pseudonymous Internet. I like the 4chans, reddits and tumblrs of the world. Having a button that tells me whether content was written by a man or a woman adds a contextualizing lens. Such a lens may help to detect fraud, but it may also have a chilling effect on the creative, open and free discourse currently found on these sites. If you want to detect fraud, such a button can only help, but if you want creative blogs, such as the story of a lesbian in a highly surveilled, politically oppressive, homophobic country…that’s another matter.

What’s in a name? Is Bernie more likely to get a callback for an apartment than Jamal or Abdul? The answer is most likely yes, at least on Craigslist (and we have just as much reason to believe it holds elsewhere as well). We explore this issue in a new paper that was accepted just yesterday for publication in City & Community. City & Community is the official journal of the Community & Urban Sociology section of the American Sociological Association, and we think it is a fine home for this work.

We explore this issue using a novel form of online audit study. Past audit studies have looked at whether people get callbacks for résumés, get turned down on the telephone, or get callbacks for apartments in person. Our work differs in that we define a methodology for scaling up this sort of work using an automated emailing program that I designed and implemented. Brent Berry, my co-author from the University of Toronto, came up with the idea, and we worked rather hard putting the pieces together over the last several years.

It will appear in issue 10(4) sometime later this year. A pre-print copy of the article is available here for download (PDF). Please note that there will be some slight typographical changes between this version and the final version (so be careful when quoting), but the substantive writing should be the same.

NodeXL / Network Workshop at UKSNA

I’m giving a one day workshop at the upcoming UK Social Networks Association conference in Greenwich. The workshop is on Wednesday, July 6th. Registration is still open.

Here’s the write-up from the short course page:

From data to discovery: NodeXL for social network capture, analysis and visualisation

This workshop will familiarise attendees with the tools to begin social network capture, analysis and visualisation, especially the analysis of data from the web, such as tweets, Facebook friendships and hyperlinks. While the workshop is focused on social media, it is applicable to a host of social networks, such as 2-mode networks, ego networks and whole networks.

The workshop will quickly cover some of the basic topics in data collection and analysis and move on to visualisation through the NodeXL package. This software, developed by a consortium of HCI specialists, social scientists and technologists from Microsoft Research, is designed to make the discovery of patterns within social networks straightforward and intuitive.

By working with data already in Excel, NodeXL makes the preparation of network data sets simple and enables the researcher to focus most directly on analysis and intuition building. By having data capture tools embedded directly, it also serves as an extensive and highly customisable way to capture data, not just analyse it.

NodeXL is a free download but requires Microsoft Excel 2007 or 2010 for Windows (Mac users need to use Boot Camp or Parallels). It is designed by the nascent Social Media Research Foundation and is already actively used by both industry and academic researchers. A learner’s guide and academic text, ‘Analyzing Social Media Networks with NodeXL’, was published last year by Morgan Kaufmann.

Topics covered include:

  1. Network types and network metrics
  2. Intuition building through visualisation
  3. Coupling metrics and visualisation
  4. NodeXL as a data capture tool (Twitter, WWW, Email, Facebook)
  5. Importing and exporting data and images from NodeXL
  6. Advanced NodeXL Features: Groups, Automation, Macros.

It’s a tall order, but I’ve already run a few of these workshops, and I can guarantee it will be fast-paced, engaging and hopefully insightful. If you are interested or have any questions, feel free to email me. The course costs £100 (£50 for students).

For the second year in a row, I co-taught the OII’s Online Social Networks course with Dr. Sandra Gonzalez-Bailon. Again this year we had some excellent studies and some excellent students. The graphs were so visually interesting, and often so creative, that I asked for permission to compile them in a gallery. This isn’t a complete list of sociograms (and of course, not every network analysis needs a sociogram), but this work ought to give you a sense of what we have been doing.

The students have provided the text themselves, and you can often find additional material by following the links in the descriptions and comments to their personal blogs and websites. Enjoy!

What you can and can’t get from Facebook

I have had several interesting requests for my NameGen program, including the much-maligned desktop application that is not really in active development (pending future grant money). What is interesting is the kind of request that comes in based on misunderstandings of what Facebook provides to an application developer.

So, what can and can’t you get from the Facebook API about an individual and their friends?

I. You can get a list of a user’s friends, their names, and their profile pic. Here is the FQL:

SELECT uid, name, pic_square FROM user WHERE uid IN (SELECT uid2 FROM friend WHERE uid1 = $user_id)

If the user has made other information public, that is often accessible as well, such as gender, date of birth and hometown. Some protected information such as relationship status is never available to the API for anyone other than the user. You can customize what information is available to others through Privacy Settings -> Applications, Games and Websites -> Information Accessible through your friends. This figure shows what I have set. I believe it is the default.


The customization screen for Facebook's information sharing between friends via the API

II. You can get a 1.5 degree personal network. (Please note – this is a simplified and suboptimal query that will not return a complete network in most cases where the network is larger than 200 nodes).

SELECT uid1, uid2 FROM friend WHERE uid1 IN (SELECT uid2 FROM friend WHERE uid1 = $user_id) AND uid2 IN (SELECT uid1 FROM friend WHERE uid2 = $user_id)

This will return a list of friend links. There is a less efficient way of querying this information using the get_friends() query. It requires you to send all possible pairs of friends, and it will return, for each pair, whether the two users are friends. So the list length will be n(n-1)/2, where n is the number of friends. That’s roughly 125k pairs (124,750) for a 500-person network. Those pairs have to be sent in batches of 5k max (as far as I recall).
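The arithmetic is easy to check. This sketch only enumerates the pairs and chunks them; it does not call any Facebook API, and the 5,000-item batch size is the recollection above, not a documented limit.

```python
from itertools import combinations

# Batch limit as recalled in the post; treat it as an assumption.
BATCH_SIZE = 5000


def friend_pairs(friend_ids):
    """Every unordered pair of friends: n(n-1)/2 pairs in total."""
    return list(combinations(friend_ids, 2))


def batches(items, size=BATCH_SIZE):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


friends = list(range(500))  # placeholder IDs for a 500-person network
pairs = friend_pairs(friends)
print(len(pairs))           # 124750
print(len(batches(pairs)))  # 25 (the last one holds 4,750 pairs)
```

So checking a 500-person network pair by pair means about 25 separate batched requests.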

III. You can get the links between friends in a group. I don’t have this query on hand, partly because Facebook will not return a complete list of group members (at least through FQL), but if you are in a group, you can ask whether other group members are friends. The same cannot be said of a fan page (i.e. the ones you ‘like’ rather than the ones you join). One attempt to explore this network is netvizz: http://apps.facebook.com/netvizz


An example showing the 1.0, 1.5 and 2.0 ego networks (taken from my chapter in Analyzing Social Media Networks with NodeXL)
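To make the 1.0 / 1.5 / 2.0 distinction concrete, here is a minimal sketch over a plain edge list. The function name `ego_network` is hypothetical, and treating 2.0 as every tie among nodes within two steps of the ego is an assumption (other definitions exist). The 1.5 level corresponds to what the FQL query in II returns; the 2.0 level is what the API will not give you.

```python
def ego_network(ego, edges, level):
    """Return the ties in an ego network at level 1.0, 1.5 or 2.0.

    1.0: ego and its ties to alters only (a star)
    1.5: ego, alters, and the ties among alters
    2.0: every tie among nodes within two steps of the ego (assumed definition)
    """
    ties = {frozenset(e) for e in edges}
    alters = {n for t in ties if ego in t for n in t if n != ego}
    if level == 1.0:
        return {t for t in ties if ego in t}
    if level == 1.5:
        members = alters | {ego}
    elif level == 2.0:
        # add anyone tied to an alter (friends of friends)
        members = alters | {ego} | {n for t in ties if t & alters for n in t}
    else:
        raise ValueError("level must be 1.0, 1.5 or 2.0")
    # keep only ties whose endpoints are both members
    return {t for t in ties if t <= members}


edges = [("ego", "a"), ("ego", "b"), ("a", "b"), ("a", "c"), ("c", "d")]
for level in (1.0, 1.5, 2.0):
    print(level, len(ego_network("ego", edges, level)))  # 2, then 3, then 4 ties
```

The tie c–d never appears: d is three steps from the ego.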

What can you view through Facebook (and theoretically spider) but not get through the API:

  1. The 2-degree network: If you can view a friend, you can view all of their friends. That said, the API will not return these people. It will only return the friends of a single user.

  2. All members of a group.

  3. All members of a page (if you are a page administrator).

So…what does this mean? It means that, basically, Facebook allows people to learn about the network immediately surrounding them and that’s it. It similarly means Facebook will not easily allow you to learn the personal network of someone else. So consider the following requests that won’t work:

  • The personal network of a person’s friend. Access to this would be a clear form of surveillance. So if you want to view a friend’s network, you should ask them to download it themselves.

  • This also holds for parents - no, I will not help you download your teenager’s network just because she put you on limited profile. Moreover, this is not even possible unless you, your daughter and all of her friends are in a group and you can get the complete list of people in the group. And recall that while it is possible to add other people to a group, those people must already be your friends, so you can already access those ties anyway.

  • The personal network of someone you are investigating. If you do not have a warrant to download this network (or to compel a suspect to download their own network), and you want this information, this, again, is surveillance. Also, I’m pretty sure that taking the network of someone else would be a breach of the terms of use. And as for the warrant – I would tread carefully, as this will probably involve all sorts of legalistic red tape (especially in privacy-friendly countries such as Germany).

  • This also holds for people curious about the personal network of anyone they are not friends with. A Facebook friendship is a form of information access and redistribution. If you are not friends with someone, they have not granted you this access. This is the sort of notion that got Pete Warden into legal hot water over the 200-million-strong global Facebook network he was planning to release.

Personally, I think that data portability is a good thing, and I am currently following the debacle between Google and Facebook rather closely, but I also think it needs to be carefully metered out. And I don’t think that Facebook is a sinkhole of data – in fact, their API is very liberal. In some cases, perhaps too liberal, especially considering the perennially evolving terms of use. But they have made some careful decisions that I tend to agree with (such as relaxing the 24-hour caching requirement), and while I wouldn’t give them carte blanche support, I think they are taking baby steps in the right direction. Thankfully, those steps do not involve downloading the networks of others without their permission or collecting friends of friends of friends.

As part of Facebook’s march to capture as much of your life as possible while making you feel okay about it, last month they announced a “Download your Data” feature. This feature has been rolled out in waves, and happily, that wave lapped up on my shores yesterday. So here’s a review of the feature.

The short story is that this is a pretty neat feature that is clearly still a first step. In particular, there are a handful of absences, such as not knowing who ‘liked’ your posts. Also, it won’t download any of your friends’ data, so if you want to view your personal network, you’re still left with applications like my own namegenweb or netvizz.

The data that comes down is basically a series of queries on the big table that holds all of this data, spit out as a series of sparse HTML files. The source code of the files is nice and clean, with legible div tags, but it’s no easier to parse this data than it is to do so through a series of FQL queries. Obviously, most users aren’t going to look at the source code or use FQL queries, but it’s nice to know that their API is so thorough and open that you were able to get all of this data even before the “download your data” feature.
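If you do want to pull items out of the exported pages, Python’s standard-library `html.parser` is enough for simple cases. This is a sketch only: the helper `DivTextExtractor`, the class name `comment` and the sample markup are made up for illustration, so check the source of your own export for the real div classes.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text inside <div> elements that carry a given class.

    The class name is a placeholder; inspect your own export to find it.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching div (counts nested divs)
        self.items = []     # one string per matching div
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # nested div inside a match
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.items.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self._buffer.append(data)


sample = ('<div class="comment">Nice photo!</div>'
          '<div class="meta">2 likes</div>'
          '<div class="comment">Thanks</div>')
parser = DivTextExtractor("comment")
parser.feed(sample)
parser.close()
print(parser.items)  # ['Nice photo!', 'Thanks']
```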

To get started, you have to go to Account -> Account Settings. There should be a feature there called “Download Your Information”, with a link called “learn more”.

Click on account settings

If you are part of the roll out, then you can click on this link. If not, then it tells you to sit tight, and that it will be available sometime in the future. When available, it gives you a big preamble and a simple “download” button.

Once clicked, that button should turn into a greyed out “pending” button. The first time I did this, it emailed me and said there was something wrong with my file, and that I could try again. I tried again, and the button never turned to “pending”. So I logged out of Facebook, closed my browser and tried again. This time it worked.

If everything works, you should see this screen while you wait.

It took about 2 hours for the email to come to me, and it was very simple to click on that link and get a zip file.

Inside the zip file was a series of folders and a pretty straightforward “index.html” file. I’ve posted a stripped-down version of mine to NameGen so you can get a sense of what this whole thing looks like.

Example Profile Screen

Comments on photos appear as usual. People tagged in photos do not. Both messages on one’s wall and their replies show up, but interestingly, only the number of likes shows up. You cannot see who actually liked it. I can’t comment on “places”, since I haven’t used that feature and do not know how it would show up.

Curiously, Event IDs are the only thing that is clickable. The remainder are not – that is, you cannot click on a message and go to the live version on Facebook, or click on a friend and go to their profile.

It is nice that this is all in a clean HTML format. It’s a step towards data transparency, but not towards data portability. A great deal of work would be required to actually slice up this data. But as I mentioned, most of this is already covered in the Facebook API, so it doesn’t seem like having another point of entry to machine-readable structured data was necessary.

(Face)Book week for Bernie

Wow, what a week for Facebook and me! Never mind the fact that results from the “So you want to be a scientist” project with Nina Jones went massive on BBC’s website (as the top story two days running with almost a million views). Much hard work and perseverance from colleagues of mine has turned into two fine new books worth checking out. And, as you might have suspected, I’m featured in both of them. So I’m happy to give two utterly biased reviews of two books on social media released this week: “Analyzing Social Media Networks with NodeXL” and “The Facebook Era, 2nd ed.”. Details below.

Continue reading ‘(Face)Book week for Bernie’