I’ve written a number of papers about the role that automated software agents (or bots) play in Wikipedia, claiming that they are critical to the continued operation of Wikipedia. This paper tests this hypothesis and introduces a metric for visualizing the speed at which fully-automated bots, tool-assisted cyborgs, and unassisted humans review edits in Wikipedia. In the first half of 2011, ClueBot NG – one of the most prolific counter-vandalism bots in the English-language Wikipedia – went down for four distinct periods, each lasting from days to weeks. Aaron Halfaker and I use these periods of breakdown as naturalistic experiments to study Wikipedia’s quality control network. Our analysis showed that the overall time-to-revert damaging edits was almost doubled when this software agent was down. Yet while a significantly smaller proportion of the edits made during the bot’s downtime were reverted quickly, we found that those edits were eventually reverted later on. This suggests that human agents in Wikipedia took over this quality control work, but performed it at a far slower rate.
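For readers curious about the mechanics of such a metric, here is a minimal sketch of how a time-to-revert measure can be computed from pairs of edit and revert timestamps, broken down by who did the reverting. The data, field names, and categories below are illustrative assumptions, not the paper’s actual analysis code.

```python
# Illustrative sketch only: a simple time-to-revert metric computed from
# hypothetical (edit_time, revert_time, reverter_type) records.
from datetime import datetime
from statistics import median

damaging_edits = [
    # (time the damaging edit was saved, time it was reverted, who reverted it)
    (datetime(2011, 1, 5, 12, 0), datetime(2011, 1, 5, 12, 1), "bot"),
    (datetime(2011, 1, 5, 13, 0), datetime(2011, 1, 5, 13, 45), "human"),
    (datetime(2011, 1, 6, 9, 30), datetime(2011, 1, 6, 9, 32), "cyborg"),
]

def median_time_to_revert(records, reverter_type=None):
    """Median seconds between a damaging edit and its revert,
    optionally restricted to one class of reverter."""
    deltas = [
        (reverted - edited).total_seconds()
        for edited, reverted, who in records
        if reverter_type is None or who == reverter_type
    ]
    return median(deltas) if deltas else None

for who in ("bot", "cyborg", "human", None):
    print(who or "all", median_time_to_revert(damaging_edits, who))
```

Comparing these medians for edits made while the bot was up versus down is one simple way to see the slowdown described above.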
This post for Ethnography Matters is a very personal, reflective musing about the first bot I ever developed for Wikipedia. It makes the argument that while it is certainly important to think about the software code and algorithms behind bots and other AI agents, these agents are not immaterial. In fact, the physical locations and social contexts in which they are run are often critical to understanding how they both ‘live’ and ‘die’.
My frequent collaborator Aaron Halfaker has written up a fantastic article with John Riedl in Computer reviewing a lot of the work we’ve done on algorithmic agents in Wikipedia, casting them as Wikipedia’s immune system. Choice quote: “These bots and cyborgs are more than tools to better manage content quality on Wikipedia—through their interaction with humans, they’re fundamentally changing its culture.”
I don’t normally pick on people whose work I really admire, but I recently saw a tweet from Mark Sample that struck a nerve: “Look, if you don’t instagram your first pumpkin spice latte of the season, humanity’s historical record will be dangerously impoverished.” While it got quite a number of retweets and equally snarky responses, he is far from the first to make such a flippant critique of the vapid nature of social media. It also seriously upset me for reasons that I’ve been trying to work out, which is why I found myself doing one of those shifts that researchers of knowledge production tend to do far too often with critics: don’t get mad, get reflexive. What is it that makes such a sentiment resonate with us, particularly when it is issued over Twitter, a platform that is the target of this kind of critique? The reasons have to do with a fundamental disagreement over what it means to interact in a mediated space: do we understand our posts, status updates, and shared photos as representations of how we exist in the world which collectively constitute a certain persistent performance of the self, or do we understand them as a form of communication in which we subjectively and interactionally relate our experience of the world to others?
This was an interview I did with the wonderful Heather Ford, originally posted at Ethnography Matters (a really cool group blog) way back in January. No idea why I didn’t post a copy of this here back then, but now that I’m moving towards my dissertation I’m thinking about this kind of stuff more and more. In short, I argue for a non-anthropocentric yet still phenomenological ethnography of technology, studying not the culture of the people who build and program robots, but the culture of the robots themselves.
In the Wikipedia research community — that is, the group of academics and Wikipedians who are interested in studying Wikipedia — there has been a pretty substantial and longstanding problem with how research is published. Academics, from graduate students to tenured faculty, are deeply invested and entrenched in a system that rewards the publication of research. Publish or perish, as we’ve all heard. The problem is that the overwhelming majority of publications which are recognized as ‘academic’ require us to assign copyright to the publisher, which can then charge for access to the article. This is in direct contradiction with the goals of Wikipedia, as well as many other open source and open content creation communities — communities which are the subject of a substantial amount of academic research.
I recently saw Helvetica, a documentary directed by Gary Hustwit about the typeface of the same name — it is available streaming and on DVD from Netflix, for those of you who have a subscription. As someone who studies ubiquitous socio-technological infrastructures (and Helvetica is certainly one), I know how hard it is to seriously pay attention to something that we see every day. It may seem counter-intuitive, but as Susan Leigh Star reminds us, the more widespread an infrastructure is, the more we use it and depend on it, the more invisible it becomes — that is, until it breaks or generates controversy, in which case paying attention to it is far too easy. But to actually say something about what well-oiled, hidden-in-plain-sight infrastructures are, how they came to have such a place in our society, and why they won out over their competitors is a notoriously difficult task. Ultimately, I came to realize that the film is less of a history of fonts, and more of an anthropology of design.
I’m part of a Wikipedia research group called “Critical Point of View” centered around the Institute for Network Cultures in Amsterdam and the Centre for Internet and Society in Bangalore. (Just a disclaimer, the term ‘critical’ is more like critical theory as opposed to Wikipedia bashing for its own sake.) We’ve had some great conferences and are putting out an edited book on Wikipedia quite soon. My chapter is on bots, and the abstract and a link to the full PDF are below:
I describe the complex social and technical environment in which bots exist in Wikipedia, emphasizing not only how bots produce order and enforce rules, but also how humans produce bots and negotiate rules around their operation. After giving a brief overview of how previous research into Wikipedia has tended to mis-conceptualize bots, I give a case study tracing the life of one such automated software agent, and how it came to be integrated into the Wikipedian community.
The Lives of Bots [PDF, 910KB]
So given what’s going on* in Egypt and the Middle East, we in the West are fascinated not so much by revolutions and popular uprisings against dictatorial regimes as by an efficacious use of social media. Even Clinton is talking about the Internet as “the world’s town square”, and it seems that the old conversation about the Internet and the public sphere is going to flare up for the third time (1993-5 and 2001-3 are the other two times). Since Habermas is generally credited for bringing this notion of the public sphere to the forefront of popular, political, and academic discourse, it is natural to cite him. Then critique him to death, talking about how we need to get beyond an old white guy’s theories. And it feels good, I know.
The problem is that most people only read his first book, The Structural Transformation of the Public Sphere, which was written in 1962, and then proceed to critique “the Habermasian public sphere.” I can’t tell you how many articles I’ve read which demand that we ‘move beyond’ Habermas or go ‘post-Habermasian’ and only cite Structural Transformation. It’s a great literary foil if you’re advancing your own concept of the public sphere, and the whole ‘new events require a re-evaluation of old theories’ move is a mainstay of academia. As a crazy post-Latourian socio-technical ethnographer who grants agency to everything (literally, every single thing) except for social structures, it is weird that I’m defending him. But I’m also a huge proponent of keeping your intellectual allies close and your intellectual opponents closer.
* I love how all our social/cultural/economic/political theories of the state, legitimacy, revolution, and democracy are undergoing their most radical problematization since the fall of the Soviet Union, such that we don’t know how to name the events in the past month, thus we settle on something like “what’s going on.”
This is a paper I co-authored with David Ribes and recently presented at HICSS, the Hawaii International Conference on Systems Sciences. It’s a qualitative methodology based on analyzing logging data that we developed through my research on Wikipedia, but has some pretty broad applications for studying highly-distributed groups. It’s an inversion of the previous paper we presented at CSCW, showing in detail how we traced Wikipedian vandal fighters as they collectively work to identify and ban malicious contributors.
Abstract: We detail the methodology of ‘trace ethnography’, which combines the richness of participant-observation with the wealth of data in logs so as to reconstruct patterns and practices of users in distributed sociotechnical systems. Trace ethnography is a flexible, powerful technique that is able to capture many distributed phenomena that are otherwise difficult to study. Our approach integrates and extends a number of longstanding techniques across the social and computational sciences, and can be combined with other methods to provide rich descriptions of collaboration and organization.
Citation: Geiger, R.S., & Ribes, D. (2011). Trace Ethnography: Following Coordination Through Documentary Practices. In Proceedings of the 44th Annual Hawaii International Conference on Systems Sciences. Retrieved from http://www.stuartgeiger.com/trace-ethnography-hicss-geiger-ribes.pdf