We’re organizing a workshop on trace ethnography at the 2015 iConference, led by Amelia Acker, Matt Burton, David Ribes, and myself. See more information about it on the workshop’s website, or feel free to contact me for more information.
Date: March 24th 2015, 9:00-a.m.-5:00 p.m.
Location: iConference venue, Newport Beach Marriott Hotel & Spa, Newport Beach, CA
Deadline to register through this form: Feb 1st, 2015. Note: you will also have to register through the official iConference website as well.
Notification: Feb 15th, 2015
Description: This workshop introduces participants to trace ethnography, building a network of scholars interested in the collection and interpretation of trace data and distributed documentary practices. The intended audience is broad, and participants need not have any existing experience working with trace data from either qualitative or quantitative approaches. The workshop provides an interactive introduction to the background, theories, methods, and applications–present and future–of trace ethnography. Participants with more experience in this area will demonstrate how they apply these techniques in their own research, discussing various issues as they arise. The workshop is intended to help researchers identify documentary traces, plan for their collection and analysis, and further formulate trace ethnography as it is currently conceived. In all, this workshop will support the advancement of boundaries, theories, concepts, and applications in trace ethnography, identifying the diversity of approaches that can be assembled around the idea of ‘trace ethnography’ within the iSchool community.
In short, I built a script that dynamically generates a robots.txt file for search engine bots, who download the file when they seek direction on what parts of a website they are allowed to index. By default, it directs all bots to stay away from the entire site, but then presents an exception: only the bot that requests the robots.txt file is allowed full reign over the site. If Google’s bot downloads the robots.txt file, it will see that only Google’s bot gets to index the entire site. If Yahoo’s bot downloads the robots.txt file, it will see that only Yahoo’s bot gets to index the entire site. Of course, this is assuming that bots identify themselves to my server in a way that they recognize when it is reflected back to them.
This is a new article published in Information, Communication, and Society as part of their annual special issue for the Association of Internet Researchers (AoIR) conference. This year’s special issue was edited by Lee Humphreys and Tarleton Gillespie, who did a great job throughout the whole process.
Abstract: This article introduces and discusses the role of bespoke code in Wikipedia, which is code that runs alongside a platform or system, rather than being integrated into server-side codebases by individuals with privileged access to the server. Bespoke code complicates the common metaphors of platforms and sovereignty that we typically use to discuss the governance and regulation of software systems through code. Specifically, the work of automated software agents (bots) in the operation and administration of Wikipedia is examined, with a focus on the materiality of code. As bots extend and modify the functionality of sites like Wikipedia, but must be continuously operated on computers that are independent from the servers hosting the site, they involve alternative relations of power and code. Instead of taking for granted the pre-existing stability of Wikipedia as a platform, bots and other bespoke code require that we examine not only the software code itself, but also the concrete, historically contingent material conditions under which this code is run. To this end, this article weaves a series of autobiographical vignettes about the author’s experiences as a bot developer alongside more traditional academic discourse.
Official version at Information, Communication, and Society
Author’s post-print, free download [PDF, 382kb]
I’ve written a number of papers about the role that automated software agents (or bots) play in Wikipedia, claiming that they are critical to the continued operation of Wikipedia. This paper tests this hypothesis and introduces a metric visualizing the speed at which fully-automated bots, tool-assisted cyborgs, and unassisted humans review edits in Wikipedia. In the first half of 2011, ClueBot NG – one of the most prolific counter-vandalism bots in the English-language Wikipedia – went down for four distinct periods, each period of downtime lasting from days to weeks. Aaron Halfaker and I use these periods of breakdown as naturalistic experiments to study Wikipedia’s quality control network. Our analysis showed that the overall time-to-revert damaging edits was almost doubled when this software agent was down. Yet while a significantly fewer proportion of edits made during the bot’s downtime were reverted, we found that those edits were later eventually reverted. This suggests that human agents in Wikipedia took over this quality control work, but performed it at a far slower rate.
This post for Ethnography Matters is a very personal, reflective musing about the first bot I ever developed for Wikipedia. It makes the argument that while it is certainly important to think about software code and algorithms behind bots and other AI agents, they are not immaterial. In fact, the physical locations and social contexts in which they are run are often critical to understanding how they both ‘live’ and ‘die’.
My frequent collaborator Aaron Halfaker has written up a fantastic article with John Riedl in Computer reviewing a lot of the work we’ve done on algorithmic agents in Wikipedia, casting them as Wikipedia’s immune system. Choice quote: “These bots and cyborgs are more than tools to better manage content quality on Wikipedia—through their interaction with humans, they’re fundamentally changing its culture.”
I don’t normally pick on people whose work I really admire, but I recently saw a tweet from Mark Sample that struck a nerve: “Look, if you don’t instagram your first pumpkin spice latte of the season, humanity’s historical record will be dangerously impoverished.” While it got quite a number of retweets and equally snarky responses, he is far from the first to make such a flippant critique of the vapid nature of social media. It also seriously upset me for reasons that I’ve been trying to work out, which is why I found myself doing one of those shifts that researchers of knowledge production tend to do far too often with critics: don’t get mad, get reflexive. What is it that makes such a sentiment resonate with us, particularly when it is issued over Twitter, a platform that is the target of this kind of critique? The reasons have to do with a fundamental disagreement over what it means to interact in a mediated space: do we understand our posts, status updates, and shared photos as representations of how we exist in the world which collectively constitute a certain persistent performance of the self, or do we understand them a form of communication in which we subjectively and interactionally relate our experience of the world to others?
This was an interview I did with the wonderful Heather Ford, originally posted at Ethnography Matters (a really cool group blog) way back in January. No idea why I didn’t post a copy of this here back then, but now that I’m moving towards my dissertation I’m thinking about this kind of stuff more and more. In short, I argue for a non-anthropocentric yet still phenomenological ethnography of technology, studying not the culture of the people who build and program robots, but the culture of those the robots themselves.
In the Wikipedia research community — that is, the group of academics and Wikipedians who are interested in studying Wikipedia — there has been a pretty substantial and longstanding problem with how research is published. Academics, from graduate students to tenured faculty, are deeply invested and entrenched in an system that rewards the publication of research. Publish or perish, as we’ve all heard. The problem is that the overwhelming majority of publications which are recognized as ‘academic’ require us to assign copyright to the publication, so that the publisher can then charge for access to the article. This is in direct contradiction with the goals of Wikipedia, as well as many other open source and open content creation communities — communities which are the subject of a substantial amount of academic research.
I recently saw Helvetica, a documentary directed by Gary Hustwit about the typeface of the same name — it is available streaming and on DVD from Netflix, for those of you who have a subscription. As someone who studies ubiquitous socio-technological infrastructures (and Helvetica is certainly one), I know how hard it is to seriously pay attention to something that which we see every day. It may seem counter-intuitive, but as Susan Leigh Star reminds us, the more widespread an infrastructure is, the more we use it and depend on it, the more invisible it becomes — that is, until it breaks or generates controversy, in which case it is far too easy. But to actually say something about what well-oiled, hidden-in-plain-sight infrastructures are, how they came to have such a place in our society, and why they won out over their competitors is a notoriously difficult task. But I came to realize that the film is less of a history of fonts, and more of an anthropology of design.