Closed-source papers on open source communities: a problem and a partial solution

In the Wikipedia research community (that is, the group of academics and Wikipedians who are interested in studying Wikipedia) there has been a pretty substantial and longstanding problem with how research is published. Academics, from graduate students to tenured faculty, are deeply invested and entrenched in a system that rewards the publication of research. Publish or perish, as we’ve all heard. The problem is that the overwhelming majority of publications which are recognized as ‘academic’ require us to assign copyright to the publication, so that the publisher can then charge for access to the article. This is in direct contradiction with the goals of Wikipedia, as well as many other open source and open content creation communities, which are the subject of a substantial amount of academic research.

Freely-accessible or freely-licensed?

There are actually two issues here, the first being that members of these communities want access to research about themselves without having to pay the average $20-$30 an article.  While important, this also overshadows a more fundamental concern: communities like Wikipedia, Apache, Creative Commons, and OLPC were founded on the idea of providing free and open software, hardware, or educational content to the world.   The Wikimedia Foundation’s mission statement is “to empower and engage people around the world to collect and develop educational content under a free license or in the public domain.”  That is pretty clear-cut, and those of us with obligations to both our own academic community and the Wikipedia community are having more and more problems with negotiating those competing tensions.

In a sense, this is related to how the major ethical dilemma with 19th and early 20th century anthropologists wasn’t about giving ‘their natives’ a copy of their manuscripts. Rather, it was that most anthropologists were participating in systems of colonialism, which were in direct opposition to the interests of the people they studied. Now, I am in no way arguing that the same kind of power relation exists between academics who study Wikipedians and the Wikipedian community, or that the issue of open educational resources is on the same ethical level as colonialism. As an aside, contemporary anthropologists have documented this shift from ‘studying down’ to ‘studying up’, although I would say that most academics who research open communities like Wikipedia are now ‘studying across’, but that interesting subject is for another blog post. But I bring it up because unlike with the Trobriand Islanders, the communities that we study are now beginning to articulate their concerns with how we perform and publish our research, and it is something that we need to listen to.

So to return to the core issue at hand: why is the Wikipedian community (and the Wikimedia Foundation) supporting research that will be copyrighted and bound up in publications which further support an intellectual property regime they clearly stand against?   And what does it mean for us as academic researchers to give back to the communities we study?   It obviously goes beyond being willing to send a copy of a PDF to an interested Wikipedian over e-mail, or even hosting a freely-accessible copy of our copyrighted PDFs on our websites (which many of us do, even when we’re not supposed to). For those of us studying Wikipedia, Creative Commons, Scratch, or a number of open content creation communities, it means releasing our research under a Creative Commons license, as this has become the standard for releasing everything other than code.

Now, the moment I say this, all the academics breathe a heavy sigh, knowing that such a request is impossible, given the current academic system in which we are entrenched. Even the Journal of Computer-Mediated Communication, one of the few top-tier open access journals in the social sciences, is copyrighted by the publisher. Some academic superstars like Lawrence Lessig have been able to get their books published by a university press while still being released under a CC license, but not all of us are Lawrence Lessig. Graduate students and junior faculty in particular are desperately trying to get their research published anywhere, so when the paper finally gets accepted and that copyright assignment form arrives in your inbox, the last thing you want to do is start a losing battle over CC-BY-SAing your paper. However, I do have to give a shoutout to Joseph Reagle, who spent a massive amount of effort getting MIT Press to let him publish his book on Wikipedia under a CC license (although with a number of restrictions), but it is unclear to what extent this will continue in the future.

A partial solution: freely-licensed figures, ‘used with permission’ in copyrighted research papers

So now I finally get to the solution that this blog post was supposed to be entirely about. We academics who study open content communities have an obligation to release our research under free licenses. This does not mean that we have to release our research papers under CC-BY-SA, which is all but impossible for most of us. What it means is that we must release our findings, results, and conclusions under such licenses, and thanks to how copyright works, we can do this through the existing system. Conclusions and abstracts are easy: we just re-write them. We should actually be in the habit of re-writing our densely-worded abstracts and conclusions in a more succinct and human-readable form for the communities we study anyway.

However, there is also a way to do this with figures, charts, and graphs. This idea came to me when I saw a copyrighted article in the ACM library (from the Association for Computing Machinery, where a significant amount of Wikipedia research is published) which used a photo someone else took “with permission.” This kind of thing happens regularly enough for the ACM to have a rather sane policy on it: “The author’s copyright transfer applies only to the work as a whole, and not to any embedded objects owned by third parties. An author who embeds an object, such as an art image that is copyrighted by a third party, must obtain that party’s permission to include the object, with the understanding that the entire work may be distributed as a unit in any medium.” I haven’t checked any other publication houses, but I’ve seen this kind of situation happen in so many different books and papers that it could provide a nice loophole for most of academia.

For most research on Wikipedia, the figures, charts, and graphs are the most interesting aspects of the research, and these can be released under a CC-BY or CC-BY-SA license, and then used with permission in an ACM article. The ACM’s main concern is that they need authors to assign copyright to them in order to make sure publication goes smoothly, and as long as the ‘original author’ of the image is completely fine with having the image in the work and published by the ACM, everyone is happy. I’m no lawyer, but I think this would work with releasing figures, charts, and graphs, even though the copyright policy only qualifies the legal phrase with an example of art images copyrighted by third parties. This doesn’t work as well with many forms of qualitative research, such as historical or interview-based research in which the goal is to elaborate on specific case studies. Still, figures and conceptual diagrams are also useful in those kinds of papers, and can be added to an alternative documentation of a research project, which is possibly co-extensive with but not identical to the research paper.

I’ve actually been putting my charts and graphs up on Wikimedia Commons for quite some time (you can check them all out on my user gallery), even before I realized that copyright was even an issue.   These figures are present in my published papers, many of which are copyrighted by the ACM.  Thankfully, it turns out that this is actually compatible copyright-wise, but this is only solid because I uploaded them to Commons before assigning copyright to the ACM.  It is less clear if someone can retroactively release such images.

But that issue aside, my graphs and charts can live in both worlds, serving members of both communities.  For my quantitative research, these graphs contain my core findings about the rise of bots and assisted editing tools, for example. I have yet to document my previous research projects in a way that would be helpful to others.  More on that in the section below, but I think that even just uploading figures to Commons is a good start.  And it is incredibly painless, especially given that uploading to Commons is a lot easier now than it has been in the past.

Research documentation on Meta-Wiki

Documentation of research projects could take place quite nicely in a new Research: namespace that some great people at the Wikimedia Foundation have provided to document planned, current, and past research projects on Meta-Wiki, the wiki that is used to coordinate many tasks which are common to all language versions of Wikipedia, as well as projects like Wikisource or Wiktionary. You can see a very rough example of one of these that I am working on as part of my summer research fellowship with the Wikimedia Foundation: an incomplete but still interesting study of new users that fellow-Fellow Jonathan Morgan and I are doing.

The documentation page is not written like an academic article, although it does give Wikipedians and researchers alike something that is arguably more important. It gives the information necessary to replicate the study, for example, how we sampled for new users and what coding schema we used to track new user participation in community spaces. It also contains a few sentences about the motivation of the study, and a few sentences about each of the results. And critically, it contains the graphs which clearly indicate that since 2004, fewer users are participating in community spaces in their first thirty days of joining the project. If I want to write this up into an academic article (which I do plan to), I can do so in a way that is suitable for the ACM or another academic publisher, while keeping all the existing content on the documentation page freely-licensed.

Now, to be on the safe side, it may be wise to release these graphs under a CC-BY license instead of a CC-BY-SA one, because the Share Alike requirement might require some other researcher to release an entire academic paper under a CC-BY-SA license if they use one of my CC-BY-SA figures. However, I do not think this is a problem for my own papers, because as I am the original copyright holder, I can choose to give permission to use the images in my own academic papers. This is a common misconception with Share Alike and CC licenses in general: while I can never revoke my license once I make it, I am not bound by those terms in my own work, and can release the image under as many free and non-free licenses as I choose. For example, if it is entirely my own image that I license with CC-BY-SA, I do not have to release every work that builds on it under CC-BY-SA, just as I can license the work for commercial use even if I choose a CC license that prohibits commercial use.

Research isn’t a paper

In all, I think that many of the seemingly-intractable problems stem from the false assumption that research projects are entirely encapsulated in a series of papers, and so the demand to ‘freely license your research’ is heard as ‘freely license your papers’.  However, academics already think of research projects as these long processes which spawn multiple papers, and so there is no reason why a research project could not also spawn a freely-licensed documentation space which does not prohibit the publishing of research papers.  Certainly there are many aspects of research papers which would not be included, and there is a risk that these documentation spaces would be second-class reports which are always incomplete compared to the research paper.  Though it is a bit patronizing to universally assume that community members don’t want that dense theoretical analysis of how distributed cognition flows in the actor-network, I think that a facts, figures, and abstracts version would suffice for most.

Given the academic systems in which we are currently entrenched, I think that this is a good short-term solution, especially for graduate students and other junior scholars who do not have the political capital to change the way in which existing publication regimes operate. And who knows, perhaps by creating alternative, freely-licensed spaces for documenting research, we will lead these publications to recognize the need to make research, though not necessarily research papers, freely accessible and open to all.