Publications

You can also find my articles on my Google Scholar profile.

Asking an AI for salary negotiation advice is a matter of concern: Controlled experimental perturbation of ChatGPT for protected and non-protected group discrimination on a contextual task with no clear ground truth answers

Published in arXiv preprints [cs.CY], 2024

We conducted controlled experimental bias audits of four versions of ChatGPT, which we asked to recommend an opening offer in salary negotiations for a new hire. We submitted 98,800 prompts to each version, systematically varying the employee’s gender, university, and major, and tested prompts in the voice of each side of the negotiation: the employee versus the employer. We find that ChatGPT as a multi-model platform is not robust and consistent enough to be trusted for such a task. We observed statistically significant differences in salary offers when varying gender for all four models, although with smaller gaps than for the other attributes tested. The largest gaps were between model versions and between employee- and employer-voiced prompts. We also observed substantial gaps when varying university and major, but many of the biases were not consistent across model versions. We tested fictional and fraudulent universities and found wildly inconsistent results across cases and model versions. We also make broader contributions to the AI/ML fairness literature. Our scenario and experimental design differ from mainstream AI/ML auditing efforts in key ways. Bias audits typically test discrimination for protected classes like gender, which we contrast with testing the non-protected classes of university and major. Asking for negotiation advice includes how aggressive one ought to be in a negotiation relative to known empirical salary distributions and scales, which is a deeply contextual and personalized task that has no objective ground truth to validate. These results raise concerns about the specific model versions we tested and about ChatGPT as a multi-model platform in continuous development. Our epistemology does not permit us to definitively certify these models as either generally biased or unbiased on the attributes we test, but our study raises matters of concern for stakeholders to investigate further.

Making Algorithms Public: Reimagining Auditing From Matters of Fact to Matters of Concern

Published in International Journal of Communication, 2024

Stakeholders concerned with bias, discrimination, and fairness in algorithmic systems are increasingly turning to audits, which typically apply generalizable methods and formal standards to investigate opaque systems. We discuss four attempts to audit algorithmic systems with varying levels of success, depending on the scope of both the system to be audited and the audit’s success criteria. Such scoping is contestable, negotiable, and political, linked to dominant institutions and to movements to change them. Algorithmic auditing is typically envisioned as settling “matters-of-fact” about how opaque algorithmic systems behave: definitive declarations that (de)certify a system. However, there is little consensus about the decisions to be automated or about the institutions automating them. We reposition algorithmic auditing as an ongoing and ever-changing practice around “matters-of-concern.” This involves building infrastructures for the public to engage in open-ended democratic understanding, contestation, and problem solving, not just about algorithms in themselves but also about the institutions and power structures deploying them. Auditors must recognize their privilege in scoping to “relevant” institutional standards and concerns, especially when stakeholders seek to reform or reimagine them.

Community, Time, and (Con)text: A Dynamical Systems Analysis of Online Communication and Community Health among Open-Source Software Communities

Published in Cognitive Science, 2022

Free and open-source software projects have become essential digital infrastructure over the past decade. These projects are largely created and maintained by unpaid volunteers, presenting a potential vulnerability if the projects cannot recruit and retain new volunteers. At the same time, their development on open collaborative development platforms provides a nearly complete record of the community’s interactions; this affords the opportunity to study naturally occurring language dynamics at scale and in a context with massive real-world impact. The present work takes a dynamical systems view of language to understand the ways in which communicative context and community membership shape the emergence and impact of language use—specifically, sentiment and expressions of gratitude. We then present evidence that these language dynamics shape newcomers’ likelihood of returning, although the specific impacts of different community responses are crucially modulated by the context of the newcomer’s first contact with the community.

‘Garbage In, Garbage Out’ Revisited: What Do Machine Learning Application Papers Report About Human-Labeled Training Data?

Published in Quantitative Science Studies, 2021

Supervised machine learning, in which models are automatically derived from labeled training data, is only as good as the quality of that data. We report to what extent a random sample of ML application papers across disciplines give specific details about whether best practices were followed in labeling training data.

The Labor of Maintaining and Scaling Free and Open-Source Software Projects

Published in Proceedings of the ACM on Human-Computer Interaction (CSCW 2021), 2021

We report findings from an interview-based study of maintainers of free and/or open-source software (F/OSS) projects. F/OSS maintainers perform complex and often-invisible interpersonal and organizational work to keep their projects operating as active communities of users and contributors. We particularly focus on how this labor of maintaining and sustaining changes as projects and their software grow and scale across many dimensions.

Garbage In, Garbage Out? Do Machine Learning Application Papers in Social Computing Report Where Human-Labeled Training Data Comes From?

Published in Proceedings of the 2020 ACM Conference on Fairness, Accountability, and Transparency (FAT* 2020), 2019

Many machine learning projects for new application areas involve teams of humans who label data for a particular purpose, from hiring crowdworkers to the paper’s authors labeling the data themselves. In this paper, we investigate to what extent a sample of machine learning application papers in social computing – specifically papers from arXiv and traditional publications performing an ML classification task on Twitter data – give specific details about whether best practices in human annotation were followed.

ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia

Published in Proceedings of the ACM on Human-Computer Interaction (CSCW 2020), 2019

This paper presents an overview and case studies of ORES, Wikipedia’s real-time machine learning as a service platform, which is designed in line with Wikipedia’s values of open participation, decentralization, and continual iteration. ORES decouples and reduces incidental complexity around several aspects of applying machine learning in a user-generated content platform, including curating training data sets, building models to serve predictions, auditing predictions, and developing interfaces or automated agents that act on those predictions.

Career Paths and Prospects in Academic Data Science: Report of the Moore-Sloan Data Science Environments Survey

Published by UC-Berkeley Institute for Data Science, 2018

This report on a survey of academic data scientists discusses what data science in the academy is and various issues around career paths for those in universities who practice and support data science. We provide evidence-based recommendations about how universities can better support an emerging set of roles and responsibilities around data and computation within and across academic fields.

Recommended citation: R. Stuart Geiger, Charlotte Mazel-Cabasse, Chihoko Cullens, Laura Noren, Brittany Fiore-Gartland, Diya Das, and Henry Brady (2018). _Career Paths and Prospects in Academic Data Science: Report of the Moore-Sloan Data Science Environments Survey._ Report. Berkeley, California: UC-Berkeley Institute for Data Science. https://osf.io/preprints/socarxiv/xe823/

The Types, Roles, and Practices of Documentation in Data Analytics Open Source Software Libraries: A Collaborative Ethnography of Documentation Work

Published in Computer-Supported Cooperative Work (JCSCW), 2018

Data analytics increasingly relies on open source software (OSS) libraries that extend scripting languages like Python and R. Software documentation for these libraries is crucial for people across all experience levels, but documentation work raises many challenges, particularly in open source communities. In this collaboration between ethnographers and data scientists, we discuss the types, roles, practices, and motivations around documentation in data analytics OSS libraries.

Recommended citation: Geiger, R.S., Varoquaux, N., Mazel-Cabasse, C., and Holdgraf, C. (2018). “The Types, Roles, and Practices of Documentation in Data Analytics Open Source Software Libraries: A Collaborative Ethnography of Documentation Work.” Computer-Supported Cooperative Work (JCSCW), 27(3). DOI:10.1007/s10606-018-9333-1 https://link.springer.com/article/10.1007/s10606-018-9333-1

Beyond opening up the black box: Investigating the role of algorithmic systems in Wikipedian organizational culture

Published in Big Data & Society, 2017

Scholars and practitioners across domains are increasingly concerned with algorithmic transparency and opacity, interrogating the values and assumptions embedded in automated, black-boxed systems, particularly in user-generated content platforms. I report from an ethnography of infrastructure in Wikipedia to discuss an often understudied aspect of this topic: the local, contextual, learned expertise involved in participating in a highly automated social-technical environment.

Recommended citation: R. Stuart Geiger. (2017). "Beyond opening up the black box: Investigating the role of algorithmic systems in Wikipedian organizational culture." Big Data & Society 4(2). https://doi.org/10.1177/2053951717730735

Operationalizing conflict and cooperation between automated software agents in Wikipedia: A replication and expansion of ‘Even Good Bots Fight’

Published in Proceedings of the ACM on Human-Computer Interaction, 2017

A mixed-method trace ethnographic analysis of issues around the governance of automated software agents in Wikipedia, focusing on how to interpret cases where bots reverted each other’s edits.

Recommended citation: R. Stuart Geiger and Aaron Halfaker. 2017. “Operationalizing conflict and cooperation between automated software agents in Wikipedia: A replication and expansion of ‘Even Good Bots Fight.’” Proceedings of the ACM on Human-Computer Interaction (Nov 2017 issue, CSCW 2018 Online First) 1, 2, Article 49. DOI:https://doi.org/10.1145/3134684. https://commons.wikimedia.org/wiki/File:conflict-bots-wp-cscw.pdf.

Summary Analysis of the 2017 GitHub Open Source Survey

Published in SocArXiv Preprints, 2017

This report is a high-level summary analysis of the 2017 GitHub Open Source Survey dataset, presenting frequency counts, proportions, and frequency or proportion bar plots for every question asked in the survey.

Recommended citation: R. Stuart Geiger. (2017). "Summary Analysis of the 2017 GitHub Open Source Survey." _SocArXiv Preprints._ doi: 10.17605/OSF.IO/ENRQ5

Bot-based collective blocklists in Twitter: the counterpublic moderation of harassment in a networked public space

Published in Information, Communication, and Society, 2016

This article introduces and discusses bot-based collective blocklists (or blockbots) in Twitter, which have been developed by volunteers to combat harassment on the social networking site in a more decentralized and counterpublic way than actions taken by Twitter, Inc. staff. I discuss how such forms of automation require that communities encode specific understandings of what harassment is and how to identify it, relating these cases to several longstanding issues around the governance and moderation of the public sphere.

Recommended citation: Geiger, R. Stuart. (2016). “Bot-based collective blocklists in Twitter: the counterpublic moderation of harassment in a networked public space.” Information, Communication, and Society 19(6). http://stuartgeiger.com/blockbots-ics.pdf

Defining, Designing, and Evaluating Civic Values in Human Computation and Collective Action Systems

Published in Proceedings of HCOMP, Citizen-X Workshop, 2014

We review various crowdsourcing and collective action systems, identifying particular sets of civic values and assumptions.

Recommended citation: Matias, N. and Geiger, R.S. “Defining, Designing, and Evaluating Civic Values in Human Computation and Collective Action Systems.” In Proceedings of HCOMP 2014, Citizen-X Workshop. http://stuartgeiger.com/defining-civic-values-hcomp-matias-geiger.pdf.

Old Against New, or a Coming of Age? Broadcasting in an Era of Electronic Media

Published in Journal of Broadcasting and Electronic Media, 2014

On the history and continued relevance of the term "broadcasting" in an era of social media.

Recommended citation: Geiger, R. Stuart and Lampinen, Airi. (2014). “Old Against New, or a Coming of Age? Broadcasting in an Era of Electronic Media.” Journal of Broadcasting and Electronic Media 58(3). http://www.stuartgeiger.com/jobem.pdf

Snuggle: Designing for efficient socialization and ideological critique

Published in Proceedings of CHI, 2014

This paper discusses the Snuggle project, built to support newcomer socialization and reflexive critique of Wikipedia's existing socialization processes.

Recommended citation: Halfaker, Aaron, Geiger, R. Stuart, and Terveen, Loren. (2014). “Snuggle: Designing for Efficient Socialization and Ideological Critique.” In Proceedings of the 2014 ACM Conference on Human Factors in Computing Systems (CHI 2014). http://www-users.cs.umn.edu/~halfak/publications/Snuggle/halfaker14snuggle-personal.pdf

Bots, bespoke code, and the materiality of software platforms

Published in Information, Communication, and Society, 2014

This article introduces and discusses the role of bespoke code in Wikipedia, which is code that runs alongside a platform or system, rather than being integrated into server-side codebases.

Recommended citation: Geiger, R. Stuart. (2014). “Bots, Bespoke Code, and the Materiality of Software Platforms.” Information, Communication, and Society 17. http://stuartgeiger.com/bespoke-code-ics.pdf

The Next Generation of Scientists: Examining the Experiences of Graduate Students in Network-Level Social-Ecological Science

Published in Ecology and Society, 2013

We examined how graduate students experienced a social-ecological research initiative within the large-scale, geographically distributed Long Term Ecological Research (LTER) Network.

Recommended citation: Romolini, Michele, Sydne Record, Rebecca Garvoille, Y. Marusenko, and R. Stuart Geiger. (2013) “The Next Generation of Scientists: Examining the Experiences of Graduate Students in Network-Level Science.” In Ecology and Society 18(3). http://stuartgeiger.com/lter-network-level-science-es.pdf

When the Levee Breaks: Without Bots, What Happens to Wikipedia’s Quality Control Processes?

Published in Proceedings of WikiSym, 2013

This paper examines what happened when one of Wikipedia's counter-vandalism bots unexpectedly went offline.

Recommended citation: Geiger, R. Stuart and Halfaker, Aaron. (2013). “When the Levee Breaks: Without Bots, What Happens to Wikipedia’s Quality Control Processes?” In Proceedings of the 9th International Symposium on Wikis and Open Collaboration (WikiSym 2013). http://stuartgeiger.com/wikisym13-cluebot.pdf

The Rise and Decline of an Open Collaboration Community: How Wikipedia’s reaction to sudden popularity is causing its decline

Published in American Behavioral Scientist, 2013

A mixed-method, multi-study analysis of editor retention, socialization, gatekeeping, and governance in Wikipedia.

Recommended citation: Halfaker, Aaron, R. Stuart Geiger, Jonathan Morgan, and John Riedl. (2013). “The Rise and Decline of an Open Collaboration System: How Wikipedia’s reaction to sudden popularity is killing it.” American Behavioral Scientist 57(5). http://dx.doi.org/10.1177/0002764212469365

Using Edit Sessions to Measure Participation in Wikipedia

Published in Proceedings of CSCW, 2013

This paper establishes a quantitative metric for measuring editor activity through temporal edit sessions.

Recommended citation: Geiger, R. Stuart and Halfaker, Aaron. (2013). “Using Edit Sessions to Measure Participation in Wikipedia.” In Proceedings of the 2013 ACM Conference on Computer Supported Cooperative Work (CSCW 2013). http://www.stuartgeiger.com/cscw-sessions.pdf

Artifacts that Organize: Delegation in the Distributed Organization

Published in Information and Organization, 2012

This paper studies the role of computational infrastructure and organizational structure in the Open Science Grid.

Recommended citation: Ribes, David, Steve Jackson, R. Stuart Geiger, Matt C. Burton, and Tom Finholt (2012). “Artifacts that organize: Delegation in the distributed organization.” Information and Organization 23:1–14. http://www.stuartgeiger.com/artifacts-that-organize.pdf

“Writing up rather than writing down”: Becoming Wikipedia Literate

Published in Proceedings of WikiSym, 2012

We introduce and advocate a multi-faceted theory of literacy to investigate the knowledges and organizational forms that are required to improve participation in Wikipedia’s communities.

Recommended citation: Ford, Heather and R. Stuart Geiger. (2012). “‘Writing up rather than writing down’: Becoming Wikipedia Literate.” In Proceedings of the 8th International Symposium on Wikis and Open Collaboration (WikiSym 2012). New York: ACM Digital Library. http://www.stuartgeiger.com/becoming-wikipedia-literate.pdf

Defense Mechanism or Socialization Tactic? Improving Wikipedia’s Notifications to Rejected Contributors

Published in Proceedings of ICWSM, 2012

A descriptive study of Wikipedia's highly automated socialization processes and an A/B test to improve templated messages to newcomers.

Recommended citation: Geiger, R. Stuart, Aaron Halfaker, Maryana Pinchuk, and Steven Walling (2012). “Defense Mechanism or Socialization Tactic? Improving Wikipedia’s Notifications to Rejected Contributors.” In Proceedings of the 2012 International Conference on Weblogs and Social Media (ICWSM 2012). http://stuartgeiger.com/defense-mechanism-icwsm.pdf

Black-boxing the user: internet protocol over xylophone players (IPoXP)

Published in Proceedings of CHI (alt.CHI), 2012

We introduce IP over Xylophone Players (IPoXP), a novel Internet protocol between two computers using xylophone-based Arduino interfaces.

Recommended citation: Geiger, R. Stuart, Yoon J. Jeong, and Emily Manders (2012). “Black-Boxing the User: Internet Protocol over Xylophone Players.” In Proceedings of the 2012 ACM Conference on Human Factors in Computing Systems (alt.CHI 2012). New York: ACM Digital Library. http://stuartgeiger.com/ipoxp.pdf

The Lives of Bots

Published in Wikipedia: A Critical Point of View, 2011

I describe the complex social and technical environment in which bots exist in Wikipedia, emphasizing not only how bots produce order and enforce rules, but also how humans produce bots and negotiate rules around their operation.

Recommended citation: Geiger, R. Stuart. (2011). “The Lives of Bots.” In G. Lovink and N. Tkacz (eds.), Wikipedia: A Critical Point of View. Amsterdam: Institute of Network Cultures. http://www.stuartgeiger.com/lives-of-bots-wikipedia-cpov.pdf

Participation in Wikipedia’s Article Deletion Processes

Published in Proceedings of WikiSym, 2011

This paper investigates Wikipedia's article deletion processes, finding that they are heavily populated by specialists.

Recommended citation: Geiger, R. Stuart and Heather Ford. (2011) “Participation in Wikipedia’s Article Deletion Processes.” In Proceedings of the 7th International Symposium on Wikis and Open Collaboration (WikiSym 2011). New York: ACM Digital Library. http://www.stuartgeiger.com/papers/article-deletion-wikisym-geiger-ford.pdf

Trace Ethnography: Following Coordination through Documentary Practices

Published in Proceedings of HICSS, 2011

We detail the methodology of ‘trace ethnography’, which combines the richness of participant-observation with the wealth of data in logs so as to reconstruct patterns and practices of users in distributed sociotechnical systems.

Recommended citation: Geiger, R. Stuart and David Ribes (2011). “Trace Ethnography: Following Coordination through Documentary Practices.” In Proceedings of the 44th Annual Hawaii International Conference on System Sciences (HICSS). http://www.stuartgeiger.com/trace-ethnography-hicss-geiger-ribes.pdf

The Work of Sustaining Order in Wikipedia: The Banning of a Vandal

Published in Proceedings of CSCW, 2010

This paper traces out a heterogeneous network of humans and non-humans involved in the identification and banning of a single vandal in Wikipedia.

Recommended citation: Geiger, R. Stuart and David Ribes (2010). “The Work of Sustaining Order in Wikipedia: The Banning of a Vandal.” In Proceedings of the 2010 ACM Conference on Computer-Supported Cooperative Work (CSCW 2010). New York: ACM Digital Library. http://www.stuartgeiger.com/papers/cscw-sustaining-order-wikipedia.pdf

The Social Roles of Bots and Assisted Editing Tools

Published in Proceedings of WikiSym, 2009

A short paper showing the recent explosive growth of automated editors (or bots) in Wikipedia, which have taken on many new tasks in administrative spaces.

Recommended citation: Geiger, R. Stuart (2009). “The Social Roles of Bots and Assisted Editing Tools.” In Proceedings of the 5th International Symposium on Wikis and Open Collaboration. New York: ACM Digital Library. http://www.stuartgeiger.com/papers/geiger-wikisym-bots.pdf

Does Habermas Understand the Internet? The Algorithmic Construction of the Blogo/Public Sphere

Published in Gnovis, 2009

Habermasians have been debating the role of the Internet in the public sphere, but they have all taken for granted the highly automated software infrastructures that mediate our knowledge of the blogosphere.

Recommended citation: Geiger, R. Stuart (2009). “Does Habermas Understand the Internet? The Algorithmic Construction of the Blogo/Public Sphere.” Gnovis: A Journal of Communication, Culture, and Technology. 10(1). http://www.stuartgeiger.com/papers/gnovis-habermas-blogopublic-sphere.pdf