Reports from the BIDS Best Practices in Data Science Series

This series is a set of reflections and write-ups from meetings we regularly hold at the Berkeley Institute for Data Science, where we bring together a wide range of people from across the UC Berkeley campus and beyond to discuss how to do something challenging in data science well, or at least better.

Challenges of Doing Data-Intensive Research in Teams, Labs, and Groups

Download PDF here.

Abstract: What are the challenges and best practices for doing data-intensive research in teams, labs, and other groups? This paper reports from a discussion in which researchers from many different disciplines and departments shared their experiences of doing data science in their domains. The issues we discuss range from the technical to the social, including issues with getting on the same computational stack, workflow and pipeline management, handoffs, composing a well-balanced team, dealing with fluid membership, fostering coordination and communication, and not abandoning best practices when deadlines loom. We conclude by reflecting on the extent to which there are universal best practices for all teams, as well as how these kinds of informal discussions around the challenges of doing research can help combat impostor syndrome.

Recommended citation: R. Stuart Geiger, Dan Sholler, Aaron Culich, Ciera Martinez, Fernando Hoces de la Guardia, François Lanusse, Kellie Ottoboni, Marla Stuart, Maryam Vareth, Nelle Varoquaux, Sara Stoudt, and Stéfan van der Walt. “Challenges of Doing Data-Intensive Research in Teams, Labs, and Groups: Report from the BIDS Best Practices in Data Science Series.” BIDS Best Practices in Data Science Series. Berkeley Institute for Data Science: Berkeley, California. 2018. doi:10.31235/osf.io/a7b3m

Best Practices for Fostering Diversity and Inclusion in Data Science

Download PDF here.

Abstract: What actions can we take to foster diverse and inclusive workplaces in the broad fields around data science? This paper reports from a discussion in which researchers from many different disciplines and departments raised questions and shared their experiences with various aspects around diversity, inclusion, and equity. The issues we discuss include fostering inclusive interpersonal and small group dynamics, rules and codes of conduct, increasing diversity in less-representative groups and disciplines, organizing events for diversity and inclusion, and long-term efforts to champion change.

Recommended citation: R. Stuart Geiger, Orianna DeMasi, Aaron Culich, Andreas Zoglauer, Diya Das, Fernando Hoces de la Guardia, Kellie Ottoboni, Marsha Fenner, Nelle Varoquaux, Rebecca Barter, Richard Barnes, Sara Stoudt, Stacey Dorton, Stéfan van der Walt. “Best Practices for Fostering Diversity and Inclusion in Data Science: Report from the BIDS Best Practices in Data Science Series.” BIDS Best Practices in Data Science Series. Berkeley, CA: Berkeley Institute for Data Science. 2019. doi:10.31235/osf.io/8gsjz

Best Practices for Managing Turnover in Data Science Groups, Teams, and Labs

Download PDF here.

Abstract: Turnover is a fact of life for any project, and academic research teams can face particularly high levels of people who come and go through the duration of a project. In this article, we discuss the challenges of turnover and some potential practices for helping manage it, particularly for computational- and data-intensive research teams and projects. The topics we discuss include establishing and implementing data management plans, file and format standardization, workflow and process documentation, clear team roles, and check-in and check-out procedures.

Recommended citation: Dan Sholler, Diya Das, Fernando Hoces de la Guardia, Chris Hoffmann, François Lanusse, Nelle Varoquaux, Rolando Garcia, R. Stuart Geiger, Shana McDevitt, Scott Peterson, Sara Stoudt. “Best Practices for Managing Turnover in Data Science Groups, Teams, and Labs.” BIDS Best Practices in Data Science Series. Berkeley, CA: Berkeley Institute for Data Science. 2019. doi:10.31235/osf.io/wsxru

Resistance to Adoption of Best Practices

Download PDF here.

Abstract: There are many recommendations of “best practices” for those doing data science, data-intensive research, and research in general. These documents usually present a particular vision of how people should work with data and computing, recommending specific tools, activities, mechanisms, and sensibilities. However, implementation of best (or better) practices in any setting is often met with resistance from individuals and groups, who perceive some drawbacks to the proposed changes to everyday practice. We offer some definitions of resistance, identify the sources of researchers’ hesitancy to adopt new ways of working, and describe some of the ways resistance is manifested in data science teams. We then offer strategies for overcoming resistance based on our group members’ experiences working alongside resistors or resisting change themselves. Our discussion concluded with many remaining questions left to tackle, some of which are listed at the end of this piece.

Recommended citation: Dan Sholler, Sara Stoudt, Chris Kennedy, Fernando Hoces de la Guardia, François Lanusse, Karthik Ram, Kellie Ottoboni, Marla Stuart, Maryam Vareth, Nelle Varoquaux, Rebecca Barter, R. Stuart Geiger, Scott Peterson, and Stéfan van der Walt. “Resistance to Adoption of Best Practices.” BIDS Best Practices in Data Science Series. Berkeley Institute for Data Science: Berkeley, California. 2019. doi:10.31235/osf.io/qr8cz