CSCW paper lengths, 2000-2018

Supplemental interactive graphs and data tables for The Rise and Fall of the Note: Changing Paper Lengths in ACM CSCW, 2000-2018

Now with bonus 2019 data!

By R. Stuart Geiger, UC-Berkeley Institute for Data Science

This is a supplement to a paper to appear in Proceedings of the ACM on Human-Computer Interaction (PACMHCI) as part of the ACM's 2019 conference on Computer-Supported Cooperative Work (CSCW).

You can find more data, code, and documentation on this paper's GitHub repo. This page is for graphs and figures that require fancy JavaScript magic to render, hence the separate site.

Data

Interactive data table (note: takes some time to load & search)

Data in CSV format

Data dictionary / variable descriptions

Interactive figures

Note: Hover over a data point to show more info, although there is no way to make a label stick. These figures are dynamically resizable based on your browser window. You can interact with the graph to zoom and pan by drawing boxes and dragging the graph, as well as through the buttons at the top-right.

Box+swarmplots by year

Note: All plots in this section combine a boxplot (showing the 95th, 75th, 50th, 25th, and 5th percentiles) with a swarmplot (plotting each individual paper as a point), grouped by publication year.
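
As a rough illustration of the five percentiles each boxplot marks, here is a minimal Python sketch. The function name and the sample values are hypothetical (they are not taken from the dataset), and the plotting library behind the figures may interpolate percentiles slightly differently.

```python
from statistics import quantiles

def box_percentiles(values):
    """Return the 5th, 25th, 50th, 75th, and 95th percentiles of values."""
    # quantiles(n=20) returns 19 cut points at 5%, 10%, ..., 95%.
    cuts = quantiles(values, n=20)
    return {
        5: cuts[0],
        25: cuts[4],
        50: cuts[9],
        75: cuts[14],
        95: cuts[18],
    }

# Hypothetical values standing in for one year's paper lengths.
sample = list(range(1, 100))
summary = box_percentiles(sample)
```

Unlike the common boxplot convention of whiskers at 1.5 times the interquartile range, whiskers at the 5th and 95th percentiles always leave about 10% of that year's papers outside the whiskers, where they appear only as swarmplot points.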

Main body length (no references/appendices) of all papers in words (nopunct method).

This is the main graph that shows the rise and fall of the note. It uses the "nopunct" method, which replaces all punctuation with spaces before doing word counts. This method was not in the original paper, but it better controls for big tables of numbers.
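
The idea of replacing punctuation with spaces before counting can be sketched in a few lines of Python. This is a minimal sketch and not the paper's actual code: the function name is hypothetical, it assumes words are split on whitespace, and it only handles ASCII punctuation (curly quotes and other Unicode punctuation are left alone).

```python
import string

# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})

def count_words_nopunct(text):
    """Count whitespace-separated tokens after replacing punctuation with spaces."""
    return len(text.translate(_PUNCT_TO_SPACE).split())

n = count_words_nopunct("Hello, world! It's 3.14.")
```

In this sketch every fragment separated by punctuation becomes its own token, so "It's" counts as two words and "3.14" counts as two words.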

Main body length of non-notes (>4 pages) in words (nopunct method).

This is the same graph as above, but with notes removed. This way, the boxplots better show the growth of non-note papers.

Length of papers in pages.

This shows the raw number of pages in each PDF per year. This method is the easiest, but it cannot separate the main body from references and appendices, nor does it account for the shift in words per page from the double-column to the single-column format (see below).

Words per page (nopunct method).

This clearly shows how the shift from the two-column to one-column format resulted in substantially fewer words per page.
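
To make the arithmetic concrete, here is a small Python sketch. It assumes words per page is simply a paper's word count divided by its PDF page count. The function name and the numbers are made up for illustration and are not drawn from the dataset.

```python
def words_per_page(word_count, page_count):
    """Average words per page for one paper."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    return word_count / page_count

# Two hypothetical papers with the same amount of text in different layouts.
double_column = words_per_page(9000, 10)
single_column = words_per_page(9000, 20)
```

Two papers with identical text can differ by a factor of two in page count, which is why page counts alone are not comparable across the format change.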

Number of references (approximate).

This is an attempt to count the number of references cited, which is generally accurate to within 1-3 references. It shows the substantial growth of references over time.

Main body length (no references/appendices) of all papers in words (punct method).

This is the same plot as the first one, but with the original "punct" method presented in the paper, which does not replace punctuation with spaces before doing a word count. Among other issues, this method results in papers with tables having substantially higher word counts.

Scatterplots

Number of references by main body length in words (nopunct).

In other words, how proportional is the number of references to the length of the main section? There is a cluster of notes that is clearly visible in this plot.

Words per page (nopunct) by main body length in words (nopunct).

A few interesting patterns can be seen here: the 2017.5/2018 single-column PACMHCI papers are a visible cluster; classic notes are a long line at the bottom; classic 10-pagers are a parallel line in the middle; 2013-2017 papers >10 pages fan out above this line.

Software used

In addition to the many software tools described in the paper, these web visualizations are made possible by the following libraries (and their extensive dependencies):