Now with bonus 2019 data!
This is a supplement to a paper to appear in Proceedings of the ACM on Human-Computer Interaction (PACMHCI) as part of the ACM's 2019 conference on Computer-Supported Cooperative Work (CSCW).
You can see more data, code, and documentation on this paper's GitHub repo, but this page is for graphs and figures that require fancy JavaScript magic to render, hence the separate site.
Note: Hover over a data point to show more info, although there is no way to make a label stick. These figures are dynamically resizable based on your browser window. You can interact with the graph to zoom and pan by drawing boxes and dragging the graph, as well as through the buttons at the top-right.
Note: all plots have a combined boxplot (showing 95th, 75th, 50th, 25th, and 5th percentiles) and swarmplot (plotting each individual paper as a point) by publication year.
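As a rough illustration of what each box summarizes, the sketch below groups papers by publication year and computes the five percentiles named above. The function names and the `(year, length)` input shape are assumptions made for this example, and the linear interpolation used here may differ from what the plotting library does internally.

```python
def percentile(sorted_vals, p):
    """Linear-interpolated percentile (0 <= p <= 100) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    rank = (len(sorted_vals) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (rank - lo)


def box_summary_by_year(papers):
    """Map each year to its 5th/25th/50th/75th/95th percentile lengths.

    `papers` is an iterable of (year, length) pairs; this shape is a
    hypothetical stand-in for the real dataset.
    """
    by_year = {}
    for year, length in papers:
        by_year.setdefault(year, []).append(length)
    return {
        year: {p: percentile(sorted(vals), p) for p in (5, 25, 50, 75, 95)}
        for year, vals in sorted(by_year.items())
    }
```

The swarmplot layer needs no summary step: it draws every `(year, length)` pair as its own point.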
This is the main graph, which shows the rise and fall of the note. It uses the "nopunct" method, which replaces all punctuation with spaces before doing word counts. This method was not in the original paper, but it better controls for big tables of numbers.
This is the same graph as above, but with notes removed. This way, the boxplots better show the growth of non-note papers.
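A minimal sketch of the replacement step described above, assuming ASCII punctuation and a plain whitespace split. The actual tokenizer is documented in the paper's repo rather than on this page, so both function names are hypothetical and the counts here will not necessarily match the plotted ones.

```python
import string

# Translation table mapping every ASCII punctuation character to a space.
# Assumption: the real pipeline may treat a different set of characters.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def count_words_nopunct(text):
    """Replace punctuation with spaces, then count whitespace-separated tokens."""
    return len(text.translate(_PUNCT_TO_SPACE).split())


def count_words_punct(text):
    """Count whitespace-separated tokens without touching punctuation."""
    return len(text.split())
```

With this sketch, a table-like row such as `a | b | c` counts as 5 tokens under `count_words_punct` and 3 under `count_words_nopunct`, because the stand-alone separators disappear. In this simplified version a decimal such as `3.14` splits into two tokens, so it illustrates only the replacement step and is not a reproduction of the paper's counts.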
This shows the raw number of pages in each PDF per year. This method is the easiest, but it cannot separate the main body from references and appendices, nor does it account for the shift in words per page from the double-column to single-column format (see below).
This clearly shows how the shift from the two-column to one-column format resulted in substantially fewer words per page.
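The quantity plotted here is simply a paper's word count divided by its page count. The sketch below computes it and takes a median per layout; the `(layout, word_count, page_count)` record shape and the layout labels are assumptions made for this example, not the dataset's real schema.

```python
from statistics import median


def words_per_page(word_count, page_count):
    """Average number of words on each page of one PDF."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    return word_count / page_count


def median_words_per_page(papers):
    """Median words per page for each layout label.

    `papers` is an iterable of (layout, word_count, page_count) tuples,
    a hypothetical shape chosen for illustration.
    """
    by_layout = {}
    for layout, words, pages in papers:
        by_layout.setdefault(layout, []).append(words_per_page(words, pages))
    return {layout: median(vals) for layout, vals in by_layout.items()}
```

Comparing the per-layout medians is one way to put a number on the drop visible in the plot.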
This is an attempt to count the number of references cited, which is generally accurate to within 1-3 references. It shows the substantial growth of references over time.
This is the same plot as the first one, but with the original "punct" method presented in the paper, which does not replace punctuation with spaces before doing a word count. Among other issues, this method results in papers with tables having substantially higher word counts.
In other words, how proportional is the number of references to the length of the main section? There is a cluster of notes that is clearly visible in this plot.
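One way to express this proportion as a single number is references per 1,000 words of main text. This is a sketch under that assumption: the function name and the choice of a per-1,000-words scale are illustrative, and the plot itself may use a different form of the ratio.

```python
def references_per_1000_words(num_references, main_words):
    """References cited per 1,000 words of the main section."""
    if main_words <= 0:
        raise ValueError("main_words must be positive")
    return 1000.0 * num_references / main_words
```

Under this measure, a paper with 50 references and a 10,000-word main section scores the same as one with 20 references and 4,000 words, so papers of different lengths can be compared directly.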
A few interesting patterns can be seen here: the 2017.5/2018 single-column PACMHCI papers are a visible cluster; classic notes are a long line at the bottom; classic 10-pagers are a parallel line in the middle; 2013-2017 papers >10 pages fan out above this line.
In addition to the many software tools described in the paper, these web visualizations are made possible by the following libraries (and their extensive dependencies):