For the twenty-fifth class, we will read REPLICA: REPL Instrumentation for Coq Analysis [PDF] [DOI] by Talia Ringer, Alex Sanchez-Stern, Dan Grossman, and Sorin Lerner, as well as the short paper Towards an Incremental Dataset of Proofs [PDF] by Hanneli Tavante.
The REPLICA paper describes the collection and manual analysis of a dataset of atomic (incremental) edits to programs, specifications, and proofs in Coq. This was really the first user study of its kind in the proof assistant community, and the first public collection of such a dataset.
The dataset is too small to be used for training many machine learning models popular these days. But the main point was to identify patterns in proof development, and use those to learn how to build tools that better help proof engineers. The patterns identified did inspire both prior and ongoing work, and provided lessons to help with future user studies.
With a large number of external collaborators through EleutherAI, I am in the process of setting up a much larger version (thousands of participants) of a study of this flavor involving regular programmers, too. That project is open to everyone---the more help, the better. Let me know if you're interested in helping!
Again though, since this is my work and the power dynamic is weird, it's again totally cool if you don't enjoy this at all. I won't take it personally.
The second paper is a short follow-up by Hanneli Tavante about plans for another user study of this flavor, this time one I am not involved in myself. I am super excited to see what comes of it, and wanted to include it as an additional discussion piece.
Please note that this is due the morning before class, so that we can discuss it in person. And please install the artifact before Thursday's class.