Class 25: User Experience

Reading

For the twenty-fifth class, we will read REPLICA: REPL Instrumentation for Coq Analysis [PDF] [DOI] by Talia Ringer, Alex Sanchez-Stern, Dan Grossman, and Sorin Lerner, as well as the short paper Towards an Incremental Dataset of Proofs [PDF] by Hanneli Tavante.

The REPLICA paper describes the collection and manual analysis of a dataset of atomic (incremental) edits to programs, specifications, and proofs in Coq. This was really the first user study of its kind in the proof assistant community, and the first public collection of such a dataset.

The dataset is too small to be used for training many machine learning models popular these days. But the main point was to identify patterns in proof development, and use those to learn how to build tools that better help proof engineers. The patterns identified did inspire both prior and ongoing work, and provided lessons to help with future user studies.

With a large number of external collaborators through EleutherAI, I am in the process of setting up a much larger version (thousands of participants) of a study of this flavor involving regular programmers, too. That project is open to everyone---the more help, the better. Let me know if you're interested in helping!

Again though, since this is my work and the power dynamic is weird, it's again totally cool if you don't enjoy this at all. I won't take it personally.

The second paper is a short follow-up by Hanneli Tavante about plans for another user study of this flavor, this time one I am not involved in myself. I am super excited to see what comes of it, and wanted to include it as an additional discussion piece.

You can find a reading guide and the discussion questions below.

Please note that this is due the morning before class, so that we can discuss it in person. And please install the artifact before Thursday's class.

Guide

Reading Guide

After the introduction, the REPLICA paper dives into the infrastructure for the instrumentation to collect the atomic edit data, and the deployment to a group of proof engineers to collect that data. The details of this are not super important, though you may find it interesting if you want to do something similar yourself. (I also find it some combination of comforting and distressing that, under the hood, even Coq is just a state machine and a REPL.)

The real meat for proof automation is in Chapters 4 and 5. The takeaways, benchmarks, and examples may be interesting to you, since they directly inform how to develop better tools (like automation) for proof engineers.

Chapter 6 was a deliberate attempt to capture what was challenging about running this user study, as well as what went wrong in the process, and then discuss how the proof engineering community can do better in the future. This was important since we really had no precedent for running this study. I think it is a very good read, especially as large datasets become more and more relevant with machine learning tools growing in importance in programming languages and formal methods.

The short paper by Hanneli is good to read in full, perhaps with the lessons from the REPLICA paper in mind.

Discussion Questions

Please write a short paragraph as a comment in the class forum thread about any one of the following discussion questions (you need to respond to just one question of your choice to get credit, though you're free to write more if interested):

Is there anything from the REPLICA paper that you found especially surprising? Why or why not?
Imagine that you would like to scale a user study like this up to reach many more participants. How would you reach a large number of participants?
What do you think is the most important takeaway from the REPLICA paper, and why?
One challenge to classifying changes in REPLICA was making sense of not just syntactic relations, like adding or removing hypotheses, but also making sense of semantic relations like equivalences or refinements. If tasked with designing a classification of changes, what categories would you consider that the REPLICA paper did not consider?
The analysis of the atomic edit data in Chapter 5 of REPLICA was extraordinarily manual, and was extremely time consuming. How would you automate this kind of analysis?
How well do you think the short paper does at addressing some of the limitations from REPLICA?
Write a paragraph responding to someone else's comment in the discussion forum.
Write a paragraph about any other single interesting thought, question, or idea that you had in response to the reading.

You may want to look at other people's responses before class in order to shape the discussion. And again remember that you need to answer just one question to get credit, though you are free to answer more if interested.

<< Previous | Class Schedule | Next >>