For the twenty-first class, we will read Diversity-Driven Automated Formal Verification [PDF] [DOI] by Emily First and Yuriy Brun.

This paper takes a look at tactic prediction models---models that predict the next tactic in your proof---in the Coq proof assistant. Traditionally in this space, researchers use benchmarks to measure the success of these proof models, with higher scores (more proofs proven) corresponding to better results. But this paper shows that different models don't just prove more or fewer proofs---they prove different proofs. So, by bringing diverse models together, you get better performance than with any single model---by a lot!

Machine learning tools for proofs are surging in popularity these days. Next week, you'll see talks from researchers at both Google and OpenAI about other machine learning tools for proofs!

You can find a reading guide and the discussion questions for this class below.

Please note that this is due the morning before class, so that we can discuss it in person. And please install the artifact before Thursday's class.


Reading Guide

This paper is (in my opinion) very well written, with a lot of context, so I think it's worth reading in full. But I would not worry too much if you do not know the machine learning terminology (especially the names of machine learning architectures like transformers or the TreeLSTM---these refer to some choices in how different machine learning tools for proofs encode proof data so that they can learn from it).

For this paper, if you encounter new terminology you've never seen before, I advise that you write it down, and either note it in the forum post or ask about it in class. But it should then be OK to move on. Since this paper is at the intersection of two topics for which few people have experience both of at the same time, it's likely some of the terminology is new, and that's OK.

The authors of the paper also sent me a video they thought would be helpful. I didn't find it very helpful :).

Discussion Questions

Discussion Questions

Please write a short paragraph as a comment in the class forum thread about any one of the following discussion questions (you need to respond to just one question of your choice to get credit, though you're free to write more if interested):

  1. Based on these results, how would you recommend that people evaluate future changes to machine learning models for predicting tactics?
  2. How might you go about designing and building a machine learning tool for proofs that takes these results into account?
  3. Are there any other dimensions of the data, parameters, or anything else for which you think varying those dimensions could make a difference in which proofs the model can solve? What experiment might you design to test that hypothesis?
  4. Do you think these results would hold true in a domain for which a lot more data is available to begin with? Can you think of any circumstances under which these results may not hold? Or do you think that strength in diversity of machine learning models is fundamental? Why or why not? What experiment might you design to test that hypothesis?
  5. Please list any new terminology you encountered during your reading that you did not recognize.
  6. Write a paragraph responding to someone else's comment in the discussion forum.
  7. Write a paragraph about any other single interesting thought, question, or idea that you had in response to the reading.

You may want to look at other people's responses before class in order to shape the discussion. And again remember that you need to answer just one question to get credit, though you are free to answer more if interested.

<< Previous | Class Schedule | Next >>