This class, we will read Three Years of Experience with Sledgehammer, a Practical Link between Automatic and Interactive Theorem Provers [PDF] [DOI] by Lawrence C. Paulson and Jasmin Christian Blanchette.

Sledgehammer works by flattening goals in Isabelle/HOL's logic (a classical higher-order logic called HOL) into first-order logic, then invoking first-order procedures to dispatch those goals. Sledgehammer also reconstructs lower-level proofs in the end, so that the resulting proof does not always rely on calls to Sledgehammer. (I am not sure to what degree proof reconstruction is used in practice, relative to keeping calls to Sledgehammer in the code.)

Probably the coolest thing about Sledgehammer is that people actually use it. That's impressive, because often one of the hardest problems of building powerful tools and automation is getting people to use it---that is, adoption. This paper focuses on the design decisions that contributed to adoption, at least in the authors' views.

Sledgehammer has seen a number of implementations in proof assistants other than Isabelle/HOL, including in Coq. But Sledgehammer remains immensely popular, while some of these other tools have not really taken off. The reasons for this are hard to pin down---and worthy of discussion.

You can find a reading guide and the discussion questions below.

Please note that this is due the morning before class, so that we can discuss it in person. And please install the artifact before Thursday's class.


Reading Guide

This is a less heavy read than the reading from the previous week, so it is good to read it in its entirety. Section 2 describes a lot of the design decisions that went into Sledgehammer, and makes a really good read. Section 3 describes a short evaluation. And Section 4 discusses educational uses.

Reading this from start to finish is probably fine, but as always, if you struggle or get confused, it's OK to take note of that and skip the rest of a section or otherwise move on.

Discussion Questions

Discussion Questions

Please write a short paragraph as a comment in the class forum thread about any one of the following discussion questions (you need to respond to just one question of your choice to get credit, though you're free to write more if interested):

  1. Sometimes, powerful proof automation really struggles with adoption, but Sledgehammer has seen and continues to see extensive use by real proof engineers. After reading this paper, what do you think is primarily behind the success of Sledgehammer? And what do you think we could learn from this in designing, implementing, and deploying other proof automation?
  2. Can you think of any other important design decisions not discussed in the paper that may contribute to adoption of powerful proof automation like Sledgehammer?
  3. Last week, we discussed e-graphs, which can be used to implement congruence closure and automate a lot of equational reasoning. Do you think hammers will become obsolete if we have powerful equational reasoning directly in the logic that forms the basis for our proof assistant? Or do you think these two kinds of automation are and will continue to be complementary? Why or why not?
  4. Based on this paper, can you think of any ideas to improve Sledgehammer? Consider in particular leveraging advances in computer science from the last decade since this was written.
  5. Suppose a new proof of a previously unproven conjecture is solved using automation like Sledgehammer. What degree of proof reconstruction would you need to feel satisfied with the proof? Would the calls to Metis seen in Section 2.3.1 suffice for you? Why or why not? What do you think you could do to improve proof reconstruction?
  6. How would you debug a proof that failed to go through using heavy automation like Sledgehammer? And how would you build better tools for debugging to make this process easier?
  7. The paper mentions the potential of using machine learning tools to identify harmful lemmas. Can you think of any other ideas for integrating hammers with machine learning tools for proof synthesis to improve tools of either kind? What about integrating hammers with other kinds of automation you've seen so far?
  8. Do you think the use of heavy automation like Sledgehammer in educational contexts is more helpful than harmful, or more harmful than helpful? Why or why not?
  9. Write a paragraph responding to someone else's comment in the discussion forum.
  10. Write a paragraph about any other single interesting thought, question, or idea that you had in response to the reading.

You may want to look at other people's responses before class in order to shape the discussion. And again remember that you need to answer just one question to get credit, though you are free to answer more if interested.

<< Previous | Class Schedule | Next >>