For the seventeeth class, we will read Congruence Closure in Intensional Type Theory [PDF] [DOI] by Daniel Selsam and Leonardo de Moura. As a short bonus reading, we'll also read the two-pager Congruence Closure in Cubical Type Theory [PDF] by Emil Holm Gjørup and Bas Spitters.

If you were paying way too much attention to the reading before spring break, you may have seen me allude to these things called e-graphs, and cite these papers. Neither of these papers say the word e-graph, but that's what they're about.

E-graphs are these super wonderful data structures for managing equality and equivalence information. They are traditionally used inside of constraint solvers and automated theorem provers to implement congruence closure, an efficient algorithm for equational reasoning---but they can also be used to implement rewrite systems.

Probably the best explanation of e-graphs I've seen comes from the documentation for an e-graph library called egg. What the first paper we'll read this week does that is so cool is implement congruence closure with these data structures in a dependent type theory, since traditional solvers and automated theorem provers don't handle dependent types. So basically this paper takes the central insights at the core of many solvers and automated theorem provers, and lifts it into a dependently typed proof assistant. It's quite incredible.

The result is a really powerful tactic for solving proof goals that follow from series of equations. But as I hinted at in the proof repair paper from before spring break, these have the potential to help with other kinds of automation too---like a much better workflow for proof repair across type equivalences when managing a large project with many different datatypes that change! So one thing I'd love to do is build e-graphs into PUMPKIN Pi---let me know if you're interested!

Oh, but there's just one problem. Remember when I said early on that equality is one of the hardest problems in type theory? Well, equality in dependent type theory is hard and controversial, and the first paper we'll read makes a particular assumption about equality. This assumption is not one that, say, Coq makes by default, and it's actually inconsistent with Cubical Agda (and homotopy type theory more generally).

Enter the second paper---a short two-pager on e-graphs in Cubical Agda. This uses a different notion of equality that comes from cubical type theory (an implementation of homotopy type theory). But it's a short paper and, from what I gather, not yet implemented efficiently. Still, these two papers together offer a balanced view of the design space so far in implementing e-graphs for proof assistants based on dependent type theory.

You can find a reading guide and the discussion questions below.

Please note that this is due the morning before class, so that we can discuss it in person. And please install the artifact before Thursday's class.


Reading Guide

The first paper is one of my absolute favorites, but it's also extremely heavy. Understanding all of the details requires expertise in both dependent type theory and e-graphs---very few people have both!

If you are not familiar with e-graphs at all, I think skimming the egg documentation I linked to earlier is a good start before diving into this paper. Then, when they say "union-find data structure" in this paper, you can mentally substitute "e-graph."

With that in mind, most of this paper deals with the challenges of dependent types as they interact with equality and, by extension, e-graphs. If you (like most people) do not have a lot of background in dependent type theory, it is very normal to get lost when reading some of the details (honestly, I have a lot of background, and I still often get lost). But I would still spend a lot of time on Sections 2 and 3 to understand the challenges that dependent types pose, and what this paper does at a high level to solve them.

Section 4 gets into a lot of the implementation details. These are practically important, but a lot of them are specific to the particular notion of equality chosen by Lean. So I would not get too bogged down by these, unless you find it particularly interesting. Do read it, but don't worry too much about understanding the entire thing. Section 5 backs up nicely to show what it's like in action, and we'll hopefully get a chance to dive in when we look at the artifact, too.

The second paper---the two-pager on congruence closure in Cubical Agda---is a nice quick follow-up. I think the coolest thing about it is how simple the definition of congruence is using the notion of equality from cubical type theory---path equality. In a sense, to me it feels like path equality is the natural generalization of congruence to dependent types. But, from conversations with Leo de Moura, I've also heard that implementing this efficiently with path equality will be hard, and that the notion of equality that Lean chose really helps with efficiency. Tradeoffs abound!

In any case (gosh sorry, I get too excited about this topic and can't stop talking), for the second paper, it's good to just get the high-level gist of what using a different notion of equality buys you---but it's cool if you don't totally get that notion of equality.

Still, if you're super curious about cubical type theory and want to learn more, I strongly recommend these lecture notes by Anders Mörtberg.

Discussion Questions

Discussion Questions

Please write a short paragraph as a comment in the class forum thread about any one of the following discussion questions (you need to respond to just one question of your choice to get credit, though you're free to write more if interested):

  1. These two papers together show one situation where the foundations of your proof assistant influence the kind of automation that is possible, and how to implement that automation. Can you think of other kinds of automation strongly influenced by foundational choices like the definition of equality? What and why? What design decisions would you make in implementing a proof assistant to make those kinds of automation easier?
  2. E-graphs in Cubical Agda currently struggle with efficiency to the point of (last I checked) not yet being usable for many practical problems. If tasked with making e-graphs efficient in Cubical Agda, how would you approach the problem? What would you do?
  3. The two notions of equality chosen in these papers are inconsistent with one another, but each of them is consistent with Coq, which makes no commitment to either. Still, by not committing to either, Coq loses both of these implementations of congruence for dependent types---so it still lacks an implementation. Is there any way to implement something like this for Coq, despite Coq lacking the necessary axioms by default? How? What about other proof assistants you've heard of so far?
  4. Machine learning tools sometimes struggle with data equivalent but not identical to those seen in training data. For example, Alpha Go once lost a game because of this. Reasoning about equivalent datatypes and proof states is perhaps even more important to the generalizability of a machine learning tool for proofs. Do you think encoding e-graphs in a machine learning tool for proofs would solve this problem? Why or why not? How would you encode them, if so? And do you think they could help with relations more general than equivalences, too?
  5. Are there any other kinds of proof automation where you think e-graphs may be useful? What kinds, and how?
  6. Write a paragraph responding to someone else's comment in the discussion forum.
  7. Write a paragraph about any other single interesting thought, question, or idea that you had in response to the reading.

You may want to look at other people's responses before class in order to shape the discussion. And again remember that you need to answer just one question to get credit, though you are free to answer more if interested.

<< Previous | Class Schedule | Next >>