Getting Started

For the twenty-second class, we are going to look at the DIVA artifact. ICSE has a great artifact culture, so the artifact is all packaged up and ready to go for us in a VM. We will rerun parts of the evaluation of DIVA, and peek into the output to see what the proofs it produces looks like, and how it fails when it fails. We'll also discuss a bit about what reproducibility really means in the context of machine learning tools (we would need fancy hardware to retrain the model, for example, and determinism seems to be near impossible), and how tactic prediction models could be integrated into proof engineering workflows.

Installation Instructions

  1. Install VirtualBox if you do not already have it.
  2. Download artifact_documentation.zip and icse2022.ova from the official artifact package.
  3. Unzip artifact_documentation.zip and open README.md from the unzipped directory.
  4. Follow the instructions from README.md to install the VM and run DIVA on a single project.

If you have any trouble, please post your questions in this thread. Please also drop any installation tips you have in that thread.

If installation fails, and the class forums are no help, do not panic; we will try to match you with someone in class for whom installation worked.

You can find the exercise we will do together in class below.

Let's Hack!

Let's Hack!

During class, after dividing into groups, open up the VM for the artifact. Then, as a group, run the model on a number of different projects (see the README.md instructions that you downloaded when you installed the VM). Look at the outputs for particular proofs and projects, and discuss the following questions with your group:

  1. What proofs does the model seem to do well on? Do they have anything in common?
  2. What proofs does the model seem to struggle with? Do they have anything in common?
  3. Does the model seem to have any common modes of failure (for example, running the same tactic over and over again)?
  4. What would it mean to reproduce the results from this paper fully, in an ideal world? What resources might it require access to?
  5. Can you think of any ideas to address any limitations you encounter, like common failure modes or reproducibility limitations?

These will be your discussion questions at the end. As usual, they will be the only graded part. A partial answer is fine, so don't worry if you struggle to answer some of these questions. When you're done, please have your group post your answers in this thread, listing all group members in your response.

You can do this on your own if you cannot make it to class, or enter a Zoom breakout if you're remote. Let me know if you can't attend, but would like me to find you a partner or make an exception.

<< Previous | Class Schedule | Next >>