During class, after dividing into groups, we will take a look at the data repository for the REPLICA paper. Any workflow for exploring the data is fine, but here is one suggested workflow to get you started, with some questions to discuss with your group along the way:
- Read the README to get a sense of the repository organization.
- Take a look at the benchmarks. Is there anything you find interesting or informative about the benchmarks?
- Take a look at the list of all changes identified. Do you see anything you find interesting or informative about the changes?
- Think about what other questions you might ask about the data not answered by the study. Do you have enough information to answer those questions, given enough time? How might you go about doing so?
- Poke around at the analysis scripts. Do you understand what is going on? How might you go about writing scripts to answer the questions you identified about the data in the previous step?
- How might you automate more of this workflow, if at all?
- If time allows, spend some time cloning the repository and running the scripts, possibly modifying them if interested. Or, spend some time poking around at the raw data or at other Git commits. What did you learn from this?
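If you want a concrete starting point for the scripting and automation questions above, here is a minimal sketch in Python. It assumes nothing about how the REPLICA repository is actually laid out: it just walks a local clone and counts files by extension, which can help you spot where the raw data and analysis scripts live. The function name `summarize_extensions` and the clone path in the usage comment are made up for illustration.

```python
from collections import Counter
from pathlib import Path


def summarize_extensions(root):
    """Count files under root by extension, skipping Git's internal .git directory."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        # Skip anything inside the .git directory of the clone.
        if ".git" in path.relative_to(root).parts:
            continue
        if path.is_file():
            # Files without a suffix (e.g. README) get a placeholder label.
            counts[path.suffix or "(no extension)"] += 1
    return counts


# Example usage (the path is hypothetical; point it at your own clone):
# for ext, n in summarize_extensions("path/to/your/clone").most_common():
#     print(f"{n:6d}  {ext}")
```

From there, your group might extend the script to answer one of the questions you came up with, for example by filtering to a single file type and reading those files.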
The only graded part will be answering the following discussion question (to be posted in this thread, listing all group members in your response): What did you learn from exploring the data that you did not learn from reading the paper? How might this influence work that you might do in this space, whether that work is another user study, an analysis of the existing data, or the design or improvement of a tool addressing user needs?
You can do this on your own if you cannot make it to class, or join a Zoom breakout room if you're remote. Let me know if you can't attend but would like me to find you a partner or make an exception.