This episode’s guests are George Ho and Saul Pwanson, whose crossword datasets were featured in the Data Is Plural newsletter in 2021 and 2016, respectively. Saul and George explain the difference between American-style and cryptic crosswords, how they collected their datasets, and what they learned along the way.
Relevant links, including those mentioned in the episode:
- Saul’s xd archive, grid comparison, and .xd file format
- FiveThirtyEight’s coverage of the plagiarism scandal that Saul’s analysis unearthed, and Saul’s csv,conf talk, “How a File Format Led to a Crossword Scandal”
- George’s dataset of cryptic crossword clues
- George’s datasheet for the dataset
- Timnit Gebru et al.’s “Datasheets for Datasets”
- XWord Info, from which Saul gathered New York Times crossword data
- David Steinberg’s Pre-Shortzian Puzzle Project, with “litzing” contributions from Barry Haldiman and others
Theme music by Nikhil Sonnad.