Data Is Plural - S2E5: Crosswords

Dec 20, 2023 · 16 mins

This episode’s guests are George Ho and Saul Pwanson, whose crossword datasets were featured in the Data Is Plural newsletter in 2021 and 2016, respectively. Saul and George explain the difference between American-style and cryptic crosswords, how they collected their datasets, and what they learned along the way.

Relevant and mentioned links:

Saul’s xd archive, grid comparison, and .xd file format
FiveThirtyEight’s coverage of the plagiarism scandal Saul’s analysis unearthed and Saul’s csv,conf talk, “How a File Format Led to a Crossword Scandal”
George’s dataset of cryptic crossword clues
George’s datasheet for the dataset
Timnit Gebru et al.’s “Datasheets for Datasets”
XWord Info, from which Saul gathered New York Times crossword data
David Steinberg’s Pre-Shortzian Puzzle Project, with “litzing” contributions from Barry Haldiman and others