Instructor:
People who use a language share its words. Yet the ways we use those shared words are nearly as unique as our fingerprints. Even stranger, our distinctive patterns depend not on rare words but on common ones we scarcely think about using: the, a, of, and so on. Using data about such words, forensic linguists can identify the author of a document accurately enough for the analysis to serve as evidence in court.
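To make the idea concrete, here is a minimal sketch in Python of how common words might be turned into data. It is only an illustration, not the method forensic linguists actually use: the six-word list, the function names, and the simple distance measure are all assumptions chosen for brevity, and real analyses rely on much longer word lists, far more text, and more careful statistics.

```python
from collections import Counter
import re

# Illustrative list only (an assumption); real studies use far longer lists.
FUNCTION_WORDS = ("the", "a", "of", "and", "to", "in")


def function_word_profile(text):
    """Return each function word's share of all word tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (counts[w] / total if total else 0.0) for w in FUNCTION_WORDS}


def profile_distance(p, q):
    """Sum of absolute differences between two profiles (smaller = more alike)."""
    return sum(abs(p[w] - q[w]) for w in FUNCTION_WORDS)


# Two made-up samples standing in for texts by different writers.
sample_a = "The cat sat on the mat, and the dog slept in a corner of the room."
sample_b = "A man of honor keeps to a promise, and a friend of his knows it."

profile_a = function_word_profile(sample_a)
profile_b = function_word_profile(sample_b)

print(profile_a)
print(profile_b)
print(profile_distance(profile_a, profile_b))
```

Each text becomes a small table of proportions, and two texts can then be compared by how far apart their tables are. With samples this short the numbers mean very little; the point is only to show what "data about such words" can look like.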
In this course, we study how and why we create data from language. We consider different types of texts, including transcribed speech, social media, and literature. What do we gain when we transform language into data? What do we lose? This course engages not only questions from academic fields like digital humanities and computational linguistics, but also practical questions of everyday life in the twenty-first century. Google became powerful and ubiquitous by transforming language into data. What does understanding that process teach us about how we live today?