Audio-Aligned and Parsed Corpora of Vernacular Speech
This research group brings to the network unique experience, expertise, and leadership in grammatically annotated corpora of vernacular speech. Our contribution thus helps fulfill the project’s goal of incorporating the full range of existing tools and methodologies for advancing research on syntactic variation. The head of the research unit (Tortora) has spearheaded the creation of two one-million-word parsed corpora of vernacular speech for the study of syntactic variation: the Audio-Aligned and Parsed Corpus of Appalachian English (AAPCAppE; Tortora et al. 2017) and the Corpus of New York City English (CoNYCE; Tortora et al. in progress), both funded by the National Science Foundation and the National Endowment for the Humanities (United States). The AAPCAppE and the CoNYCE each consist of Praat TextGrids accompanied by .wav files of the underlying speech signal, together with a complete set of syntactically annotated text files; the two corpora represent two different dialects of American English. A proof of concept is publicly available through the AAPCAppE web interface at www.aapcappe.org. This resource allows researchers to carry out in-depth analyses of constructions that are specific to Appalachian English, or typical of vernacular Englishes more generally, using CorpusSearch queries (Randall 2009); access to the speech signal allows for full transparency and replicability in scientific research. The corpora also serve as a model for building similar corpora of vernacular speech (of any language) in the future.