In the past two decades, computational approaches to dialect variation, known as dialectometry, have allowed researchers to work efficiently with large amounts of data and, in a data-driven manner, to define dialect groups, identify specific dialect features and search for general tendencies in language variation. One of the main advantages of data-driven dialectology is "avoiding the need to select which features to use as the basis of characterization" (Nerbonne, 2008). However, most published studies in dialectometry are based on data extracted from dialect atlases or surveys, which contain linguistic features carefully selected by human experts. Automatic extraction and analysis of meaningful features from raw text, such as interviews, would enable researchers to work with data that has not been chosen by experts and can therefore be considered unbiased. Despite the appeal of this approach, automatic feature extraction at all linguistic levels remains challenging (Kroon, 2022) and understudied.
We are excited to announce a MODIFIED workshop organized by the Re-examining Dialect Syntax Network (REEDs) and the Leiden University Centre for Digital Humanities (LUCDH). This two-day workshop will take place at Leiden University on Thursday 20 June and Friday 21 June 2024. The event is designed to foster collaboration among specialists in dialectology, computational linguistics and corpus linguistics, with a focus on identifying morphosyntactic dialect features in a range of semi-structured and unstructured sources. The workshop will give researchers and research groups an opportunity to reflect on theoretical and/or methodological problems, and their solutions, in automatic morphosyntactic dialect feature extraction.