Martijn Wieling

Martijn Wieling is Professor by special appointment of Low Saxon / Groningen Language and Culture at the Center for Groningen Language and Culture and an Associate Professor (UHD1) at the University of Groningen. His research focuses on investigating language variation and change quantitatively, with a specific focus on the Low Saxon language. He uses both large digital corpora of text and speech, as well as experimental approaches to assess differences in the movement of the tongue and lips during speech. More information about the research conducted in his group can be found on the website of the Speech Lab Groningen.

+31 6 100 950 21
m.b.wieling@rug.nl
Room nr.: 1311.434
@martijnwieling

Academic positions

Present 2018

Professor by special appointment of Low Saxon / Groningen Language and Culture

Center for Groningen Language and Culture
Present 2018

Associate Professor

University of Groningen, Department of Information Science
2018 2015

Assistant Professor (with tenure)

University of Groningen, Department of Information Science
2015 2013

Post-doctoral researcher (NWO Veni)

University of Groningen, Department of Information Science
2013 2012

Post-doctoral researcher (NWO Rubicon)

University of Tübingen, Department of Quantitative Linguistics

Education & Training

Ph.D. 2012

Ph.D. in Linguistics (cum laude)

University of Groningen, Faculty of Arts
M.Sc.2007

Master (research) of Science in Behavioural and Cognitive Neurosciences (cum laude)

University of Groningen, Faculty of Science and Engineering
M.Sc.2007

Master of Science in Computing Science
(cum laude)

University of Groningen, Faculty of Science and Engineering
B.Sc.2005

Bachelor of Science in Computing Science
(cum laude)

University of Groningen, Faculty of Science and Engineering

Honors, Awards and Grants (>€25,000)

2019-2024

NWO PhD in the Humanities Grant (€ 250,000)

This five-year research grant was awarded to Wieling and PhD student Teja Rebernik by the Netherlands Organisation for Scientific Research (NWO) for their project "Speech planning and monitoring in Parkinson's disease".
2019-2023

Google Community grant: Van Old noar Jong

In this Google-funded project, Wieling and his colleagues have developed a game to teach aspects of the Groningen dialects to primary school children. The game has launched and can be freely downloaded for Apple and Android.
GYA

Member (2019) of the Global Young Academy

Wieling was selected as one (out of 600+ applications) of the 43 new members of the Global Young Academy (DJA) in May 2019 for a period of five years. The Global Young Academy gives a voice to young scientists around the world. To realise this vision, the GYA develops, connects, and mobilises young talent from six continents. Moreover, the GYA empowers young researchers to lead international, interdisciplinary, and inter-generational dialogue with the goal to make global decision making evidence-based and inclusive.
EYRA

European Young Research Award (2016)

In 2016, Wieling was awarded the European Young Research Award for post-doctoral researchers. This prestigious award is granted every two years to researchers demonstrating outstanding research performance and leadership. It aims to inspire early stage researchers to incorporate a European dimension and perspective into their research. Wieling was the first awardee from the Humanities.
YAG

Founding Member of the Young Academy Groningen (2016)

In 2016, Wieling was selected as one of the 18 founding members of the Young Academy of Groningen for a period of five years. The Young Academy Groningen is a club for the University of Groningen’s most talented, enthusiastic and ambitious young researchers. Members come from all fields and disciplines and have a passion for science and an interest in matters concerning science policy, science and society, leadership and career development.
DJA

Member (2015) and vice-chairman (2018) of De Jonge Akademie

Wieling was selected as one of the youngest members of De Jonge Akademie (DJA) of the Royal Netherlands Academy of Arts and Sciences (KNAW) in April 2015 for a period of five years. In 2018, Wieling was elected as vice-chairman of De Jonge Akademie for a period of two years.The Young Academy is a dynamic and innovative group of 50 top young scientists and scholars with outspoken views about science and scholarship and the related policy. The Young Academy organises inspiring activities for various target groups focusing on interdisciplinarity, science policy, and the interface between science and society.
YSP

Lisa Lena Opas-Hänninen Young Scholar Prize (2014)

At the Methods in Dialectology XV conference, Wieling was awarded the Lisa Lena Opas-Hänninen Young Scholar Prize for his poster presentation on Validating and using the PMI-based Levenshtein distance as a measure of foreign accent strength.
2018-2019

CIT Data Science Project Grant (€ 35,000; co-PI: Dr Nanna Hilton)

This one-year data science grant was awarded to Wieling and Hilton by the Centre for Information Technology of the University of Groningen for their project "Automatic recognition of Frisian speakers: Using computers to discriminate the Frisian accent and voice".
2017-2021

YAG Interdisciplinary PhD Grant (€ 120,000; co-PI: Prof. Michel Vols)

Together with Prof. Michel Vols, Wieling was awarded one of the three yearly interdisciplinary Young Academy of Groningen PhD grants for a project with the goal of automatically predicting court judgments.
2017-2018

Digital Humanities Explored Project Grant (€ 50,000)

This one-year data science grant was awarded to Wieling by the Centre for Digital Humanities of the University of Groningen for his project "Identifying Dutch accents automatically".
2016-2017

VIVIS Research Grant (€ 50,000)

This one-year research grant was awarded to Wieling by the Vereniging van Instellingen voor mensen met een VISuele beperking for his project "Automatic speech recognition for congenitally blind speakers".
2013-2017

NWO Veni Grant (€ 250,000)

This four-year research grant was awarded to Wieling by the Netherlands Organisation for Scientific Research (NWO) for his project "Improving speech learning models and English pronunciation with articulography". Only 15.5% of the submitted project proposals were granted.
2012-2013

NWO Rubicon Grant (€ 60,000)

This one-year research grant was awarded to Wieling by the Netherlands Organisation for Scientific Research (NWO) for his project "Investigating language variation physically". Only 12% of the submitted project proposals were granted.

Publications

(h: 35)

Exploring self-supervised speech representations for cross-lingual acoustic-to-articulatory inversion

Hao, Y., de Vries, W., Tienkamp, T., van Noord, R., & Wieling, M. (2024, forthcoming)

Proceedings paper Proceedings of Interspeech 2024.

Abstract

Acoustic-to-articulatory inversion (AAI) is the process of inferring vocal tract movements from acoustic speech signals. Despite its diverse potential applications, AAI research in languages other than English is scarce due to the challenges of collecting articulatory data. In recent years, self-supervised learning (SSL) based representations have shown great potential for addressing low-resource tasks. We utilize wav2vec 2.0 representations and English articulatory data for training AAI systems and investigates their effectiveness for a different language: Dutch. Results show that using mms-1b features can reduce the cross-lingual performance drop to less than 30%. We found that increasing model size, selecting intermediate rather than final layers, and including more pre-training data improved AAI performance. By contrast, fine-tuning on an ASR task did not. Our results therefore highlight promising prospects for implementing SSL in AAI for languages with limited articulatory data.

Quantifying the effect of speech pathology on automatic and human speaker verification

Halpern, B., Tienkamp, T., Huang, W., Violeta, L., Rebernik, T., de Visscher, S., Witjes, M., Wieling, M., Abur, D., & Toda, T. (2024, forthcoming)

Proceedings paper Proceedings of Interspeech 2024.

Abstract

This study investigates how surgical intervention for speech pathology (specifically, as a result of oral cancer surgery) impacts the performance of an automatic speaker verification (ASV) system. Using two recently collected Dutch datasets with parallel pre and post-surgery audio from the same speaker, NKI-OC-VC and SPOKE, we assess the extent to which speech pathology influences ASV performance, and whether objective/subjective measures of speech severity are correlated with the performance. Finally, we carry out a perceptual study to compare judgements of ASV and human listeners. Our findings reveal that pathological speech negatively affects ASV performance, and the severity of the speech is negatively correlated with the performance. There is a moderate agreement in perceptual and objective scores of speaker similarity and severity, however, we could not clearly establish in the perceptual study, whether the same phenomenon also exists in human perception.

Multiple estimates of the Frisian and Low Saxon speaker population size in the Netherlands

Buurke, R., Bartelds, M., Knooihuizen, R., & Wieling, M. (2024)

Journal paper Linguistic Minorities in Europe Online, doi: 10.1515/lme.28672125.

Abstract

Language questionnaires are often used to approximate the size of linguistic communities, which we attempt for two regional languages in the Netherlands: Frisian and Low Saxon. We distributed a language questionnaire about a range of topics (including language use, proficiency, intergenerational transfer, and the respondent�s language learning context) through an existing large-scale longitudinal study (the Lifelines Cohort Study). This yielded 38,500 respondents across the three northern provinces (Fryslân, Groningen, and Drenthe) where the two regional languages are spoken. Language questionnaires can suffer from bias arising from how questions are presented or information is portrayed. Our sample likely suffered from sampling bias, because the prevalence of dialect speakers was unrealistically large. Initially, we applied post-stratification to account for differences between ratios in the sample and the northern population (e.g., for sex, age, domicile population density, and educational attainment). This only improved the estimates to a limited extent for our metrics (i.e., self-indicated speaking proficiency and use at home), so we used an intergenerational transmission approach instead. Earlier language usage estimates were used as reference points, and we derived estimates for the generations that followed the reference generations. We found that the Low Saxon speaker population size is declining, with around 350,000 speakers in 2021 aged between 6 and 69 (around 41% of the population in that age range) and 140,000 people using it at home (around 17%). The Frisian population appears stable, with around 250,000 speakers aged between 5 and 60 (62% of the population in that age range) and 195,000 people using it at home (around 48%). As these estimates seem plausible when compared to other speaker counts, we conclude that our intergenerational estimation approach may be used to obtain speaker estimates when required information is available and more common methods are ineffective.

A cognitive geographic approach to dialectology: Cognitive distance as a predictor for perceptual dialect distance

Sekeres, H., Wieling, M., & Knooihuizen, R. (2024, forthcoming)

Book chapter In: Susanne Wagner (eds.) Proceedings of Methods in Dialectology XVII. Language Science Press.

Abstract

In dialectology, the central relationship under investigation is usually that between dialect distance and (Euclidean) geographic distance. Nevertheless, other approaches than geographic distance may be better suited to represent the relationship humans have with space, such as travel times (Gooskens, 2004) or 'rice paddy distances' (Stanford, 2012), and have been successfully used to explain dialect variation. In this study, we explain perceptual dialect differences using both geographic distance and a different type of distance that is commonly used in the field of cognitive geography. Cognitive geography is based on the assumption that an individual's mental representation of their environment has a greater effect on their behaviour than the actual environment (Montello, 2018). A commonly used metric in cognitive geography is the cognitive distance: the geographic distance between two places as estimated by an individual (Montello, 1991). Although the individual and social aspects of language are an important component of research in dialectology, the individual and social aspects of geography have not been widely considered. This study introduces the use of cognitive distances into dialect research and investigates whether these mental representations of space can serve as an explanatory variable in dialectology. Nearly 800 participants from the north of the Netherlands provided cognitive distances between the place where they grew up and seven other locations in the same region. They also rated the similarity of dialect recordings from these locations to the dialect of the location where they grew up. A linear mixed-effects regression model was built to predict perceptual dialect distance from both cognitive distance and geographic distance. The resulting model indicates that geographic distance is more predictive of perceptual dialect distance than cognitive distance. There was also a significant interaction between cognitive and geographic distance. Cognitive distance is more predictive of perceptual dialect distance when geographic distance is short than when geographic distance is long. Furthermore, an exploratory analysis revealed that gender and proficiency in the participants' local dialect were predictive of perceptual dialect distance. Our findings indicate that cognitive distance can be used to explain dialect variation, especially when the area under investigation is small, and consequently that the framework of cognitive geography can be usefully employed in dialectological research.

Developing infrastructure for low-resource language corpus building

Sekeres, H., Heeringa, W., de Vries, W., Zwagers, O., Wieling, M., & Jensma, G. (2024)

Proceedings paper In: Maite Melero, Sakriani Sakti, & Claudia Soria (eds.) Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages at LREC-COLING 2024, ELRA and ICCL, pp. 72-78.

Abstract

For many of the world's small languages, few resources are available. In this project, a written online accessible corpus was created for the minority language variant Gronings, which serves both researchers interested in language change and variation and a general audience of (new) speakers interested in finding real-life examples of language use. The corpus was created using a combination of volunteer work and automation, which together formed an efficient pipeline for converting printed text to Key Words in Context (KWICs), annotated with lemmas and part-of-speech tags. In the creation of the corpus, we have taken into account several of the challenges that can occur when creating resources for minority languages, such as a lack of standardisation and limited (financial) resources. As the solutions we offer are applicable to other small languages as well, each step of the corpus creation process is discussed and resources will be made available benefiting future projects on other low-resource languages.

The impact of electromagnetic articulography sensors on the articulatory-acoustic vowel space in speakers with and without Parkinson's Disease

Tienkamp, T., Rebernik, T., Jacobi, J., Wieling, M., & Abur, D. (2024)

Proceedings paper In: Cécile Fougeron, & Pascal Perrier (eds.) Proceedings of the 13th International Seminar of Speech Production, pp. 61-64.

Abstract

The somatosensory effect of electromagnetic articulography (EMA) sensors on speech remains relatively unexplored. Moreover, EMA sensors may be more disruptive to speech in individuals with somatosensory deficits (e.g., persons with Parkinson's Disease; PwPD). Thus, we investigated the effect of EMA sensors on the articulatory-acoustic vowel space (AAVS) in both typical speakers (n=23) and PwPD (n=23). The AAVS was calculated before EMA sensor placement, directly after, and after approximately one hour to assess habituation. The AAVS significantly decreased following sensor placement and did not change with habituation, regardless of speaker group. PwPD had a smaller AAVS compared to typical speakers, but were not differentially impacted by EMA sensors. EMA sensor placement led to average reductions of the AAVS of 13.5% for PwPD and 14.2% for typical speakers, which suggests that articulatory-acoustics from studies with and without the use of EMA sensors may not be fully comparable.

The effect of speaking style on the articulatory-acoustic vowel space in individuals with tongue cancer before and after surgical treatment

Tienkamp, T., Rebernik, T., Buurke, ., Raoul, ., Polsterer, K., van Son, R., Wieling, M., Witjes, M., de Visscher, S., & Abur, D. (2024)

Proceedings paper In: Cécile Fougeron, & Pascal Perrier (eds.) Proceedings of the 13th International Seminar of Speech Production, pp. 65-68.

Abstract

The impact of surgical treatment for tongue cancer is traditionally assessed with vowel formant metrics from read speech or sustained vowels. However, isolated speech might not fully reflect a speaker's typical speech. Here, we assessed the effect of speaking style (read vs. semi-spontaneous) on vowel acoustics of individuals pre- and post-surgery for tongue cancer. Eight individuals (3 females and 5 males) were recorded pre- and approximately six months post-surgery. We calculated the articulatory-acoustic vowel space (AAVS) during read speech (sentences) and semi-spontaneous speech (picture description). Results showed that the AAVS did not differ significantly pre- and post-surgery. Picture descriptions yielded a significantly smaller AAVS compared to the reading task, which was consistent pre- and post-surgery. Our findings suggest that both read and semi-spontaneous speech styles would be suitable to quantify the impact of surgical intervention for tongue cancer on vowel acoustics.

Assessing differences in articulatory-acoustic vowel space in Parkinson's Disease by sex and phenotype

Hoekzema, N., Rebernik, T., Tienkamp, T., Chaboksavar, S., Ciot, V., Gleichman, A., Jonkers, R., Noiray, A., Wieling, M., & Abur, D. (2024)

Proceedings paper In: Cécile Fougeron, & Pascal Perrier (eds.) Proceedings of the 13th International Seminar of Speech Production, pp. 69-72.

Abstract

The goal of this study was to determine whether articulatory-acoustics differ between individuals in the tremor-dominant (TD) and postural instability/gait difficulty (PIGD) phenotypes of Parkinson's disease (PD). The study included 31 individuals with PD (21 TD, 10 PIGD) and 29 control speakers (CS) who were all Dutch native speakers. A read speech task and a semi-spontaneous speech task were completed, and the Articulatory-Acoustic Vowel Space (AAVS) was calculated for both tasks. Results showed no significant difference in AAVS between the overall control group and PD for either phenotype. Follow-up analyses, pooling speech data from our prior study (+27 PD, +23 CS), demonstrated a significantly lower AAVS in males with PD compared to controls and no group differences for females. Thus, articulatory-acoustic changes may be more pronounced for male compared to female speakers with PD, but may not differ by PD phenotype.

Production allophones of North American English liquids

Tiede, M., Boyce, S., Stern, M., Rebernik, T., & Wieling, M. (2024)

Proceedings paper In: Cécile Fougeron, & Pascal Perrier (eds.) Proceedings of the 13th International Seminar of Speech Production, pp. 153-156.

Abstract

The syllabic liquids [ɚ] (as in "purr") and [əɫ] (as in "pull") have well-defined acoustic targets but are produced with a wide range of heterogenous tongue postures. This work surveys midsagittal tongue shapes from a large (N=78) number of speakers producing these sounds, to illustrate their variety, and to determine systematically how this variety can be quantified. In particular we propose that a categorization based on just two parameters--degree of tongue dorsum convexity and tip orientation--is sufficient to classify observed shapes, and superior to defining ad hoc prototypes.

Minority language happiness: the link between social inclusion, well-being, and speaking a regional language in the northern Netherlands

Brouwer, J., Buurke, R., van den Berg, F., Knooihuizen, R., Loerts, H., Bartelds, M., Wieling, M., & Keijzer, M. (2024)

Journal paper Ampersand, 12, 100173.

Abstract

Belonging to groups is often based on shared features between members and is associated with higher levels of (social) well-being. One especially strong marker of one's group membership is language. In linguistics, most research about group membership and well-being focuses on migrants and refugees. However, very little research has focused on the link between speaking a regional language and well-being. This is surprising, as regional languages index a strong shared in-group identity that could lead to exclusion of those who do not speak them. As a first empirical step, this paper reports on the association between regional language use (specifically, of Frisian and Low Saxon) and social well-being. We distributed a language background questionnaire to participants of the Lifelines cohort, a multigenerational cohort study comprising data from 167,729 participants living in the north of the Netherlands. In both language contexts (Frisian in Fryslân, and Low Saxon in Groningen and Drenthe), those using the regional language half of the time or more were found to have significantly more social contacts. They also experienced higher levels of social embeddedness than those who did not know or did not frequently use the regional language. The higher degree of social embeddedness for frequent regional language users was most strongly present in rural areas. Furthermore, we found that frequent users of Frisian living in Fryslân had higher levels of social embeddedness in rural areas than frequent users of Low Saxon living in Groningen or Drenthe. No effect was found for more overt measures of social well-being such as loneliness or life satisfaction. While our results confirm an association between regional language speaking and some indices of social functioning on a large scale, they cannot uncover a causal relationship between the two. We discuss how longitudinal studies and interviews in future studies may inform us further about the relation between regional language use and social well-being.

GAM-based individual difference measures for L2 ERP studies

Meulman, N., Schmid, M., Sprenger, S., & Wieling, M. (2023)

Journal paper Research Methods in Applied Linguistics, 2(3), 100079.

Abstract

ERPs (Event-Related Potentials) have become a widely-used measure to study second language (L2) processing. To study individual differences, traditionally a component outcome measure is calculated by averaging the amplitude of a participant's brain response in a pre-specified time window of the ERP waveform in different conditions (e.g., the 'Response Magnitude Index'; Tanner, Mclaughlin, Herschensohn & Osterhout, 2013). This approach suffers from the problem that the definition of such time windows is rather arbitrary, and that the result is sensitive to outliers as well as participant variation in latency. The latter is particularly problematic for studies on L2 processing. Furthermore, the size of the ERP response (i.e., amplitude difference) of an L2 speaker may not be the best indicator of near-native proficiency, as native speakers also show a great deal of variability in this respect, with the 'robustness' of an L2 speaker's ERP response (i.e., how consistently they show an amplitude difference) potentially being a more useful indicator. In this paper we introduce a novel method for the extraction of a set of individual difference measures from ERP waveforms. Our method is based on participants' complete waveforms for a given time series, modelled using generalized additive modelling (GAM; Wood, 2017). From our modelled waveform, we extract a set of measures which are based on amplitude, area and peak effects. We illustrate the benefits of our method compared to the traditional Response Magnitude Index with data on the processing of grammatical gender violations in 66 Slavic L2 speakers of German and 29 German native speakers. One of our measures in particular appears to outperform the others in characterizing differences between native speakers and L2 speakers, and captures proficiency differences between L2 speakers: the 'Normalized Modelled Peak'. This measure reflects the height of the (modelled) peak, normalized against the uncertainty of the modelled signal, here in the P600 search window. This measure may be seen as a measure of peak robustness, that is, how reliable the individual is able to show a P600 effect, largely independently of where in the P600 window this occurs. We discuss implications of our results and offer suggestions for future studies on L2 processing. The code to implement these analyses is available for other researchers.

Sound change estimation in Netherlandic regional languages: reducing inter-transcriber variability in dialect corpora

Buurke, R., & Wieling, M. (2023)

Journal paper Taal & Tongval, 75(1), 7-28.

Abstract

Large phonetic corpora are frequently used to investigate language variation and change in dialects, but these corpora are often constructed by many researchers in a collaborative effort. This typically results in inter-transcriber issues that may impact the reliability of analyses using these data. This problem is exacerbated when multiple phonetic corpora are compared when investigating real time dialect change. In this study, we therefore propose a method to automatically and iteratively merge phonetic symbols used in the transcriptions to obtain a more course-grained, but better comparable, phonetic transcription. Our approach is evaluated using two large phonetic Netherlandic dialect corpora in an attempt to estimate sound change in the area in the 20th century. The results are discussed in the context of the available literature about dialect change in the Netherlandic area.

Changing usage of Low Saxon auxiliary and modal verbs

Siewert, J., Wieling, M., & Scherrer, Y. (2023)

Proceedings paper In: Nina Tahmasebi, Syrielle Montariol et al. (eds.) Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change, Association for Computational Linguistics, pp. 112-118.

Abstract

We investigate the usage of auxiliary and modal verbs in Low Saxon dialects from both Germany and the Netherlands based on word vectors, and compare developments in the modern language to Middle Low Saxon. Although most of these function words have not been affected by lexical replacement, changes in usage that likely at least partly result from contact with the state languages can still be observed.

Predicting citations in Dutch case law with natural language processing

Schepers, I., Medvedeva, M., Bruijn, M., Wieling, M., & Vols, M. (2023)

Journal paper Artificial Intelligence and Law, doi: 10.1007/s10506-023-09368-5.

Abstract

With the ever-growing accessibility of case law online, it has become challenging to manually identify case law relevant to one's legal issue. In the Netherlands, the planned increase in the online publication of case law is expected to exacerbate this challenge. In this paper, we tried to predict whether court decisions are cited by other courts or not after being published, thus in a way distinguishing between more and less authoritative cases. This type of system may be used to process the large amounts of available data by filtering out large quantities of non-authoritative decisions, thus helping legal practitioners and scholars to find relevant decisions more easily, and drastically reducing the time spent on preparation and analysis. For the Dutch Supreme Court, the match between our prediction and the actual data was relatively strong (with a Matthews Correlation Coefficient of 0.60). Our results were less successful for the Council of State and the district courts (MCC scores of 0.26 and 0.17, relatively). We also attempted to identify the most informative characteristics of a decision. We found that a completely explainable model, consisting only of handcrafted metadata features, performs almost as well as a less well-explainable system based on all text of the decision.

Making more of little data: Improving low-resource automatic speech recognition using data augmentation

Bartelds, M., San, N., McDonnell, B., Jurafsky, D., & Wieling, M. (2023)

Proceedings paper Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 715-729.

Abstract

The performance of automatic speech recognition (ASR) systems has advanced substantially in recent years, particularly for languages for which a large amount of transcribed speech is available. Unfortunately, for low-resource languages, such as minority languages, regional languages or dialects, ASR performance generally remains much lower. In this study, we investigate whether data augmentation techniques could help improve low-resource ASR performance, focusing on four typologically diverse minority languages or language variants (West Germanic: Gronings, West-Frisian; Malayo-Polynesian: Besemah, Nasal). For all four languages, we examine the use of self-training, where an ASR system trained with the available human-transcribed data is used to generate transcriptions, which are then combined with the original data to train a new ASR system. For Gronings, for which there was a pre-existing text-to-speech (TTS) system available, we also examined the use of TTS to generate ASR training data from text-only sources. We find that using a self-training approach consistently yields improved performance (a relative WER reduction up to 20.5% compared to using an ASR system trained on 24 minutes of manually transcribed speech). The performance gain from TTS augmentation for Gronings was even stronger (up to 25.5% relative reduction in WER compared to a system based on 24 minutes of manually transcribed speech). In sum, our results show the benefit of using self-training or (if possible) TTS-generated data as an efficient solution to overcome the limitations of data availability for resource-scarce languages in order to improve ASR performance.

DUMB: A Benchmark for Smart Evaluation of Dutch Models

de Vries, W., Wieling, M., & Nissim, M. (2023)

Proceedings paper Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 7221-7241.

Abstract

We introduce the Dutch Model Benchmark: DUMB. The benchmark includes a diverse set of datasets for low-, medium- and high-resource tasks. The total set of nine tasks includes four tasks that were previously not available in Dutch. Instead of relying on a mean score across tasks, we propose Relative Error Reduction (RER), which compares the DUMB performance of language models to a strong baseline which can be referred to in the future even when assessing different sets of language models. Through a comparison of 14 pre-trained language models (mono- and multi-lingual, of varying sizes), we assess the internal consistency of the benchmark tasks, as well as the factors that likely enable high performance. Our results indicate that current Dutch monolingual models under-perform and suggest training larger Dutch models with other architectures and pre-training objectives. At present, the highest performance is achieved by DeBERTaV3 (large), XLM-R (large) and mDeBERTaV3 (base). In addition to highlighting best strategies for training larger Dutch models, DUMB will foster further research on Dutch. A public leaderboard is available at https://dumbench.nl.

5-minute formant adaptation task in Dutch children

Rebernik, T., Tienkamp, T. B., Polsterer, K. M., Hukker, V., Medvedeva, M., van der Ploeg, M., Schepers, I., Sekeres, H., de Vries, W., Abur, D., Jonkers, R., Noiray, A., & Wieling, M. (2023)

Proceedings paper 20th International Congress of Phonetic Sciences, pp. 923-927.

Abstract

The study investigates whether a short formant perturbation experiment elicits an adaptive response under less controlled experimental circumstances. 30 Dutch children were recruited and tested at a festival. They were asked to produce four target words containing an open-mid front rounded vowel /ɛ/ while we manipulated their feedback so that they would hear /ɪ/ for a period of 16 trials. Despite the short adaptation paradigm, our results show that children significantly changed their vowel productions in response to the perturbation. This suggests that long and monotonous experimental paradigms might not always be necessary, especially with populations that have a shorter attention span.

SPRAAKLAB: a mobile laboratory for collecting speech production data

Wieling, M., Rebernik, T., & Jacobi, J. (2023)

Proceedings paper 20th International Congress of Phonetic Sciences, pp. 2060-2064.

Abstract

In this paper, we discuss the specifications of a mobile laboratory, dubbed SPRAAKLAB, and how we use it for acquiring research-grade acoustic and articulatory data in the field, thereby providing access to participant populations which are otherwise hard to study. In addition, we illustrate how the mobile laboratory supports public engagement activities in combination with research data acquisition, allowing us to entertain and inform an interested audience about speech research, while simultaneously collecting speech production data from dozens of participants in a matter of days.

Estimating the level and direction of aggregated sound change of dialects in the northern Netherlands

Buurke, R. S., Sekeres, H. G., Heeringa, W., Knooihuizen, R., & Wieling, M. (2022)

Journal paper Taal & Tongval, 74(2), 183 - 214.

Abstract

This article reports investigations into sound change at the community-level of Frisian and Low Saxon dialect groups in the north of the Netherlands, which differ in key factors influencing dialect decline. We combine phoneti- cally transcribed corpora with dialectometric approaches that can quantify change among older male dialect speakers in a real-time framework. A multidimensional variant of the Levenshtein distance, combined with methods that induce realistic distances between sounds, is used to estimate how much dialect groups converged to and diverged from Standard Dutch between 1990 and 2010. Our analyses indicate that sound change is a slow process in this geographical area. The Frisian and North Low Saxon dialect groups seem to be most stable, while Westphalian Low Saxon varieties seem to be most prone to change. We offer possible explanations for our findings and discuss shortcomings of the data and approach in detail.

Longitudinal analyses of depression, anxiety, and suicidal ideation highlight greater prevalence in the northern Dutch population during the COVID-19 lockdowns

Ori, A. P., Wieling, M., Lifelines, C. R. I., & van Loo, H. M. (2023)

Journal paper Journal of Affective Disorders, 323, 62-70.

Abstract

Background: The pandemic of the coronavirus disease 2019 (COVID-19) has led to an increased burden on mental health. Aims: To investigate the development of major depressive disorder (MDD), generalized anxiety disorder (GAD), and suicidal ideation in the Netherlands during the first fifteen months of the pandemic and three nation-wide lockdowns.

Method: Participants of the Lifelines Cohort Study - a Dutch population-based sample-reported current symptoms of MDD and GAD, including suicidal ideation, according to DSM-IV criteria. Between March 2020 and June 2021, 36,106 participants (aged 18-96) filled out a total of 629,811 questionnaires across 23 time points. Trajectories over time were estimated using generalized additive models and analyzed in relation to age, sex, and lifetime history of MDD/GAD.

Results: We found non-linear trajectories for MDD and GAD with a higher number of symptoms and prevalence rates during periods of lockdown. The point prevalence of MDD and GAD peaked during the third hard lockdown at 2.88 % (95 % CI: 2.71 %-3.06 %) and 2.92 % (95 % CI: 2.76 %-3.08 %), respectively, in March 2021. Women, younger adults, and participants with a history of MDD/GAD reported significantly more symptoms. For suicidal ideation, we found a significant linear increase over time in younger participants. For example, 20-year-old participants reported 4.14x more suicidal ideation at the end of June 2021 compared to the start of the pandemic (4.64 % (CI: 3.09 %-6.96 %) versus 1.12 % (CI: 0.76 %-1.66 %)).

Limitations: Our findings should be interpreted in relation to the societal context of the Netherlands and the public health response of the Dutch government during the pandemic, which may be different in other regions in the world.

Conclusions: Our study showed greater prevalence of MDD and GAD during COVID-19 lockdowns and a continuing increase in suicidal thoughts among young adults suggesting that the pandemic and government enacted restrictions impacted mental health in the population. Our findings provide actionable insights on mental health in the population during the pandemic, which can guide policy makers and clinical care during future lockdowns and epi/pandemics.

Distance over time in a maximal sprint: understanding athletes' action boundaries in sprinting

Postma, D., Wieling, M., Lemmink, K., & Zaal, F. (2022)

Journal paper Ecological Psychology, 34(4), 133 - 156.

Abstract

The present study examined the kinematics of maximal effort sprint running, mapping the relations among a person's maximal running speed, maximum running acceleration and the distance coverable in a certain amount of time by this person. Thirty-three participants were recruited to perform a simple sprint task. Both forward and backward running were considered. Participants' position, velocity and acceleration data were obtained using a Local Positioning Measurement system. Participants' speed-acceleration profiles turned out to be markedly non-linear. To account for these non-linear patterns, we propose a new macroscopic model on the kinematics of sprint running. Second, we examined whether target distance was of influence on the evolution of participants' running speeds over time. Overall, no such effect on running velocity was present, except for a 'finish-line effect'. Finally, we studied how variation in individuals' maximum running velocities and accelerations related to differences in their action boundaries. The findings are discussed in the context of affordance-based control in running to catch fly balls.

Communicative language teaching: Structure-based or Dynamic Usage-Based?

Rousse-Malpat, A., Steinkrauss, R., Wieling, M., & Verspoor, M. (2022)

Journal paper The Journal of EuroSLA, 6(1), 20 - 33.

Abstract

Although communicative language teaching (CLT) was thought to have revolutionized classroom practice, there are "weak" and "strong" versions (Howatt, 1984). Most foreign language classrooms in the world still favor weak versions with structure-based (SB) views on language (Lightbown & Spada, 2013), and practice in the Netherlands is not much different (West & Verspoor, 2016). However, a small group of teachers in the Netherlands started teaching French as a second language with a strong CLT program in line with Dynamic Usage-Based (DUB) principles. Rather than focusing on rule learning and explicit grammar teaching to avoid errors, the DUB program takes the dynamics of second-language development into consideration and focuses on the three key elements of usage-based theory: frequency, salience and contingency. These translate into a great deal of exposure, repetition, learning the meaning of every single word through gestures, and presenting whole chunks of language, all without explicit grammar teaching. This study aims to compare the effects of the SB and DUB instructional programs after three years. We traced the second-language development of 229 junior high school students (aged 12 to 15) learning French in the Netherlands over three years. The participants took three oral tests over the course of three years (568 interviews) and wrote seven narratives on the same topic (1511 narratives). As expected, the DUB approach, which is in line with a strong CLT version, was more effective in achieving proficiency in both speaking and writing and equally effective in achieving accuracy.

Quantifying language variation acoustically with few resources

Bartelds, M., & Wieling, M. (2022)

Proceedings paper Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 3735-3741.

Abstract

Deep acoustic models represent linguistic information based on massive amounts of data. Unfortunately, for regional languages and dialects such resources are mostly not available. However, deep acoustic models might have learned linguistic information that transfers to low-resource languages. In this study, we evaluate whether this is the case through the task of distinguishing low-resource (Dutch) regional varieties. By extracting embeddings from the hidden layers of various wav2vec 2.0 models (including a newly created Dutch model) and using dynamic time warping, we compute pairwise pronunciation differences averaged over 10 words for over 100 individual dialects from four (regional) languages. We then cluster the resulting difference matrix in four groups and compare these to a gold standard, and a partitioning on the basis of comparing phonetic transcriptions. Our results show that acoustic models outperform the (traditional) transcription-based approach without requiring phonetic transcriptions, with the best performance achieved by the multilingual XLSR-53 model fine-tuned on Dutch. On the basis of only six seconds of speech, the resulting clustering closely matches the gold standard.

Make the best of cross-lingual transfer: Evidence from POS tagging with over 100 languages

de Vries, W., Wieling, M., & Nissim, M. (2022)

Proceedings paper Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 7676-7685.

Abstract

Cross-lingual transfer learning with large multilingual pre-trained models can be an effective approach for low-resource languages with no labeled training data. Existing evaluations of zero-shot cross-lingual generalisability of large pre-trained models use datasets with English training data, and test data in a selection of target languages. We explore a more extensive transfer learning setup with 65 different source languages and 105 target languages for part-of-speech tagging. Through our analysis, we show that pre-training of both source and target language, as well as matching language families, writing systems, word order systems, and lexical-phonetic distance significantly impact cross-lingual performance. The findings described in this paper can be used as indicators of which factors are important for effective zero-shot cross-lingual transfer to zero- and low-resource languages.

Low Saxon dialect distances at the orthographic and syntactic level

Siewert, J., Scherrer, Y., & Wieling, M. (2022)

Proceedings paper Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change, pp. 119-124.

Abstract

We compare five Low Saxon dialects from the 19th and 21st century from Germany and the Netherlands with each other as well as with modern Standard Dutch and Standard German. Our comparison is based on character n-grams on the one hand and PoS n-grams on the other and we show that these two lead to different distances. Particularly in the PoS-based distances, one can observe all of the 21st century Low Saxon dialects shifting towards the modern majority languages.

Neural representations for modeling variation in speech

Bartelds, M., de Vries, W., Sanal, F., Richter, C., Liberman, M., & Wieling, M. (2022)

Journal paper Journal of Phonetics, 92, 101137.

Abstract

Variation in speech is often quantified by comparing phonetic transcriptions of the same utterance. However, manually transcribing speech is time-consuming and error prone. As an alternative, therefore, we investigate the extraction of acoustic embeddings from several self-supervised neural models. We use these representations to compute word-based pronunciation differences between non-native and native speakers of English, and between Norwegian dialect speakers. For comparison with several earlier studies, we evaluate how well these differences match human perception by comparing them with available human judgements of similarity. We show that speech representations extracted from a specific type of neural model (i.e.~Transformers) lead to a better match with human perception than two earlier approaches on the basis of phonetic transcriptions and MFCC-based acoustic features. We furthermore find that features from the neural models can generally best be extracted from one of the middle hidden layers than from the final layer. We also demonstrate that neural speech representations not only capture segmental differences, but also intonational and durational differences that cannot adequately be represented by a set of discrete symbols used in phonetic transcriptions.

Rethinking the field of automatic prediction of court decisions

Medvedeva, M., Wieling, M., & Vols, M. (2023)

Journal paper Artificial Intelligence and Law, 31, 195-212.

Abstract

In this paper, we discuss previous research in automatic prediction of court decisions. We define the difference between outcome identification, outcome-based judgement categorisation and outcome forecasting, and review how various studies fall into these categories. We discuss how important it is to understand the legal data that one works with in order to determine which task can be performed. Finally, we reflect on the needs of the legal discipline regarding the analysis of court judgements.

Automatically identifying eviction cases and outcomes within case law of Dutch Courts of First Instance

Medvedeva, M., Dam, T., Wieling, M., & Vols, M. (2021)

Proceedings paper In: Erich Schweighofer (eds.) Legal Knowledge and Information Systems, IOS Press, pp. 13-22.

Abstract

In this paper we attempt to identify eviction judgements within all case law published by Dutch courts in order to automate data collection, previously conducted manually. To do so we performed two experiments. The first focused on identifying judgements related to eviction, while the second focused on identifying the outcome of the cases in the judgements (eviction vs.~dismissal of the landlord's claim). In the process of conducting the experiments for this study, we have created a manually annotated dataset with eviction-related judgements and their outcomes.

Cognitive benefits of learning additional languages in old adulthood? Insights from an intensive longitudinal intervention study

Kliesch, M., Pfenninger, S., Wieling, M., Stark, E., & Meyer, M. (2021)

Journal paper Applied Linguistics, 43(4), 653-676.

Abstract

Second language (L2) learning has been promoted as a promising intervention to stave off age-related cognitive decline. While previous studies based on mean trends showed inconclusive results, this study is the first to investigate nonlinear cognitive trajectories across a 30-week training period. German-speaking older participants (aged 64-75 years) enrolled for a Spanish course, strategy game training (active control) or movie screenings (passive control). We assessed cognitive performance in working memory, alertness, divided attention and verbal fluency on a weekly basis. Trajectories were modelled using Generalized Additive Mixed Models to account for temporally limited transfer effects and intraindividual variation in cognitive performance. Our results provide no evidence of cognitive improvement differing between the Spanish and either of the control groups during any phase of the training period. We did, however, observe an effect of baseline cognition, such that individuals with low cognitive baselines increased their performance more in the L2 group than comparable individuals in the control groups. We discuss these findings against the backdrop of the cognitive training literature and Complex Dynamic Systems Theory.

Accuracy assessment of two electromagnetic articulographs: NDI Wave and NDI Vox

Rebernik, T., Jacobi, J., Tiede, M., & Wieling, M. (2021)

Journal paper Journal of Speech, Language, and Hearing Research, 64(7), 2637-2667.

Abstract

Purpose: This study compares two electromagnetic articulographs (EMA) manufactured by Northern Digital, Inc.: the NDI Wave System (2008) and the NDI Vox-EMA System (2020).

Method: Four experiments were completed: (a) comparison of statically positioned sensors; 4(b) tracking dynamic movements of sensors manipulated using a motor-driven LEGO apparatus; (c) tracking small and large movements of sensors mounted in a rigid bar manipulated by hand; and (d) tracking movements of sensors rotated on a circular disc. We assessed spatial variability for statically positioned sensors, variability in the transduced Euclidean distances (EDs) between sensor pairs, and missing data rates. For sensors tracking circular movements, we compared the fit between fitted ideal circles and actual trajectories.

Results: The average sensor pair tracking error (i.e., the standard deviation of the EDs) was 1.37 mm for the WAVE and 0.12 mm for the VOX during automated trials at the fastest speed, and 0.35 mm for the WAVE and 0.14mm for the VOX during the tracking of large manual movements. The average standard deviation of the fitted circle radii charted by manual circular disc movements was 0.72mm for the WAVE sensors and 0.14mm for the VOX sensors. There was no significant difference between the WAVE and the VOX in the number of missing frames.

Conclusions: In general, the VOX system significantly outperformed the WAVE on measures of both static precision and dynamic accuracy (automated and manual). For both systems, positional precision and spatial variability were influenced by the sensors' position relative to the field generator unit (FGU; worse when further away).

Automatic judgement forecasting for pending applications of the European Court of Human Rights

Medvedeva, M., Üstün, A., Xu, X., Vols, M., & Wieling, M. (2021)

Proceedings paper Proceedings of the Fifth Workshop on Automatec Semantic Analysis of Information in Legal Text (ASAIL 2021), pp. 12 - 23.

Abstract

Judicial decision classification using Natural Language Processing and machine learning has received much attention in the last decade. While many studies claim to 'predict judicial decisions', most of them only classify already made judgements. Likely due to the lack of data, there have been only a few studies that discuss the data and the methods to forecast future judgements of the courts on the basis of data available before the court judgement is known. Besides proposing a more consistent and precise terminology, as classification and forecasting each have different uses and goals, we release a first benchmark dataset consisting of documents of the European Court of Human Rights to address this task. The dataset includes raw data as well as pre-processed text of final judgements, admissibility decisions and communicated cases. The latter are published by the Court for pending applications (generally) many years before the case is judged, allowing one to forecast judgements for pending cases. We establish a baseline for this task and illustrate that it is a much harder task than simply classifying judgements.

Adapting monolingual models: data can be scarce when language similarity is high

de Vries*, W., Bartelds*, M., Nissim, M., & Wieling, M. (2021)

Proceedings paper Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 4901 - 4907.

Abstract

For many (minority) languages, the resources needed to train large models are not available. We investigate the performance of zero-shot transfer learning with as little data as possible, and the influence of language similarity in this process. We retrain the lexical layers of four BERT-based models using data from two low-resource target language varieties, while the Transformer layers are independently fine-tuned on a POS-tagging task in the model's source language. By combining the new lexical layers and fine-tuned Transformer layers, we achieve high task performance for both target languages. With high language similarity, 10MB of data appears sufficient to achieve substantial monolingual transfer performance. Monolingual BERT-based models generally achieve higher downstream task performance after retraining the lexical layer than multilingual BERT, even when the target language is included in the multilingual model.

A review of data collection practices using electromagnetic articulography

Rebernik, T., Jacobi, J., Jonkers, R., Noiray, A., & Wieling, M. (2021)

Journal paper Key publication Laboratory Phonology, 12(1), 6. [doi: 10.5334/labphon.237].

Abstract

This paper reviews data collection practices in electromagnetic articulography (EMA) studies, with a focus on sensor placement. It consists of three parts: in the first part, we introduce electromagnetic articulography as a method. In the second part, we focus on existing data collection practices. Our overview is based on a literature review of 905 publications from a large variety of journals and conferences, identified through a systematic keyword search in Google Scholar. The review shows that experimental designs vary greatly, which in turn may limit researchers' ability to compare results across studies. In the third part of this paper we describe an EMA data collection procedure which includes an articulatory-driven strategy for determining where to position sensors on the tongue without causing discomfort to the participant. We also evaluate three approaches for preparing (NDI Wave) EMA sensors reported in the literature with respect to the duration the sensors remain attached to the tongue: 1) attaching out-of-the-box sensors, 2) attaching sensors coated in latex, and 3) attaching sensors coated in latex with an additional latex flap. Results indicate no clear general effect of sensor preparation type on adhesion duration. A subsequent exploratory analysis reveals that sensors with the additional flap tend to adhere for shorter times than the other two types, but that this pattern is inverted for the most posterior tongue sensor.

Prevalence of internalizing disorders, symptoms and traits across age using advanced non-linear methods

van Loo, H. M., Beijers, L., Wieling, M., de Jong, T. R., Schoevers, R. A., & Kendler, K. S. (2021)

Journal paper Psychological Medicine. [doi: 10.1017/S0033291721001148].

Abstract

Background: Most epidemiological studies show a decrease of internalizing disorders at older ages, but it is unclear how the prevalence exactly changes with age, and whether there are different patterns for internalizing symptoms and traits, and for men and women. This study investigates the impact of age and sex on the point prevalence across different mood and anxiety disorders, internalizing symptoms, and neuroticism.

Methods: We used cross-sectional data on 146,315 subjects, aged 18-80 years, from the Lifelines Cohort Study, a Dutch general population sample. Between 2012-2016, five current internalizing disorders - major depression, dysthymia, generalized anxiety disorder, social phobia and panic disorder - were assessed according to DSM-IV criteria. Depressive symptoms, anxiety symptoms, neuroticism, and negative affect were also measured. Generalized additive models were used to identify nonlinear patterns of internalizing disorders, symptoms and traits over lifetime, and to investigate sex differences.

Results: The point prevalence of internalizing disorders generally increased between the ages of 18-30 years, stabilized between 30-50, and decreased after age 50. The patterns of internalizing symptoms and traits were different. Negative affect and neuroticism gradually decreased after age 18. Women reported more internalizing disorders than men, but the relative difference remained stable across age (relative risk ~1.7).

Conclusions: The point prevalence of internalizing disorders was typically highest between age 30-50, but there were differences between the disorders, which could indicate differences in etiology. The relative gap between the sexes remained similar across age, suggesting that changes in sex hormones around the menopause do not significantly influence women's risk of internalizing disorders.

Dialectology for computational linguists

Nerbonne, J., Heeringa, W., Prokić, J., & Wieling, M. (2021)

Book chapter In: Marcos Zampieri, & Preslav Nakov (eds.) Similar Languages, Varieties, and Dialects: A Computational Perspective. Cambridge University Press, pp. 96 - 118.

L2 developmental measures from a dynamic perspective

Verspoor, M., Lowie, W., & Wieling, M. (2021)

Book chapter In: Magali Paquot, & Bert LeBruyn (eds.) Learner Corpora Researchs Meets Second Language Acquisition. Cambridge University Press, pp. 172 - 190. [doi: 10.1017/9781108674577.009].

A novel paradigm to investigate phonetic convergence in interaction

Wieling, M., Tiede, M., Rebernik, T., de Jong, L., Braggaar, A., Bartelds, M., Medvedeva, M., Heisterkamp, P., Freire Offrede, T., Sekeres, H., Pot, A., van der Ploeg, M., Volkers, K., & Mills, G. (2020)

Proceedings paper Proceedings of the 12th International Seminar on Speech Production, pp. 1 - 4.

Characterizing tongue tremor in Parkinson's disease using EMA

Jacobi, J., Rebernik, T., Jonkers, R., Maassen, B., Proctor, M., & Wieling, M. (2020)

Proceedings paper Proceedings of the 12th International Seminar on Speech Production, pp. 80 - 83.

Experimental approaches in electromagnetic articulography

Rebernik, T., Jacobi, J., Jonkers, R., Noiray, A., & Wieling, M. (2020)

Proceedings paper Proceedings of the 12th International Seminar on Speech Production, pp. 5 - 8.

Measuring foreign accent strength using an acoustic distance measure

Bartelds, M., de Vries, W., Richter, C., Liberman, M., & Wieling, M. (2020)

Proceedings paper Proceedings of the 12th International Seminar on Speech Production, pp. 17 - 20.

Vowel production in congenitally blind and sighted Australian English speakers

Janić, M., Rebernik, T., Veenstra, P., Wissink, E., Wieling, M., & Proctor, M. (2020)

Proceedings paper Proceedings of the 12th International Seminar on Speech Production, pp. 91 - 94.

Using ultrasound tongue imaging to improve L2 English pronunciation in Dutch students

de Jong, L., Rebernik, T., Vaziri, S., & Wieling, M. (2020)

Proceedings paper Proceedings of the 12th International Seminar on Speech Production, pp. 60 - 63.

JURI SAYS: an automatic judgement prediction system for the European Court of Human Rights

Medvedeva, M., Xu, X., Wieling, M., & Vols, M. (2020)

Proceedings paper In: Serena Villata, Jakub Harasta and Petr Kremen (eds.) Legal Knowledge and Information Systems, IOS Press, pp. 277 - 280.

Abstract

In this paper we present the web platform JURI SAYS that automatically predicts decisions of the European Court of Human Rights based on communicated cases, which are published by the court early in the proceedings and are often available many years before the final decision is made. Our system therefore predicts future judgements of the court. The platform is available at jurisays.com and shows the predictions compared to the actual decisions of the court. It is automatically updated every month by including the prediction for the new cases. Additionally, the system highlights the sentences and paragraphs that are most important for the prediction (i.e. violation vs. no violation of human rights).

LSDC - A comprehensive dataset for Low Saxon dialect classification

Siewert, J., Scherrer, Y., Wieling, M., & Tiedemann, J. (2020)

Proceedings paper Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, pp. 25 - 35.

Abstract

We present a new comprehensive dataset for the unstandardised West-Germanic language Low Saxon covering the last two centuries, the majority of modern dialects and various genres, which will be made openly available in connection with the final version of this paper. Since so far no such comprehensive dataset of contemporary Low Saxon exists, this provides a great contribution to NLP research on this language. We also test the use of this dataset for dialect classification by training a few baseline models comparing statistical and neural approaches. The performance of these models shows that in spite of an imbalance in the amount of data per dialect, enough features can be learned for a relatively high classification accuracy.

The Impact of Alcohol on L1 vs. L2

Offrede, T., Jacobi, J., Rebernik, T., de Jong, L., Keulen, S., Veenstra, P., Noiray, A., & Wieling, M. (2021)

Journal paper Language and Speech, 64(3), 681-692. [doi: 10.1177/0023830920953169].

Abstract

Alcohol intoxication is known to affect many aspects of human behavior and cognition; one of such affected systems is articulation during speech production. Although much research has revealed that alcohol negatively impacts pronunciation in a first language (L1), there is only initial evidence suggesting a potential beneficial effect of inebriation on articulation in a non-native language (L2). The aim of this study was thus to compare the effect of alcohol consumption on pronunciation in an L1 and an L2. Participants who had ingested different amounts of alcohol provided speech samples in their L1 (Dutch) and L2 (English), and native speakers of each language subsequently rated the pronunciation of these samples on their intelligibility (for the L1) and accent nativelikeness (for the L2). These data were analyzed with generalized additive mixed modeling. Participants' blood alcohol concentration indeed negatively affected pronunciation in L1, but it produced no significant effect on the L2 accent ratings. The expected negative impact of alcohol on L1 articulation can be explained by reduction in fine motor control. We present two hypotheses to account for the absence of any effects of intoxication on L2 pronunciation: (i) there may be a reduction in L1 interference on L2 speech due to decreased motor control or (ii) alcohol may produce a differential effect on each of the two linguistic subsystems.

A new acoustic-based pronunciation distance measure

Bartelds, M., Richter, C., Liberman, M., & Wieling, M. (2020)

Journal paper Frontiers in Artificial Intelligence, doi: 10.3389/frai.2020.00039.

Abstract

We present an acoustic distance measure for comparing pronunciations, and apply the measure to assess foreign accent strength in American-English by comparing speech of non-native American-English speakers to a collection of native American-English speakers. An acoustic-only measure is valuable as it does not require the time-consuming and error-prone process of phonetically transcribing speech samples which is necessary for current edit distance-based approaches. We minimize speaker variability in the data set by employing speaker-based cepstral mean and variance normalization, and compute word-based acoustic distances using the dynamic time warping algorithm. Our results indicate a strong correlation of r = -0.71 (p < 0.0001) between the acoustic distances and human judgments of native-likeness provided by more than 1,100 native American-English raters. Therefore, the convenient acoustic measure performs only slightly lower than the state-of-the-art transcription-based performance of r = -0.77. We also report the results of several small experiments which show that the acoustic measure is not only sensitive to segmental differences, but also to intonational differences and durational differences. However, it is not immune to unwanted differences caused by using a different recording device.

Using machine learning to predict decisions of the European Court of Human Rights

Medvedeva, M., Vols, M., & Wieling, M. (2020)

Journal paper Artificial Intelligence and Law, 28, 237-266.

Abstract

When courts started publishing judgements, big data analysis (i.e. largescale statistical analysis of case law and machine learning) within the legal domain became possible. By taking data from the European Court of Human Rights as an example, we investigate how Natural Language Processing tools can be used to analyse texts of the court proceedings in order to automatically predict (future) judicial decisions. With an average accuracy of 75% in predicting the violation of 9 articles of the European Convention on Human Rights our (relatively simple) approach highlights the potential of machine learning approaches in the legal domain. We show, however, that predicting decisions for future cases based on the cases from the past negatively impacts performance (average accuracy range from 58% to 68%). Furthermore, we demonstrate that we can achieve a relatively high classification performance (average accuracy of 65%) when predicting outcomes based only on the surnames of the judges that try the case.

Effectiveness of a dynamic usage based Computer Assisted Language program

Irshad, M., Keijzer, M., Wieling, M., & Verspoor, M. (2019)

Journal paper Dutch Journal of Applied Linguistics, 8(2), 137 - 162.

The influence of alcohol on L1 vs. L2 pronunciation

Wieling, M., Blankevoort, C., Hukker, V., Jacobi, J., de Jong, L., Keulen, S., Medvedeva, M., van der Ploeg, M., Pot, A., Rebernik, T., Veenstra, P., & Noiray, A. (2019)

Proceedings paper Proceedings of ICPhS 2019, pp. 3622 - 3626.

The effect of Levodopa on articulation in Parkinson's disease: A cross-linguistic study

Jacobi, J., Rebernik, T., Jonkers, R., Maassen, B., Proctor, M., & Wieling, M. (2019)

Proceedings paper Proceedings of ICPhS 2019, pp. 1069 - 1073.

Back from the future: nonlinear anticipation in adults and children's speech

Noiray, A., Wieling, M., Abakarova, D., Rubertus, E., & Tiede, M. (2019)

Journal paper Journal of Speech Language and Hearing Research, 62, 3033 - 3054.

Reproducibility in computational linguistics: are we willing to share?

Wieling, M., Rawee, J., & van Noord, G. (2018)

Journal paper Computational Linguistics, 44(4), 641 - 649.

Abstract

This study focuses on an essential precondition for reproducibility in computational linguistics: the willingness of authors to share relevant source code and data. Ten years after Ted Pedersen's influential ``Last Words'' contribution in Computational Linguistics, we investigate to what extent researchers in computational linguistics are willing and able to share their data and code. We surveyed all 395 full papers presented at the 2011 and 2016 ACL Annual Meetings, and identified if links to data and code were provided. If working links were not provided, authors were requested to provide this information. While data was often available, code was shared less often. When working links to code or data were not provided in the paper, authors provided the code in about one third of cases. For a selection of ten papers, we attempted to reproduce the results using the provided data and code. We were able to reproduce the results approximately for half of the papers. For only a single paper we obtained the exact same results. Our findings show that even though the situation appears to have improved comparing 2016 to 2011, empiricism in computational linguistics still largely remains a matter of faith (Pedersen, 2008). Nevertheless, we are somewhat optimistic about the future. Ensuring reproducibility is not only important for the field as a whole, but also for individual researchers: below we show that the median citation count for studies with working links to the source code are higher.

Post-editing effort of a novel with statistical and neural machine translation

Toral, A., Wieling, M., & Way, A. (2018)

Journal paper Frontiers in Digital Humanities, 5. [doi: 10.3389/fdigh.2018.00009].

Abstract

We conduct the first experiment in the literature in which a novel is translated automatically and then post-edited by professional literary translators. Our case study is Warbreaker, a popular fantasy novel originally written in English, which we translate into Catalan. We translated one chapter of the novel (over 3,700 words, 330 sentences) with two data-driven approaches to Machine Translation (MT): phrase-based statistical MT (PBMT) and neural MT (NMT). Both systems are tailored to novels; they are trained on over 100 million words of fiction. In the post-editing experiment, six professional translators with previous experience in literary translation translate subsets of this chapter under three alternating conditions: from scratch (the norm in the novel translation industry), post-editing PBMT, and post-editing NMT. We record all the keystrokes, the time taken to translate each sentence, as well as the number of pauses and their duration. Based on these measurements, and using mixed-effects models, we study post-editing effort across its three commonly studied dimensions: temporal, technical and cognitive. We observe that both MT approaches result in increases in translation productivity: PBMT by 18%, and NMT by 36%. Post-editing also leads to reductions in the number of keystrokes: by 9% with PBMT, and by 23% with NMT. Finally, regarding cognitive effort, post-editing results in fewer (29% and 42% less with PBMT and NMT respectively) but longer pauses (14% and 25%).

Analyzing dynamic phonetic data using generalized additive mixed modeling: a tutorial focusing on articulatory differences between L1 and L2 speakers of English

Wieling, M. (2018)

Journal paper Key publication Journal of Phonetics, 70, 86 - 116.

Abstract

In phonetics, many datasets are encountered which deal with dynamic data collected over time. Examples include diphthongal formant trajectories and articulator trajectories observed using electromagnetic articulography. Traditional approaches for analyzing this type of data generally aggregate data over a certain timespan, or only include measurements at a fixed time point (e.g., formant measurements at the midpoint of a vowel). In this paper, I discuss generalized additive modeling, a non-linear regression method which does not require aggregation or the pre-selection of a fixed time point. Instead, the method is able to identify general patterns over dynamically varying data, while simultaneously accounting for subject and item-related variability. An advantage of this approach is that patterns may be discovered which are hidden when data is aggregated or when a single time point is selected. A corresponding disadvantage is that these analyses are generally more time consuming and complex. This tutorial aims to overcome this disadvantage by providing a hands-on introduction to generalized additive modeling using articulatory trajectories from L1 and L2 speakers of English within the freely available R environment. All data and R code is made available to reproduce the analysis presented in this paper.

Individual differences in very young Chinese children's English vocabulary breadth and semantic depth: internal and external factors

Sun, H., Steinkrauss, R., Wieling, M., & de Bot, K. (2018)

Journal paper International Journal of Bilingual Education and Bilingualism, 21(4), 405 - 425.

Providers' competencies positively affect personal recovery of involuntarily admitted patients with severe mental illness: A prospective observational study

Jas, E., & Wieling, M. (2018)

Journal paper International Journal of Social Psychiatry, 64(2), 145 - 155.

Border effects among Catalan dialects

Wieling, M., Valls, E., Baayen, R. H., & Nerbonne, J. (2018)

Book chapter In: Dirk Speelman, Kris Heylen, & Dirk Geeraerts (eds.) Mixed Effects Regression Models in Linguistics. Berlin: Springer (Quantitative methods in the Humanities and Social Sciences), pp. 71 - 97.

Statistics for aggregate variationist analyses

Nerbonne, J., & Wieling, M. (2018)

Book chapter In: Charles Boberg, John Nerbonne, & Dominic Watt (eds.) Handbook of Dialectology. Boston: Wiley, pp. 400 - 414.

From Semantics to Dialectometry. Festschrift in Honor of John Nerbonne

Wieling, M., Kroon, M., van Noord, G., & Bouma, G. (eds.) (2017)

Book Tributes 32, College Publications. [418 pp].

itsadug: Interpreting Time Series and Autocorrelated Data Using GAMMs

van Rij, J., Wieling, M., Baayen, R. H., & van Rij, H. (2020)

R packageVersion 2.4

Identifying predictive features for textual genre classification: the key role of syntactic features

Cimino, A., Wieling, M., Dell'Orletta, F., Montemagni, S., & Venturi, G. (2017)

Proceedings paper Proceedings of the Fourth Italian Conference on Computational Linguistics, Paper 39.

Articulatory differences between L1 and L2 speakers of English

Wieling, M., Veenstra, P., Adank, P., & Tiede, M. (2017)

Proceedings paper Proceedings of the 11th International Seminar on Speech Production, Paper 1.

Individual alpha peak frequency predicts 10 Hz flicker effects on selective attention

Gulbinaite, R., van Viegen, T., Wieling, M., Cohen, M., & VanRullen, R. (2017)

Journal paper The Journal of Neuroscience, 37(42), 10173 - 10184.

Sharing is caring: the future of shared tasks

Nissim, M., Abzianidze, L., Evang, K., van der Goot, R., Haagsma, H., Plank, B., & Wieling, M. (2017)

Journal paper Computational Linguistics, 43(4), 897 - 904.

The power of character n-grams in native language identification

Kulmizev, A., Blankers, B., Bjerva, J., Nissim, M., van Noord, G., Plank, B., & Wieling, M. (2017)

Proceedings paper Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, pp. 382 - 389.

Analysis of acoustic-to-articulatory speech inversion across different accents and languages

Sivaraman, G., Espy-Wilson, C., & Wieling, M. (2017)

Proceedings paper Proceedings of Interspeech 2017, pp. 974 - 978.

Quantitative identification of dialect-specific articulatory settings

Wieling, M., & Tiede, M. (2017)

Journal paper Journal of the Acoustical Society of America, 142(1), 389 - 394.

Eye-tracking the effect of word order in sentence comprehension in aphasia: Evidence from Basque, a free word order ergative language

Arantzeta, M., Bastiaanse, R., Burchert, F., Wieling, M., Zabaleta, M. M., & Laka, I. (2017)

Journal paper Language, Cognition and Neuroscience, 32(10), 1320 - 1343.

Multimodal character viewpoint in quoted dialogue sequences

Stec, K., Huiskes, M., Wieling, M., & Redeker, G. (2017)

Journal paper Glossa: a journal of general linguistics, 2(1), 39.

Exploring the role of extra-linguistic factors in defining dialectal variation patterns through cluster comparison.

Montemagni, S., & Wieling, M. (2017)

Book chapter In: Martijn Wieling, Martin Kroon, Gertjan van Noord, & Gosse Bouma (eds.) From Semantics to Dialectometry. Festschrift in honor of John Nerbonne. Tributes 32, London: College Publications, pp. 241 - 251. [Note: not peer-reviewed].

Variation and change in the use of hesitation markers in Germanic languages

Wieling, M., Grieve, J., Bouma, G., Fruehwald, J., Coleman, J., & Liberman, M. (2016)

Journal paper Key publication Language Dynamics and Change, 6(2), 199 - 234.

Abstract

In this study, we investigate crosslinguistic patterns in the alternation between UM, a hesitation marker consisting of a neutral vowel followed by a final labial nasal, and UH, a hesitation marker consisting of a neutral vowel in an open syllable. Based on a quantitative analysis of a range of spoken and written corpora, we identify clear and consistent patterns of change in the use of these forms in various Germanic languages (English, Dutch, German, Norwegian, Danish, Faroese) and dialects (American English, British English), with the use of UM increasing over time relative to the use of UH. We also find that this pattern of change is generally led by women and more educated speakers. Finally, we propose a series of possible explanations for this surprising change in hesitation marker usage that is currently taking place across Germanic languages.

Investigating dialectal differences using articulography

Wieling, M., Tomaschek, F., Arnold, D., Tiede, M., Bröker, F., Thiele, S., Wood, S. N., & Baayen, R. H. (2016)

Journal paper Key publication Journal of Phonetics, 59, 122 - 143.

Abstract

The present study uses electromagnetic articulography, by which the position of tongue and lips during speech is measured, for the study of dialect variation. By using generalized additive modeling to analyze the articulatory trajectories, we are able to reliably detect aggregate group differences, while simultaneously taking into account the individual variation of dozens of speakers. Our results show that two Dutch dialects show clear differences in their articulatory settings, with generally a more anterior tongue position in the dialect from Ubbergen in the southern half of the Netherlands than in the dialect of Ter Apel in the northern half of the Netherlands. A comparison with formant-based acoustic measurements further reveals that articulography is able to reveal interesting structural articulatory differences between dialects which are not visible when only focusing on the acoustic signal.

Read my points: effect of animation type when speech-reading from EMA data

James, K., & Wieling, M. (2016)

Proceedings paper Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pp. 87 - 92.

ALT Explored: integrating an online dialectometric tool and an online dialect atlas

Wieling, M., Sassolini, E., Cucurullo, N., & Montemagni, S. (2016)

Proceedings paper Proceedings of LREC 2016, pp. 3265 - 3272.

How to analyze linguistic change using mixed models, growth curve analysis and generalized additive modeling

Winter, B., & Wieling, M. (2016)

Journal paper Journal of Language Evolution, 1(1), 7 - 18.

Infrequent forms: noise or not?

Wieling, M., & Montemagni, S. (2016)

Book chapter In: Marie-Hélène Côté, Remco Knooihuizen, & John Nerbonne (eds.) The Future of Dialects. Berlin: Language Science Press (Language Variation), pp. 215 - 224.

Tracking linguistic features underlying lexical variation patterns: a case study on Tuscan dialects

Wieling, M., & Montemagni, S. (2016)

Book chapter In: Marie-Hélène Côté, Remco Knooihuizen, & John Nerbonne (eds.) The Future of Dialects. Berlin: Language Science Press (Language Variation), pp. 117 - 134.

Automatically identifying characteristic features of non-native English accents

Bloem, J., Wieling, M., & Nerbonne, J. (2016)

Book chapter In: Marie-Hélène Côté, Remco Knooihuizen, & John Nerbonne (eds.) The Future of Dialects. Berlin: Language Science Press (Language Variation), pp. 155 - 172.

Age effects in L2 grammar processing as revealed by ERPs and how (not) to study them

Meulman, N., Wieling, M., Sprenger, S., Stowe, L., & Schmid, M. (2015)

Journal paper Key publication PLOS ONE, 10(12), e0143328.

Abstract

In this study we investigate the effect of age of acquisition (AoA) on grammatical processing in second language learners as measured by event-related brain potentials (ERPs). We compare a traditional analysis involving the calculation of averages across a certain time window of the ERP waveform, analyzed with categorical groups (early vs. late), with a generalized additive modeling analysis, which allows us to take into account the full range of variability in both AoA and time. Sixty-six Slavic advanced learners of German listened to German sentences with correct and incorrect use of non-finite verbs and grammatical gender agreement. We show that the ERP signal depends on the AoA of the learner, as well as on the regularity of the structure under investigation. For gender agreement, a gradual change in processing strategies can be shown that varies by AoA, with younger learners showing a P600 and older learners showing a posterior negativity. For verb agreement, all learners show a P600 effect, irrespective of AoA. Based on their behavioral responses in an offline grammaticality judgment task, we argue that the late learners resort to computationally less efficient processing strategies when confronted with (lexically determined) syntactic constructions different from the L1. In addition, this study highlights the insights the explicit focus on the time course of the ERP signal in our analysis framework can offer compared to the traditional analysis.

The role of frequency in the retrieval of nouns and verbs in aphasia

Bastiaanse, R., Wieling, M., & Wolthuis, N. (2015)

Journal paper Aphasiology, 30(11), 1221 - 1239.

Determinants of English accents

Wieling, M., Bloem, J., Baayen, R. H., & Nerbonne, J. (2015)

Proceedings paper Proceedings of the 6th Conference on Quantitative Investigations in Theoretical Linguistics, doi: 10.15496/publikation-8628.

Comparing L1 and L2 speakers using articulography

Wieling, M., Veenstra, P., Adank, P., Weber, A., & Tiede, M. (2015)

Proceedings paper Proceedings of ICPhS 2015, Paper 551.

Investigating dialectal differences using articulography

Wieling, M., Tomaschek, F., Arnold, D., Tiede, M., & Baayen, R. H. (2015)

Proceedings paper Proceedings of ICPhS 2015, Paper 231.

Advances in dialectometry

Wieling, M., & Nerbonne, J. (2015)

Journal paper Key publication Annual Review of Linguistics, 1, 243-264.

Abstract

Dialectometry applies computational and statistical analyses within dialectology, making work more easily replicable and understandable. This survey article first reviews the field briefly in order to focus on developments in the past five years. Dialectometry no longer focuses exclusively on aggregate analyses, but rather deploys various techniques to identify representative and distinctive features with respect to areal classifications. Analyses proceeding explicitly from geostatistical techniques have just begun. The exclusive focus on geography as explanation for variation has given way to analyses combining geographical, linguistic, and social factors underlying language variation. Dialectometry has likewise ventured into diachronic studies and has also contributed theoretically to comparative dialectology and the study of dialect diffusion. Although the bulk of research involves lexis and phonology, morphosyntax is receiving increasing attention. Finally, new data sources and new (online) analytical software are expanding dialectometry's remit and its accessibility.

The differential effects of direct and indirect speech on discourse comprehension in Dutch and English listeners with and without aphasia

Groenewold, R., Bastiaanse, R., Nickels, L., Wieling, M., & Huiskes, M. (2015)

Journal paper Aphasiology, 29(6), 685 - 704.

Social, geographical, and lexical influence on Dutch dialect pronunciations

Ko, V., Wieling, M., Wit, E., Nerbonne, J., & Krijnen, W. (2014)

Journal paper Computational Linguistics in the Netherlands Journal, 4(2), 253 - 269.

Lexical differences between Tuscan dialects and standard Italian: accounting for geographic and socio-demographic variation using generalized additive mixed modeling

Wieling, M., Montemagni, S., Nerbonne, J., & Baayen, R. H. (2014)

Journal paper Key publication Language, 90(3), 669 - 692.

Abstract

This study uses a generalized additive mixed-effects regression model to predict lexical differences in Tuscan dialects with respect to standard Italian. We used lexical information for 170 concepts used by 2,060 speakers in 213 locations in Tuscany. In our model, geographical position was found to be an important predictor, with locations more distant from Florence having lexical forms more likely to differ from standard Italian. In addition, the geographical pattern varied significantly for low- versus high-frequency concepts and older versus younger speakers. Younger speakers generally used variants more likely to match the standard language. Several other factors emerged as significant. Male speakers as well as farmers were more likely to use lexical forms different from standard Italian. In contrast, higher-educated speakers used lexical forms more likely to match the standard. The model also indicates that lexical variants used in smaller communities are more likely to differ from standard Italian. The impact of community size, however, varied from concept to concept. For a majority of concepts, lexical variants used in smaller communities are more likely to differ from the standard Italian form. For a minority of concepts, however, lexical variants used in larger communities are more likely to differ from standard Italian. Similarly, the effect of the other community- and speaker-related predictors varied per concept. These results clearly show that the model succeeds in teasing apart different forces influencing the dialect landscape and helps us to shed light on the complex interaction between the standard Italian language and the Tuscan dialectal varieties. In addition, this study illustrates the potential of generalized additive mixed-effects regression modeling applied to dialect data.

Measuring foreign accent strength in English: validating Levenshtein distance as a measure

Wieling, M., Bloem, J., Mignella, K., Timmermeister, M., & Nerbonne, J. (2014)

Journal paper Language Dynamics and Change, 4(2), 253 - 269.

Assessing the readability of sentences: which corpora and features?

Dell'Orletta, F., Wieling, M., Cimino, A., Venturi, G., & Montemagni, S. (2014)

Proceedings paper Proceedings of the 9th Workshop on Innovative Use of NLP for Building Educational Applications, pp. 163 - 173.

Vowel articulation affected by word frequency

Tomaschek, F., Tucker, B. V., Wieling, M., & Baayen, R. H. (2014)

Proceedings paper Proceedings of the 10th International Seminar on Speech Production, pp. 429 - 432.

The effects of direct and indirect speech on discourse comprehension in Dutch listeners with and without aphasia

Groenewold, R., Bastiaanse, R., Nickels, L., Wieling, M., & Huiskes, M. (2014)

Journal paper Aphasiology, 28(7), 862 - 884.

A cognitively grounded measure of pronunciation distance

Wieling, M., Nerbonne, J., Bloem, J., Gooskens, C., Heeringa, W., & Baayen, R. H. (2014)

Journal paper PLOS ONE, 9(1), e75734.

Analyzing the BBC Voices data: contemporary English dialect areas and their characteristic lexical variants

Wieling, M., Upton, C., & Thompson, A. (2014)

Journal paper Literary and Linguistic Computing, 29(1), 107 - 117.

Word frequency, vowel length and vowel quality in speech production: an EMA study of the importance of experience

Tomaschek, F., Wieling, M., Arnold, D., & Baayen, R. H. (2013)

Proceedings paper Proceedings of Interspeech 2013, pp. 1302 - 1306.

Measuring socially motivated pronunciation differences

Nerbonne, J., van Ommen, S., Gooskens, C., & Wieling, M. (2013)

Book chapter In: Lars Borin, & Anju Saxena (eds.) Approaches to Measuring Linguistic Differences. Berlin: Mouton De Gruyter, pp. 107 - 140.

Physical predictors of cognitive performance in healthy older adults: a cross-sectional analysis

Blankevoort, G., Scherder, E., Wieling, M., Hortobágyi, T., Brouwer, W., Geuze, R., & van Heuvelen, M. (2013)

Journal paper PLOS ONE, 8(7), e70799.

Voices dialectometry at the University of Groningen

Wieling, M. (2013)

Book chapter In: Clive Upton, & Bethan Davies (eds.) Analysing 21st-century British English: Conceptual and Methodological Aspects of the BBC 'Voices' Project. London: Routledge, pp. 208 - 218.

Analyzing phonetic variation in the traditional English dialects: simultaneously clustering dialects and phonetic features

Wieling, M., Shackleton Jr., R. G., & Nerbonne, J. (2013)

Journal paper Literary and Linguistic Computing, 28(1), 31 - 41.

Synchronic patterns of Tuscan phonetic variation and diachronic change: evidence from a dialectometric study

Montemagni, S., Wieling, M., de Jonge, B., & Nerbonne, J. (2013)

Journal paper Literary and Linguistic Computing, 28(1), 157 - 172.

Linguistic advergence and divergence in north-western Catalan: a dialectometric investigation of dialect leveling and border effects

Valls, E., Wieling, M., & Nerbonne, J. (2013)

Journal paper Literary and Linguistic Computing, 28(1), 119 - 146.

Neuter is not common in Dutch: eye movements reveal asymmetrical gender processing

Loerts, H., Wieling, M., & Schmid, M. (2013)

Journal paper Journal of Psycholinguistic Research, 42(6), 551 - 570.

Applying the Levenshtein distance to Catalan dialects: a brief comparison of two dialectometric approaches

Valls, E., Nerbonne, J., Prokić, J., Wieling, M., Clua, E., & Lloret, M. (2012)

Journal paper Verba. Anuario Galego de Filoloxía, 39, 35 - 61.

A Quantitative Approach to Social and Geographical Dialect Variation

Wieling, M. (2012)

DissertationUniversity of Groningen. 178 pp.

Inducing a measure of phonetic similarity from pronunciation variation

Wieling, M., Margaretha, E., & Nerbonne, J. (2012)

Journal paper Journal of Phonetics, 40(2), 307 - 314.

Patterns of language variation and underlying linguistic features: a new dialectometric approach

Montemagni, S., Wieling, M., de Jonge, B., & Nerbonne, J. (2012)

Book chapter In: Patricia Bianchi, Nicola De Blasi, Chiara De Caprio, & Francesco Montuori (eds.) La Variazione Nell'italiano e Nella Sua Storia. Varietà e Varianti Linguistiche e Testuali. Atti dell'XI Congresso SILFI. Firenze: Franco Cesati Editore (Società Internazionale di Linguistica e Filologia Italiana), pp. 879 - 89. [Note: not peer-reviewed].

Inducing phonetic distances from dialect variation

Wieling, M., Margaretha, E., & Nerbonne, J. (2011)

Journal paper Computational Linguistics in the Netherlands Journal, 1, 109 - 118.

Quantitative social dialectology: explaining linguistic variation geographically and socially

Wieling, M., Nerbonne, J., & Baayen, R. H. (2011)

Journal paper PLOS ONE, 6(9), e23613.

Measuring linguistic variation commensurably

Wieling, M., & Nerbonne, J. (2011)

Journal paper Dialectologia, Special Issue II: Production & Perception and Attitude, 141 - 162.

Bipartite spectral graph partitioning for clustering dialect varieties and detecting their linguistic features

Wieling, M., & Nerbonne, J. (2011)

Journal paper Computer Speech and Language, 25(3), 700 - 715.

Some further dialectometrical steps

Nerbonne, J., Prokić, J., Wieling, M., & Gooskens, C. (2010)

Book chapter In: Gotzon Aurrekoexea, & José Luis Ormaetxea (eds.) Tools for Linguistic Variation. Bilbao: Supplements of the Anuario de Filologia Vasca "Julio de Urquijo", pp. 41 - 56.

Hierarchical spectral partitioning of bipartite graphs to cluster dialects and identify distinguishing features

Wieling, M., & Nerbonne, J. (2010)

Proceedings paper Proceedings of the 2010 Workshop on Graph-based Methods for Natural Language Processing, pp. 33 - 41.

The impact of online video lecture recordings and automated feedback on student performance

Wieling, M., & Hofman, A. (2010)

Journal paper Computers and Education, 54(4), 992 - 998.

The face reveals athletic flair: better National Football League quarterbacks are better looking

Williams, K., Park, J., & Wieling, M. (2010)

Journal paper Personality and Individual Differences, 48, 112 - 116.

A quantitative examination of variation in Dutch Low Saxon morphology

Heeringa, W., Wieling, M., van den Berg, B., & Nerbonne, J. (2009)

Book chapter In: Alexandra Lenz, Charlotte Gooskens, & Siemon Reker (eds.) Low Saxon Dialects across Borders - Niedersächsische Dialekte über Grenzen hinweg. Stuttgart: Franz Steiner Verlag (ZDL-Beiheft 138), pp. 195 - 216.

Bipartite spectral graph partitioning to co-cluster varieties and sound correspondences in dialectology

Wieling, M., & Nerbonne, J. (2009)

Proceedings paper Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing, pp. 14 - 22.

Evaluating the pairwise string alignment of pronunciations

Wieling, M., Prokić, J., & Nerbonne, J. (2009)

Proceedings paper Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education, pp. 26 - 34.

Multiple sequence alignments in linguistics

Prokić, J., Wieling, M., & Nerbonne, J. (2009)

Proceedings paper Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education, pp. 18 - 25.

Sex-specific relationship between digit ratio (2D:4D) and romantic jealousy

Park, J., Wieling, M., Buunk, B., & Massar, K. (2008)

Journal paper Personality and Individual Differences, 44, 1039 - 1045.

Dialect pronunciation comparison and spoken word recognition

Wieling, M., & Nerbonne, J. (2007)

Proceedings paper Proceedings of the RANLP 2007 Workshop on Computational Phonology, pp. 71 - 78.

Inducing Sound Segment Differences using Pair Hidden Markov Models

Wieling, M., Leinonen, T., & Nerbonne, J. (2007)

Proceedings paper Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology, pp. 48 - 56.

An aggregate analysis of pronunciation in the Goeman-Taeldeman-van Reenen-Project data

Wieling, M., Heeringa, W., & Nerbonne, J. (2007)

Journal paper Taal & Tongval, 59, 84 - 116.

Does the face reveal athletic flair? Positions in team sports and facial attractiveness

Park, J., Buunk, B., & Wieling, M. (2007)

Journal paper Personality and Individual Differences, 43, 1960 - 1965.

Comparison of Dutch Dialects

Wieling, M. (2007)

Master thesisUniversity of Groningen. 87 pp.

Parsing partially bracketed input

Wieling, M., Nederhof, M., & van Noord, G. (2006)

Proceedings paper Proceedings of the Sixteenth Computational Linguistics in the Netherlands, pp. 1 - 16.

Presentations

Statistics workshop for linguists

I frequently teach (invited) statistics courses for linguists focusing on generalized additive modeling.This technique, which is also able to take into account subject- and item-related variability (i.e. similar to mixed-effects regression) is important as it allows to model complex non-linear relationships between predictors and the dependent variable (e.g., time-series data such as EEG data). I've been invited to teach these courses at (e.g.,) Cambridge, Montréal and Toulouse. Slides (which are regularly updated) of these courses can be found here. If you are interested in this type of statistics course (generally ranging from two to five days), you are welcome to contact me. Note that I do ask a fee for teaching these courses.

Upcoming presentations

No upcoming presentations scheduled at present

Past invited presentations

How popularization and quantification may benefit language variation research(ers) (invited presentation). 12th International Conference on Language Variation in Europe (ICLaVE), Vienna, Austria, July 8 - 11, 2024.
Public engagement in speech research: Some examples and (potential) benefits (invited presentation). Summer School on Coping with the complexity in speech production and perception, Chorin, Germany, July 8, 2022.
Approaches to investigating language variation and supporting minority languages (invited presentation). University of Helsinki, (online), April 15, 2021.
GAM consultation (invited course). Walter Reed National Military Medical Center (Stephanie Kuchinsky), (online), March 21 - June 2, 2021.
GAM Workshop (invited course). Universität Zürich, (online), December 7, 2020.
Generalized additive modeling (invited course). University of Salzburg, Salzburg, Austria, November 21 - 22, 2019.
The beauty of dialect variation (invited presentation). Festival della Scienza, Genua, Italy, October 27, 2019.
Investigating pronunciation variation from an aggregate perspective: from dialectometry to articulography (keynote) (invited presentation). Conference on Experimental Approaches to Perception and Production of Language Variation (ExAPP), Münster, Germany, September 26 - 28, 2019.
Dialectometry: modeling the influence of geography in dialect variation (invited presentation). University of Oldenburg, Oldenburg, Germany, July 4, 2019.
Dialectometry (invited course). University of Malta, Malta, June 27 - 28, 2019.
Generalized aditive modeling to analyze time series (invited presentation). Vrije Universiteit Brussel, Brussel, Belgium, October 2, 2018.
Dialectometry (invited course). Linguistic Convergence Laboratory, Higher School of Economics, Moscow, Russia, September 10 - 14, 2018.
Generalized additive modeling for phonetics (invited course). Humboldt-Universität zu Berlin, Institut für deutsche Sprache und Linguistik, Berlin, Germany, August 30 - 31, 2018.
(Mixed-effects) regression and generalized additive modeling (invited course). Universität Stuttgart, Stuttgart, Germany, May 3 - 4, 2018.
Statistics for Linguistics (invited course). LOT Winter School, Amsterdam, January 8 - 12, 2018.
Generalized additive modeling (invited presentation). Institut für Phonetik und Sprachverarbeitung, Ludwig-Maximilians-Universität München, Munich, Germany, December 20, 2017.
Generalized additive modeling (invited presentation). Winter School on Multilingualism, Groningen, November 9 - 10, 2017.
Quantitative analyses of dialect variation (invited presentation). Meertens Instituut, Amsterdam, October 24, 2017.
Discovering articulatory patterns using generalized additive modeling (invited presentation). Dynamic Modeling III workshop, Cologne, Germany, July 18 - 19, 2017.
Generalized additive modeling as a useful tool for dialectometry (invited presentation). Sociolectometry Panel, ICLaVE, Málaga, Spain, June 6 - 9, 2017.
Statistics for linguists in R (invited course). Istituto di Linguistica Computazionale, Consiglio Nazionale delle Ricerche, Pisa, Italy, May 2 - 5, 2017.
Mixed-effects regression and generalized additive modeling (invited course). Centre de Recherche Cerveau & Cognition, Toulouse, France, March 30 - 31, 2017.
Using generalized additive modeling for analyzing articulography data (and other time series data) (invited presentation). 5th International Winterschool "Speech Perception and Production", Chorin, Germany, January 9 - 13, 2017.
Modeling the influence of geography in dialect variation (invited presentation). FLAMES-colloquium "Topics for Spatial Statistics", Leuven, Belgium, November 28, 2016.
Investigating language variation quantitatively (invited presentation). EuroScience Open Forum 2016, Manchester, United Kingdom, July 23 - 27, 2016.
Pronunciation variation on the tip of the tongue (invited presentation). BCN Winter meeting, Groningen, February 11, 2016.
Investigating language variation using articulography (invited presentation). Netherlands Cancer Institute, Amsterdam, October 13, 2015.
Generalized additive modeling (invited course). McGill, Neurocognition of Language Lab, Montréal, Canada, March 20 - 22, 2015.
Mixed-effects regression and generalized additive modeling for linguists (invited course). Universität Tübingen, Englisches Seminar, Tübingen, Germany, February 9 - 10, 2015.
Mixed-effects regression and generalized additive modeling for linguists (invited course). Universität Oldenburg, Institut für Niederlandistik, Oldenburg, Germany, February 2 - 3, 2015.
Mixed-effects regression and generalized additive modeling for linguists (invited course). University of Cambridge, Department of Theoretical and Applied Linguistics, Cambridge, United Kingdom, January 13 - 14, 2015.
Methods in Dialectometry (invited presentation). Ghent University, Department of Linguistics, Ghent, Belgum, October 16, 2014.
Comparing pronunciations on the basis of transcriptions and articulography (invited presentation). Radboud University, Centre for Language Studies, Nijmegen, October 15, 2014.
What the tongue can tell us about dialects? (invited presentation). Frontiers of Language Variation Workshop, Methods in Dialectology XV, Groningen, August 11 - 14, 2014.
Generalized additive modeling for linguists (invited course). University Leiden, Center of Linguistics, Leiden, July 10 - 11, 2014.
Mixed-effects regression and generalized additive modeling for linguists (invited course). Universität Oldenburg, Institut für Niederlandistik, Oldenburg, Germany, June 30 - July 1, 2014.
Introduction to mixed-effects regression for (psycho)linguists (invited presentation). IMS Stuttgart, Stuttgart, Germany, December 16, 2013.
Integrating traditional dialectology and sociolinguistics: generalized additive modeling (invited presentation). New Ways of Analyzing Variation (NWAV) 42, Pittsburgh, USA, October 17 - 20, 2013 [with R. Harald Baayen].
Inducing and using phonetic similarity (invited presentation). Language Diversity Conference, Groningen, July 20, 2013 [with John Nerbonne].
A sociolinguistic analysis of linguistically sensitive dialectal word pronunciation distances (invited presentation). Workshop on Computational Approaches to the Study of Dialectal and Typological Variation, ESSLLI 2012, Opole, Poland, August 6 - 10, 2012.
Mixed-effects regression analyses of eyetracking data (invited presentation). Eye-tracking and EEG workshop, Groningen, July 7, 2011.

Past presentations

SPRAAKLAB: a mobile laboratory benefitting dialect data collection (presentation). Methods in Dialectology XVIII, Melbourne, Australia, July 1 - 4, 2024.
SPRAAKLAB: a mobile laboratory for collecting speech production data (presentation). 20th International Congress of Phonetic Sciences, Prague, Czech Republic, August 7 - 11, 2023.
Van Old noar Jong: an app en teaching programme to teach children about the regional language (presentation). Oldenburg-Groningen scientific exchange, Oldenburg, June 16, 2022.
A novel paradigm to investigate phonetic convergence in interaction (poster presentation). 12th International Seminar on Speech Production, online, December 18, 2020.
The influence of alcohol on L1 vs. L2 pronunciation (poster presentation). 19th International Congress of Phonetic Sciences, Melbourne, Australia, August 5 - 9, 2019.
The beauty of language variation (presentation). AGM Global Young Academy, Halle, Germany, May 2, 2019.
Generalized additive modeling (guest lecture). University of Auckland, Auckland, New Zealand, December 12, 2018.
Tutorial on generalized additive modelling to analyze time-series data (course). 17th Speech Science and Technology Conference, Coogee, Australia, December 4, 2018.
Generalized additive modeling (guest lecture). Centre for Language Sciences, Macquarie University, Sydney, Australia, December 3, 2018.
Generalized additive modeling to analyze articulatory data (guest lecture). Department of Information and Communication Engineering, University of Tokyo, Tokyo, Japan, November 5, 2018.
Introductory course on Advanced Regression Methods for Linguistics (course). ESSLLI 2018, Sofia, Bulgaria, August 6 - 10, 2018.
Articulatory differences between L1 and L2 speakers of English (presentation). Conference on Multilingualism, Groningen, November 6 - 9, 2017.
Articulatory differences between L1 and L2 speakers of English (presentation). 11th International Seminar on Speech Production, Tianjin, China, October 16 - 19, 2017.
Dialect-specific articulatory settings (presentation). Methods in Dialectology XVI, Tokyo, Japan, August 7 - 11, 2017.
Articulatory differences between L1 and L2 speakers of English (poster presentation). 7th International Conference on Speech Motor Control, Groningen, July 5 - 8, 2017.
Determinants of English Accents (presentation). 6th Conference on Quantitative Investigations in Theoretical Linguistics, Tübingen, Germany, November 4 - 6, 2015.
Taalvariatie: patronen in uitspraak en tongbewegingen (guest lecture). De Jonge Akademie meeting, Groningen, September 21, 2015.
Investigating dialectal differences using articulography (poster presentation). 18th International Congress of Phonetic Sciences, Glasgow, United Kingdom, August 10 - 14, 2015.
Analyzing time series data using generalized additive modeling (course). LSA Institute, Chicago, USA, July 6 - 17, 2015 [with Jacolien van Rij].
Measuring foreign accent strength (guest lecture). University of Cambridge, Department of Theoretical and Applied Linguistics, Cambridge, United Kingdom, January 13, 2015.
Patterns of language variation (guest lecture). Digital Humanities Day 2014, Groningen, October 31, 2014.
Gabmap tutorial (guest lecture). Ghent University, Department of Linguistics, Ghent, Belgium, October 15, 2014.
Comparing dialect and accented pronunciations on the basis of transcriptions and articulography (guest lecture). McGill, Center of Research on Brain, Language and Music, Montréal, Canada, September 18, 2014.
Analyzing linguistic time-series data with generalized additive modeling in R: a case study using ERP data (guest lecture). McGill, Motor Control Laboratory, Montréal, Canada, September 17, 2014.
Comparing dialect and accented pronunciations on the basis of transcriptions and articulography (guest lecture). Linguistic Data Consortium, Philadelphia, USA, September 12, 2014.
Generalized additive models in R (guest lecture). Haskins Laboratories, New Haven, USA, September 9, 2014.
Validating and using the PMI-based Levenshtein distance as a measure of foreign accent strength (poster presentation). Methods in Dialectology XV, Groningen, August 11 - 15, 2014 [best poster award].
Accessing large dialect databases on RADAR (presentation). Methods in Dialectology XV, Groningen, August 11 - 14, 2014 [with Clive Upton].
Large-scale analysis of articulatory trajectories using generalized additive modeling (poster presentation). 10th International Seminar on Speech Production, Cologne, Germany, May 5 - 8, 2014.
Comparing pronunciations on the basis of transcriptions and articulography (guest lecture). University College London, Department of Speech Hearing and Phonetic Sciences, London, United Kingdom, April 8, 2014.
Articulography for studying language variation (guest lecture). Haskins Laboratories, New Haven, USA, October 24, 2013.
Articulography for studying language variation (guest lecture). City University New York, Speech Production, Acoustics and Perception Laboratory, New York, USA, October 16, 2013.
A cognitively grounded measure of pronunciation distance (presentation). Donders Workshop, Radboud University, Nijmegen, September 23, 2013 - July 27, 2024.
A cognitively grounded automatic measure of pronunciation distance (presentation). 5th Conference on Quantitative Investigations in Theoretical Linguistics, Leuven, Belgium, September 12 - 14, 2013.
Hierarchical spectral partitioning of bipartite graphs to cluster dialects and identify distinguishing features (guest lecture). Istituto di Linguistica Computazionale, Consiglio Nazionale delle Ricerche, Pisa, Italy, July 17, 2013.
Comparing pronunciation on the basis of transcriptions and articulography (guest lecture). Università di Pisa, Dipartimento di Filologia, Letteratura e Linguistica, Pisa, Italy, July 16, 2013.
Mixed-effects regression and generalized additive modeling in linguistics (course). LOT Summer School, Groningen, June 24 - 28, 2013 [with Jacolien van Rij].
Co-clustering varieties and linguistic features using bipartite spectral graph partitioning (poster presentation). Forum Sprachvariation, Erlangen, Germany, October 14 - 15, 2012.
Lexical differences between Tuscan dialects and standard Italian: a sociolinguistic analysis using generalized additive mixed modeling (presentation). Leuven Statistics Days, Leuven, Belgium, June 7 - 8, 2012.
A quantitative investigation of English accents (poster presentation). BCN New Year's Meeting, Groningen, February 2, 2012 [best poster award].
Segment distances and foreign accents (guest lecture). LOT Winter School, Tilburg, January 12, 2012.
Gabmap tutorial (guest lecture). Göteborgs Universitet, Gothenburg, Sweden, October 26, 2011 [with John Nerbonne and Therese Leinonen].
Inducing a measure of phonetic similarity from dialect variation (presentation). Workshop on Comparing Approaches to Measuring Linguistic Differences, Gothenburg, Sweden, October 24 - 25, 2011.
A sociolinguistic analysis of aggregate dialect distances (presentation). Methods in Dialectology XIV, London (ON), Canada, August 1 - 5, 2011.
A sociolinguistic analysis of dialect distances (poster presentation). BCN New Year's Meeting, Groningen, February 17, 2011.
On the sublinear distribution of linguistic variation with respect to geography (presentation). Forum Sprachvariation, Erlangen, Germany, October 14 - 15, 2010.
Dialectometry and hierarchical bipartite spectral graph partitioning (guest lecture). Makerere University, Department of Machine Learning, Kampala, Uganda, September 23, 2010.
Text Manipulation and Hidden Markov Models (course). Makerere University, College of Computing and Information Sciences, Kampala, Uganda, September 20 - 22, 2010.
Modelling geography's influence on language variation: calibrating linguistic measurements (presentation). Cultural Evolution in Spatially Structured Populations Conference, London, United Kingdom, September 13 - 15, 2010.
Dialectometry in Groningen (guest lecture). University of Alberta, Department of Linguistics, Edmonton, Canada, August 11, 2010.
Hierarchical bipartite spectral graph partitioning to cluster dialect varieties and determine their most important linguistic features (presentation). TextGraphs-5 ACL Workshop, Uppsala, Sweden, July 16, 2010.
A graph-theoretic analysis of English dialects (presentation). Bloomsday 2010: Echoes of Albion Workshop, Groningen, June 16, 2010.
On the sublinear distribution of linguistic variation with respect to geography (presentation). 31st TABU Dag, Groningen, June 3 - 4, 2010.
Co-clustering varieties and linguistic features using bipartite spectral graph partitioning (presentation). Computational Linguistics in The Netherlands 20, Utrecht, February 5, 2010 - July 27, 2024.
Understanding Séguy's law: on the sublinear distribution of linguistic variation with respect to geography (presentation). Past and Present Processes of Dialect Convergence: Data and Methods, Amsterdam, February 4, 2010 [with John Nerbonne].
Bipartite spectral graph partitioning to co-cluster varieties and sound correspondences in dialectology (presentation). TextGraphs-4 ACL Workshop, Singapore, August 7, 2009.
Tools for detecting geographical and structural affinities (presentation). Workshop on Small Tools for Cross-Linguistic Research, Utrecht, June 15 - 16, 2009.
Evaluating the pairwise string alignment of pronunciation (presentation). EACL Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education, Athens, Greece, March 30, 2009.
Evaluation of pairwise string alignment methods (presentation). Computational Linguistics in the Netherlands 19, Groningen, January 22, 2009.
Dialect pronunciation comparison using the Goeman-Taeldeman-Van Reenen-project data (presentation). Computational Linguistics in the Netherlands 17, Leuven, Belgium, January 12, 2007.
Parsing partially bracketed input (presentation). Computational Linguistics in the Netherlands 16, Amsterdam, December 16, 2005.
Gabor filtering augmented with surround inhibition for improved contour detection by texture suppression (poster presentation). 27th Annual Meeting of the European Conference on Visual Perception, Budapest, Hungary, August 22 - 26, 2004.

Outreach

Streektaalstrijd

Our board game Streektaalstrijd has launched and after selling out the first edition in only two weeks, the new edition can be purchased at the online shop of the University of Groningen. More information about this board game (in Dutch) can be found via the board game's website www.streektaalstrijd.nl. The launch was covered by various news media, including the Dutch national newspaper De Volkskrant (see News coverage, below).

Van Old noar Jong

Our Gronings app 'Van Old noar Jong' has launched and can be freely downloaded for Apple and Android. The app is integrated in a ten-week lesson series about the regional language Gronings for primary schools. Interested schools can order all material (including a copy of De Gruvvalo) for free via the website of the University of Groningen Scholierenacademie. The launch was covered by various news media, including the Dutch national Jeugdjournaal (see News coverage, below).

SPRAAKLAB!

As of February 2021, we have a mobile laboratory available for our outreach initiatives and conducting research in the field: SPRAAKLAB. See this paper for all specifications. The launch was covered by various news media, including RTV Noord (see News coverage, below). With SPRAAKLAB, we often visit public engagement activities and festivals. For example, we have participated in Noorderzon (2021-2024), Zwarte Cross (2022-2024), and Lowlands (2024).

Lowlands Science 2019

In August 2019, Dr. Gregory Mills and I investigated how language evolves and changes using a interactive game between two players. It was an incredible experience and we were able to collect speech production data for about 75 pairs of speakers during only three days! Our participation was made possible through financial contributions of the University of Groningen, the Young Academy Groningen and the Groningen University Fund. Below you can see an impression of this event. The event was covered by various news media, including NPO Radio 1 (see News coverage, below).

Lowlands Science 2018

In August 2018, we investigated the influence of alcohol on native and non-native speech using ultrasound tongue imaging. It was an incredible experience and we were able to collect speech production data for about 150 speakers during only three days! Our participation was made possible through financial contributions of the University of Groningen, the Young Academy Groningen and the Groningen University Fund. Below you can see an impression of this event. The event was covered by various national news media, including NPO Radio 1 (see News coverage, below).

Seeing your tongue move

We enjoy demonstrating how we collect data on tongue and lip movement during speech. If you'd like a demonstration at your school or event, please contact me. Below you can see an impression of my team at the Experiment Event for children organized by De Jonge Akademie, the NS, the Spoorwegmuseum, and Quest Junior.

Comic about my research

Through a project grant of De Jonge Akademie, I was able to create a comic about my research (designed and drawn by Lorenzo Milito and Ruggero Montalto). You can download it here for free. Please contact me if you would like to receive a printed copy of the Dutch version of the comic (as long as supplies last).

Presentations for the general public

SPRAAKLAB: een uniek mobiel laboratorium voor taalonderzoek en zichtbaarheid van de wetenschap (invited), Opening House of Connections, Groningen, June 24, 2023.
Streektaalonderwijs en -onderzoek: Van Old noa/aor Jong (invited), Studiedag Levende Talen Nedersaksisch, Mañana Mañana festival, Laren, June 9, 2023.
Van Old noar Jong: app en lesprogramma voor het Gronings (invited), GRN Festival, Hoogezand, November 19, 2022.
Hoe talen geleidelijk kunnen veranderen, maar snel kunnen verdwijnen (invited), Zwarte Cross, Lichtenvoorde, July 15, 2022.
Onderzoek doen naar dialectvariatie (invited), Streekhistorisch Centrum Stadskanaal, Stadskanaal, May 15, 2022.
Panel member: Kunnen podcasts dialect in stand houden? (invited), Podcastfestival Groningen, Groningen, September 18, 2020.
Van Old noar Jong (invited), Dag van de Grunneger Toal, Groningen, March 16, 2019.
Taalvariatie en taalverandering in Nederland (invited), CSG Guido de Brès, Amersfoort, March 10, 2017.
My career as a scientist (invited), KEI to your future event, Groningen, August 16, 2016.
Taalvariatie en taalverandering in Nederland (invited), Theatre festival Oerol, Terschelling, June 13, 2016.
Taalvariatie op de tong, NWO Bessensap, Amsterdam, June 10, 2016.
Het verstrijken van de taal, De Jonge Akademie interscience symposium "Tijd", Amsterdam, April 4, 2016 [with John Nerbonne].
Van dialecten tot accenten: variatie en verandering in uitspraak en tongbewegingen, Nacht van de Kunst en Wetenschap (pop-up lecture), Groningen, June 6, 2015.
Talkshow guest (invited), Stand van Stad, Groningen, September 28, 2014.

Popular science publications

Raoul Buurke, Hedwig Sekeres, Lourens Visser and Martijn Wieling (2024). Streektaalstrijd: een strategisch kennisspel over taal en dialect, Dongen: Wink Games, Tweede druk. Website: https://www.streektaalstrijd.nl.
Lespakket Van Old naor Jong: Achterhooks, lesson set, University of Groningen, 2022. Available via https://www.rug.nl/lespakketten.
Dialect digitaal: nieuwe kansen voor streektaalgebruik en -onderzoek, inaugural lecture booklet, University of Groningen, 2022. Available at https://books.ugp.rug.nl/index.php/ugp/catalog/book/75.
Lespakket Van Old noar Jongs, lesson set, University of Groningen, 2021. Available via https://www.rug.nl/lespakketten.
"Wie zeggen het vaakst EUHM? Jongens of meisjes?", Science box, Experiment Event, De Jonge Akademie, 2018.
Several contributions (with others), Young Scientist Wetenschapskalender 2018, New Scientist.
Martin & Emma investigate the Dutch dialect landscape, comic, The Young Academy, 2017. Available at http://www.martijnwieling.nl/comic.
Martin & Emma onderzoeken het Nederlandse dialectlandschap, comic, De Jonge Akademie, 2017. Available at http://www.martijnwieling.nl/strip.
"Gebruiken mensen hun ogen om te luisteren?", Wetenschapskalender kids 2017, New Scientist.
Dialectometrische indeling van de Nederlandse dialecten (with Peter Kleiweg and John Nerbonne), Nicoline van der Sijs (ed.) Dialectatlas van het Nederlands, Prometheus Bert Bakker, Amsterdam, pp. 60-61, 2011.

News coverage

About my comic

Taalwetenschapper verstript zijn onderzoek (newspaper), Dagblad van het Noorden, December 8, 2017.
Wetenschap als stripavontuur (online), Universiteitskrant, December 6, 2017.
RUG onderzoeker presenteert onderzoek in stripvorm (TV), OOG TV, December 4, 2017.
Een strip over dialectonderzoek (online), Neerlandistiek, December 4, 2017.
Strip over dialectonderzoek Martijn Wieling (press release), University of Groningen, December 4, 2017.

About articulography

Onderzoek: Engels spreken met een echo-apparaat onder de kin (newspaper), Dagblad van het Noorden, November 13, 2018.
Noordelijk dialect praat je achter in de mond (online), Neerlandistiek, January 31, 2017.
Prijs voor tongbewegingen (online), Kennislink, July 27, 2016.
Tongpositie verklaart buitenlands accent (newspaper), Dagblad van het Noorden, September 5, 2015.
Hoe bewegen tong en lippen bij het spreken? (magazine), New Scientist, September 1, 2015.
Tongpositie verklaart buitenlands accent (newspaper), Het Parool, August 29, 2015.
Interview (radio), NPO Radio 1: Nieuwsshow, May 16, 2015.
Interview (radio), RTV Emmen: Call it a day, May 13, 2015.
Interview (radio), NPO Radio 1: WNL - Nog steeds wakker, May 13, 2015.
Interview (radio), RTV Noord Holland: Spitstijd, May 12, 2015.
Het dialectverschil zit 'm in de tong (TV), RTV Noord: Noord Vandaag, May 12, 2015.
Dialectverschillen zien (TV), Omproep Max (NPO 2): Hallo Nederland & Tijd voor Max, May 12, 2015.
Interview (radio), Simone FM: De Vos is Los!, May 12, 2015.
Dialectverschil komt door de tong (TV), RTV Drenthe, Drenthe Nu, May 12, 2015.
Dialectverschil heeft alles te maken met de tong (radio), NPO Radio 5: Plein 5, May 12, 2015.
Dialect is niet alleen te horen maar ook te zien (radio), BNR Nieuwsradio: BNR B.L.T., May 12, 2015.
Het dialectverschil zit 'm in de tong (radio), RTV Noord: Vroeg op Noord, May 12, 2015.
Interview (radio), Q-music: Mattie en Wietze, May 12, 2015.
Groningse tong staat anders dan een Gelderse (online), Nederlands Dagblad, May 12, 2015.
Dialecten beginnen bij de tong, waarschijnlijk ook in Amsterdam (online), Het Parool, May 12, 2015.
Dit is waarom jouw Engels niet goed klinkt (online), NOS op 3, May 12, 2015.
Nieuw gezicht (newspaper), De Volkskrant, May 9, 2015.
Wetenschap: de mysteries van de spraak (radio), NPO Radio 1: NOS - Wetenschap op woensdag, April 8, 2015.
Experiment met de articulograaf (online), Universiteitskrant, March 31, 2014.
Revealing the tip of the... tongue (newsletter), BCN Newsletter, March 1, 2014.
Tong en lippen bepalen uitspraak (online), Unifocus, January 8, 2014.
'Ik heb meteen een sollicitatie afgezegd' (online), Universiteitskrant, September 2, 2013.
Drent onderzoekt hardheid 'g' in Ter Apel (newspaper), Dagblad van het Noorden, March 7, 2013.
Hoe hard is de harde g precies in Ter Apel? (newspaper), Dagblad van het Noorden, March 6, 2013.
Plat praten met sensor op de tong (newspaper), Dagblad van het Noorden, March 5, 2013.

About the critical period in second language acquisition

There is no critical period for learning a second language, kids just have more free time and motivation (online), Medical Daily, February 9, 2016.
At what age is it easiest to learn a second language? (online), The Conversation, February 8, 2016.
The best age to learn a second language (online), The Independent, February 8, 2016.
Geen kritische periode voor leren tweede taal (online), Kennislink.nl, February 5, 2016.
Nieuw onderzoek: geen harde leeftijdsgrens voor het leren van grammatica (online), Genootschap Onze Taal, January 19, 2016.

About the UH/UM distinction

Zeg eens 'ehm' (newspaper), De Standaard, January 6, 2017.
De stille opmars van ...ehm (newspaper), NRC & nrc.next, January 5, 2017.
Afscheid van 'eh' (online), Vrij Nederland (vn.nl), August 21, 2015.
Uh, why do we say um? (online), Braindecoder, August 12, 2015.
Wetenschap: de mysteries van de spraak (radio), NPO Radio 1: NOS - Wetenschap op woensdag, April 8, 2015.
Verschil tussen 'eh' en 'ehm' (radio), NPO Radio 1: NOS - Met het oog op morgen, February 8, 2015.
Why we are saying 'uh' less and 'um' more (online), BBC, February 8, 2015.
Are we witnessing the death of 'uh'? [...] (radio), Public Radio International, February 5, 2015.
Things that make you go 'um' (online), The Atlantic, November 17, 2014.
Zeg eens uh en ik zeg wie je bent (newspaper), De Morgen, October 8, 2014.
UH / UM in Norwegian (online, guest post), Language Log, October 8, 2014.
Trending in the Media: Um, not exactly... (online), Language Log, October 7, 2014.
Stuck for words? How saying 'um' or 'er' in conversation [...] (online), Daily Mail, October 6, 2014.
Men are from 'er' and women are from 'um' [...] (online), Daily Mail, October 6, 2014.
Ah: Sounds to signal hesitation are part of our linguistic heritage (online), The Times, October 6, 2014.
Um or er: which do you, um, use more in, er, conversation? (online), The Guardian, October 6, 2014.
To um or to er? Studies probe how brains fill the speech-thought gap (online), The Times, October 4, 2014.
UM / UH in German (online, guest post), Language Log, September 29, 2014.
Um and Uh in Dutch (online, invited guest post), Language Log, September 16, 2014.

(Note that especially the English news coverage in 2014 got many details wrong, see this Language Log post.)

About dialectometry

New linguistic tools can predict your dialect characteristics (online), LSA press release, September 24, 2014.
Uitspraakverschillen (newspaper), Leeuwarder Courant, September 17, 2011.

Miscellaneous

De toga: van exclusief naar inclusief? (newspaper), Universiteitskrant, October 27, 2022.
Robot weet welke uitspraak het hof zal doen (newspaper), NRC, December 30, 2020.
Computersysteem voorspelt EHRM-uitspraken (online), Mr. Online, December 14, 2020.
Mis je sociaal contact tijdens je werk? Met dit virtuele 'game'-kantoor maakt thuiswerker Martijn Wieling zijn baan in coronatijd weer leuk (newspaper), Dagblad van het Noorden, November 17, 2020.
Groningse onderzoekers vragen coronasteun: Voorkom een verloren generatie (radio), RTV Noord, July 27, 2020.
Erken en waardeer solidariteit in tijden van crisis (online), ScienceGuide, April 1, 2020.
Geef niet alleen geld aan supersterren (newspaper), NRC, February 4, 2019.
De roze bril van wetenschapsminister Van Engelshoven (online), NRC, February 3, 2019.
Interview: Promotiestudent blijft de meningen verdelen (magazine), Onderzoek Nederland, November 23, 2018.
Interview: How can universities support open science? (online), UG Open Access newsletter, November 6, 2018.
Maak alle promovendi beurspromovendus: Naar een houdbaar en beter promotiestelsel (online), ScienceGuide, November 5, 2018.
Opinie: Naar een rechtvaardig promotiestelsel (online), Universiteitskrant, October 31, 2018.
Waarom hoogleraren en docenten vandaag een rood vierkantje dragen (online), NOS, September 3, 2018.
Jonge onderzoekers in de sociale- en geesteswetenschappen vallen buiten de boot: subsidies beperkt (newspaper), de Volkskrant, July 4, 2018.
NWO-beurs: extensie van extensieregel' (online), U-Today, May 4, 2018.
NWO helpt ook vaders zorgen (online), Universiteitskrant, May 1, 2018.
NWO geeft voortaan alle ouders langer de tijd voor een aanvraag (online), Science Guide, April 30, 2018.
Datasets beschikbaar via RUG onderzoeksdatabase (online), Pictogram, March 1, 2018.
Opinie: "Zorg zelf voor diversiteit uraad!" (online), Universiteitskrant, February 7, 2017.
Martijn Wieling wint prestigieuze European Young Researchers Award (press release), University of Groningen, May 4, 2016.
European Young Researchers Award: the EYRA 2016 laureates (online), EuroScience, May 3, 2016.
Promoveren werkt, maar hoe lang nog? (online; with Sicco de Knecht), ScienceGuide, April 19, 2016.
The future of science: Harder, better, faster? (newsletter), BCN Newsletter, December 1, 2015.
SZW benadeelt jong talent in R&D (online), ScienceGuide, October 23, 2015.
Snel een aardbeving detecteren met Twitter (magazine), I/O Magazine, September 1, 2015.
Ook niet-hoogleraren moeten promotor kunnen zijn (newspaper), NRC, April 28, 2015.
Presentation at the general meeting of De Jonge Akademie (online), De Jonge Akademie, March 31, 2015.
The fruits of your labors (online), Language Log, July 26, 2013.
Rating American English accents (online), Language Log, May 19, 2012.

Miscellaneous

Some interests outside of academia

Spending time with my family (2015: Emma, 2018: David)
Travelling to see a bit of the world (see map)
Playing squash
Microlight flying
Digital photography
Reading, especially fantasy books
Going to concerts or music festivals
Playing various computer, board and card games

Some useful resources

Price list of dissertation printing companies (created in March 2017)
Automatic summarizer (in Java) for Dutch and English scientific documents (manual and license)
Online Gabor filter for image processing and computer vision
Online Canny edge detector (Canny filter) for image processing and computer vision

Martijn Wieling

University of Groningen

Martijn Wieling

Academic positions

Professor by special appointment of Low Saxon / Groningen Language and Culture

Associate Professor

Assistant Professor (with tenure)

Post-doctoral researcher (NWO Veni)

Post-doctoral researcher (NWO Rubicon)

Education & Training

Honors, Awards and Grants (>€25,000)

Publications

Exploring self-supervised speech representations for cross-lingual acoustic-to-articulatory inversion

Abstract

Quantifying the effect of speech pathology on automatic and human speaker verification

Abstract

Multiple estimates of the Frisian and Low Saxon speaker population size in the Netherlands

Abstract

A cognitive geographic approach to dialectology: Cognitive distance as a predictor for perceptual dialect distance

Abstract

Developing infrastructure for low-resource language corpus building

Abstract

The impact of electromagnetic articulography sensors on the articulatory-acoustic vowel space in speakers with and without Parkinson's Disease

Abstract

The effect of speaking style on the articulatory-acoustic vowel space in individuals with tongue cancer before and after surgical treatment

Abstract

Assessing differences in articulatory-acoustic vowel space in Parkinson's Disease by sex and phenotype

Abstract

Production allophones of North American English liquids

Abstract

Minority language happiness: the link between social inclusion, well-being, and speaking a regional language in the northern Netherlands

Abstract

GAM-based individual difference measures for L2 ERP studies

Abstract

Sound change estimation in Netherlandic regional languages: reducing inter-transcriber variability in dialect corpora

Abstract

Changing usage of Low Saxon auxiliary and modal verbs

Abstract

Predicting citations in Dutch case law with natural language processing

Abstract

Making more of little data: Improving low-resource automatic speech recognition using data augmentation

Abstract

DUMB: A Benchmark for Smart Evaluation of Dutch Models

Abstract

5-minute formant adaptation task in Dutch children

Abstract

SPRAAKLAB: a mobile laboratory for collecting speech production data

Abstract

Estimating the level and direction of aggregated sound change of dialects in the northern Netherlands

Abstract

Longitudinal analyses of depression, anxiety, and suicidal ideation highlight greater prevalence in the northern Dutch population during the COVID-19 lockdowns

Abstract

Distance over time in a maximal sprint: understanding athletes' action boundaries in sprinting

Abstract

Communicative language teaching: Structure-based or Dynamic Usage-Based?

Abstract

Quantifying language variation acoustically with few resources

Abstract

Make the best of cross-lingual transfer: Evidence from POS tagging with over 100 languages

Abstract

Low Saxon dialect distances at the orthographic and syntactic level

Abstract

Neural representations for modeling variation in speech

Abstract

Rethinking the field of automatic prediction of court decisions

Abstract

Automatically identifying eviction cases and outcomes within case law of Dutch Courts of First Instance

Abstract

Cognitive benefits of learning additional languages in old adulthood? Insights from an intensive longitudinal intervention study

Abstract

Accuracy assessment of two electromagnetic articulographs: NDI Wave and NDI Vox

Abstract

Automatic judgement forecasting for pending applications of the European Court of Human Rights

Abstract

Adapting monolingual models: data can be scarce when language similarity is high

Abstract

A review of data collection practices using electromagnetic articulography

Abstract

Prevalence of internalizing disorders, symptoms and traits across age using advanced non-linear methods

Abstract