Published online by Cambridge University Press: 16 August 2018
Researchers in dialectometry have begun to explore measurements based on fundamentally quantitative metrics, often sourced from dialect corpora, as an alternative to the traditional signals derived from dialect atlases. This change of data type amplifies an existing issue in the classical paradigm, namely that locations may vary in coverage and that this affects the distance measurements: pairs involving a location with lower coverage suffer from greater noise and therefore imprecision. We propose a method for increasing robustness using generalized additive modeling, a statistical technique that allows leveraging the spatial arrangement of the data. The technique is applied to data from the British English dialect corpus FRED; the results are evaluated regarding their interpretability and according to several quantitative metrics. We conclude that data availability is an influential covariate in corpus-based dialectometry and beyond, and recommend that researchers be aware of this issue and of methods to alleviate it.