Reading ability is critical for achievement in school, which in turn affects success in adulthood (Ritchie & Bates, 2013). Both impairments and normal variability in reading and language capabilities are highly heritable (Bates et al., 2004; Harlaar et al., 2005), but little is known about the genetic architecture underlying these complex traits. Identifying the key contributing genetic factors is important for understanding the etiology of reading and language disorders and, therefore, for informing intervention strategies. A key to progress in molecular understanding is increasing the sample size of study cohorts. To date, most data on language disorders have come from affected samples, often of school age. These samples are modest in size, which limits power. By contrast, large genotyped samples of many thousands of unselected adults are now being accumulated (e.g. UK Biobank), although collection of data on reading/language-related skills has seldom been prioritized. Here, we report the results of phenotyping a range of reading, spelling and language measures in an unselected adult sample of >1000 people, followed by testing for replication of prior associations, to validate this approach for future, large-scale studies of language-related traits and associated disorders.
A number of candidate genes for dyslexia and developmental language disorder (DLD; previously known as specific language impairment or SLI) have been identified through linkage mapping and targeted association (e.g. Francks et al., 2004; Meng et al., 2005; Nopola-Hemmi et al., 2001) and replicated in genetic association studies of children, adolescents and young adults (e.g. Bates et al., 2007; Bates et al., 2010; Scerri et al., 2011). However, many hundreds of quantitative trait loci (QTL), each of small effect size (<1%), are likely to contribute to these complex, heterogeneous disorders, and much of the relevant genetic variance remains unaccounted for (Bishop, 2015; Carrion-Castillo et al., 2013; Deriziotis & Fisher, 2017). Better-powered genome-wide association (GWA) studies of larger cohorts are needed to further validate known candidates and to increase sensitivity for identifying new QTL.
Unselected adult cohorts are often orders of magnitude larger than even the largest case–control studies of children. Because specific reading and language impairments are thought to represent the lower extreme of a continuum of normally varying ability (Leonard, 1991; Rodgers, 1983), samples drawn from the general population should remain sensitive for detecting relevant genetic factors, an expectation borne out in research on unselected adolescents for both dyslexia and poor reading skill (e.g. Lind et al., 2010). Cognitive abilities show substantial stability across the lifespan (Deary et al., 2000), and reading comprehension measured in adolescence explains ~80% of the variance in adult reading comprehension (Smith, 1993). Measures of reading ability taken in adulthood may therefore be as informative as adolescent measures. Because maximal reading skill is not reached until the mid-20s, perhaps through mechanisms similar to those underlying the increasing heritability of intelligence after childhood (McArdle et al., 2002), adult cohorts may provide even more sensitive tests of genetic (as opposed to environmental) variation than child cohorts do. It is currently not known whether general reading ability in adults is underpinned by the same genetic factors as dyslexia in children.
To probe the utility of unselected adult cohorts, Luciano et al. (2018) tested a set of 14 dyslexia candidate genes, originally associated with reading disability in children, in a meta-analysis of two cohorts of older adults (mean age = 79 years). They found that the gene set was significantly associated with a reading index (p = .016) and that individual single nucleotide polymorphism (SNP) associations, although not significant, had allelic effects in the same direction as in earlier studies. These results suggest that the genetic factors underlying reading disability in children may also contribute to normal-range variation in reading ability in later life. However, the measures used to create the reading index in these unselected adults were not ideal. Specifically, Luciano et al. (2018) employed only two word recognition tests, the National Adult Reading Test (Nelson & Willison, 1991) and the Wechsler Test of Adult Reading (Holdnack, 2001), both of which require pronunciation of irregular words. Performance on such tests is strongly influenced by vocabulary size, and because vocabulary is correlated with intelligence quotient (IQ), these tasks make it hard to disentangle reading skill from general cognitive ability (Dykiert & Deary, 2013). Here, we report an association analysis of the same set of candidate genes in an unselected Australian adult sample using validated reading and language measures, including nonword reading to assess phonological processing, a core component of reading skill.
Our strategy for the present study was to identify adults who had already been genotyped across the genome for earlier genome-wide association studies (GWAS) and to phenotype them with targeted reading, spelling and language measures. The research had three main aims: (1) to demonstrate the reliability and validity of the reading, spelling and language measures (see Table 1) in adults, since such studies are uncommon; (2) to confirm in a middle-aged sample (mean age = 58.7 years) that, although skill may vary with age, such variation is not a significant issue for gene finding; and (3) to demonstrate the validity of using unselected adults to identify genetic factors associated with reading and language abilities.
Table 1 note: CC2A = Castles and Coltheart Test 2 Adults, IQ = intelligence quotient, ASD = autism spectrum disorder.
Our long-term goal is to contribute to large-scale GWAS meta-analyses of speech, language and reading skills, given that genomic studies of these phenotypes lag behind those of other genetically complex traits (Deriziotis & Fisher, 2017). Because the cohort described here lacks power on its own for fully genome-wide investigations, the current study focused on the most prominent genetic associations from the prior literature. Specifically, we analyzed a set of 14 genes that have been reported to show associations with dyslexia: CMIP, CNTNAP2, CYP19A1, DCDC2, DIP2A, DYX1C1, GCFC2 (or C2orf3), KIAA0319, KIAA0319L, MRPL19, ROBO1, PCNT, PRMT2 and S100B (see Luciano et al., 2018, for rationale). We also analyzed a set of five genes previously associated with language disorders of various kinds: ATP2C2, CMIP, CNTNAP2, FOXP2 and TM4SF20. The rationale for selecting these five genes is as follows. Studies of nonword repetition in a DLD cohort collected by the UK SLI Consortium identified associations with SNPs in ATP2C2 and CMIP (Newbury et al., 2009) as well as in CNTNAP2 (Vernes et al., 2008). Nonword repetition was chosen for those studies (and for the present work) because it is a measure of phonological short-term memory that is often impaired in DLD (Gathercole et al., 1994; Newbury et al., 2005).
Mutations in the FOXP2 gene have been reported to segregate with a severe speech and language disorder, characterized mainly by childhood apraxia of speech, in a large family pedigree (Lai et al., 2001), and additional FOXP2 mutations have been found in independent cases with similar impairments (Morgan et al., 1993). TM4SF20 was associated with early language delay in Southeast Asian families (Wiszniewski et al., 2013). In addition to analyzing the gene sets as a whole, we examined individual SNPs within the relevant candidate genes that had previously been reported to be associated with reading/language ability or impairment.
Finally, we targeted the axon guidance pathway (GO:0007411: ‘chemotaxis process that directs the migration of an axon growth cone to a specific target site’; 216 genes) and the neuron migration pathway (GO:0001764: ‘movement of an immature neuron from germinal zones to specific positions where they will reside as they mature’; 145 genes), both of which have been suggested to be implicated in dyslexia (Poelmans et al., 2011), although see Guidi et al. (2018) for a critical review.
Materials and Methods
Participants
In 2017, we recruited participants from earlier twin studies at the QIMR Berghofer Medical Research Institute in Australia. The final cohort consisted of 1550 participants (78.06% female), 1505 of whom had previously been genotyped genome-wide on SNP arrays and were living in Australia. Ages ranged from 41.7 to 73.2 years (mean = 58.7, SD = 7.8). Self-report data on dyslexia, DLD and related traits were collected from all 1505 participants (including 227 sibling pairs, 76 of which were monozygotic (MZ) twin pairs). Reading and language test data were collected from 1112 participants (including 197 sibling pairs and 70 MZ twin pairs). All participants were free from neurological conditions and major psychiatric illness at the time of testing.
Genotyping
Participants had been genotyped on standard Illumina SNP arrays (the chip model varied), and the data were merged after quality control (QC), including Mendelian checks, as the data are typically family-based. Within and across batches, sample errors or failures were identified using sex and relatedness checks and were either corrected or removed as appropriate. Samples were also removed if their call rate was below 97% or if (at a later, post-merging stage) they were of non-European ancestry, as judged by failure to cluster with known European populations in a principal component analysis (PCA). Within a batch, markers were dropped if they failed Illumina-recommended QC filters (e.g. GenTrain score) or if: (1) there were issues with map placement or strand alignment in a Basic Local Alignment Search Tool (BLAST) search of primers; (2) call rate was <95%; (3) p < 1 × 10⁻⁶ in Hardy–Weinberg equilibrium tests; (4) minor allele frequency (MAF) was <1%; (5) (for chromosome X) male heterozygosity was <%; or (6) for older chips, the mean GenCall score was low (<0.7) (Duffy et al., 2018; Medland et al., 2009).
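The marker and sample filters listed above can be sketched as simple threshold checks. This is a minimal illustration under stated assumptions, not the pipeline actually used: the `MarkerStats` record, its field names and the made-up SNP IDs are hypothetical, the GenTrain filter is not modeled, and the chromosome X male-heterozygosity filter is omitted because its threshold is missing from the text.

```python
from dataclasses import dataclass

# Thresholds taken from the text; the chrX male-heterozygosity threshold
# is deliberately omitted because its value is not given.
MIN_SAMPLE_CALL_RATE = 0.97
MIN_MARKER_CALL_RATE = 0.95
MIN_HWE_P = 1e-6
MIN_MAF = 0.01
MIN_GENCALL_OLD_CHIPS = 0.7


@dataclass
class MarkerStats:
    """Hypothetical per-marker summary for one genotyping batch."""
    snp_id: str
    call_rate: float
    hwe_p: float
    maf: float
    mean_gencall: float
    old_chip: bool = False
    mapping_ok: bool = True  # map placement / strand check of primers passed


def passes_sample_qc(call_rate: float) -> bool:
    """A sample is kept only if its call rate is at least 97%."""
    return call_rate >= MIN_SAMPLE_CALL_RATE


def passes_marker_qc(m: MarkerStats) -> bool:
    """Return True if a marker survives the batch-level filters listed."""
    if not m.mapping_ok:
        return False
    if m.call_rate < MIN_MARKER_CALL_RATE:
        return False
    if m.hwe_p < MIN_HWE_P:
        return False
    if m.maf < MIN_MAF:
        return False
    # The GenCall filter applies to older chips only.
    if m.old_chip and m.mean_gencall < MIN_GENCALL_OLD_CHIPS:
        return False
    return True


markers = [
    MarkerStats("rs_a", call_rate=0.99, hwe_p=0.30, maf=0.20, mean_gencall=0.85),
    MarkerStats("rs_b", call_rate=0.93, hwe_p=0.50, maf=0.10, mean_gencall=0.90),
    MarkerStats("rs_c", call_rate=0.99, hwe_p=1e-8, maf=0.30, mean_gencall=0.90),
    MarkerStats("rs_d", call_rate=0.99, hwe_p=0.40, maf=0.005, mean_gencall=0.90),
    MarkerStats("rs_e", call_rate=0.99, hwe_p=0.40, maf=0.25,
                mean_gencall=0.65, old_chip=True),
]
kept = [m.snp_id for m in markers if passes_marker_qc(m)]
print(kept)  # prints ['rs_a']
```

Each of the four rejected markers fails exactly one filter (call rate, Hardy–Weinberg, MAF and GenCall, respectively), which makes it easy to see how the criteria act independently.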
Data were imputed to the Haplotype Reference Consortium reference panel version r1.1 (Haplotype Reference Consortium et al., 2016), and SNPs with a MAF of <.05 or an imputation accuracy of <.8 were excluded. Imputed genotypes were taken from three imputation runs, each using Eagle for phasing and minimac3 (autosomes) or minimac4 (chromosome X) for imputation on the University of Michigan Imputation Server. Each run used the individuals genotyped on one chip family, namely (1) the oldest HapMap-based Illumina chips, (2) GSA chips or (3) Omni and Core+Exome/PsychArray chips, together with the observed markers that passed QC in all corresponding genotyping batches. The three imputation runs were then merged by taking each individual's data preferentially from run (1), then (2), then (3), as this order generally corresponds to the best-quality imputation.
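The priority rule for merging the three imputation runs can be expressed as a short lookup. This is a minimal sketch: the run labels and the dictionary layout are hypothetical, and the real merge operates on dosage files rather than on flags.

```python
from typing import Dict, List, Optional

# Priority order described in the text: (1) HapMap-based chips,
# (2) GSA chips, (3) Omni and Core+Exome/PsychArray chips.
# The labels themselves are illustrative only.
RUN_PRIORITY: List[str] = ["hapmap", "gsa", "omni_coreexome_psych"]


def choose_run(available: Dict[str, bool]) -> Optional[str]:
    """Return the highest-priority run in which an individual was imputed."""
    for run in RUN_PRIORITY:
        if available.get(run, False):
            return run
    return None  # individual absent from all runs


def merge_runs(individuals: Dict[str, Dict[str, bool]]) -> Dict[str, Optional[str]]:
    """Map each individual ID to the run whose imputed data would be kept."""
    return {ind: choose_run(runs) for ind, runs in individuals.items()}


example = {
    "id1": {"hapmap": True, "gsa": False, "omni_coreexome_psych": True},
    "id2": {"gsa": True, "omni_coreexome_psych": True},
    "id3": {"omni_coreexome_psych": True},
    "id4": {},
}
print(merge_runs(example))
# prints {'id1': 'hapmap', 'id2': 'gsa', 'id3': 'omni_coreexome_psych', 'id4': None}
```

Selecting per individual, rather than per SNP, keeps each person's genotypes from a single, internally consistent imputation run.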
The breakdown of chip models was: (1) HapMap-based: 610K-Quad (n = 427), 660K/670K-Quad (n = 213), CNV370 (n = 399), 317K (n = 63); (2) GSA: GSA Avera (n = 2); (3) Core+Exome (n = 152), PsychArray (n = 65), Omni2.5 (n = 34), OmniExpress (n = 4).
Measures and Procedure
An approach email was sent to participants with a link to the detailed information sheet and online consent form. They were then directed to a brief self-report questionnaire covering education, how frequently they read books (excluding magazines and newspapers), their estimated IQ, whether they or their child have a reading or language disorder and whether they have any of a range of other behavioral or psychiatric conditions (Table 1). Within two weeks of completing this survey, eligible participants who had provided informed online consent were contacted for a telephone interview, during which they were emailed an online link to access the tests.
Three tests were administered: the Castles and Coltheart Test 2 Adults (CC2A) reading test (Castles & Coltheart, 1993), the Gathercole and Baddeley Nonword Repetition Test (Gathercole et al., 1994) and a spelling test that included phonetic spelling (Table 1). The CC2A requires participants to read aloud and correctly pronounce 55 each of regular words, irregular words and nonwords. Irregular word reading assesses the lexical route of reading, whereas nonword reading specifically assesses phonological processing. Similarly, our spelling test includes 22 regular and 14 irregular words, plus a phonetic spelling task in which 18 irregular words are spelled ‘as they sound’ to assess phonological processing. Gathercole et al.’s (1994) nonword repetition task measures language ability related to phonological encoding and memory. Nonword repetition data were excluded for one individual with 10 missing items.
Statistical Analyses
Multiple regression was used to predict each of the reading, spelling and language outcome measures from age, age squared, sex and hearing difficulties. Hearing difficulties were coded as present for any respondent who reported hearing difficulties or use of a hearing aid, or who was identified as having hearing difficulties by the interviewer. The residual scores were used in further analyses. A unitary reading and spelling ability measure was created from scores on the first principal component (PC) of a PCA of the regular word, irregular word and nonword reading measures and the regular, irregular and phonetic word spelling measures.
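The covariate adjustment and composite score can be sketched with NumPy on simulated data. This is an illustration under assumptions, not the authors' code: ordinary least squares stands in for the regression, the −4 SD floor reflects the outlier handling reported in the Results, PCA is run on the correlation matrix (standardized columns), and all data and variable names are invented.

```python
import numpy as np


def residualize(y, covariates):
    """Residuals of y after OLS regression on the covariates plus an intercept."""
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def floor_outliers(resid, n_sd=4.0):
    """Raise residuals lying below mean - n_sd * SD up to that floor."""
    floor = resid.mean() - n_sd * resid.std(ddof=1)
    return np.maximum(resid, floor)


def first_pc(measures):
    """Scores on the first PC and the share of variance it explains.

    Columns are standardized first, i.e. a PCA of the correlation matrix.
    """
    z = (measures - measures.mean(axis=0)) / measures.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[0], float(s[0] ** 2 / np.sum(s ** 2))


# Simulated example (not study data): six measures sharing one latent ability.
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(41.7, 73.2, n)
age_c = age - age.mean()  # centring reduces collinearity of age and age^2
sex = rng.integers(0, 2, n)  # 1 = male
hearing = rng.integers(0, 2, n)  # 1 = hearing difficulties
covs = np.column_stack([age_c, age_c ** 2, sex, hearing])
ability = rng.normal(size=n)
raw = ability[:, None] + 0.6 * rng.normal(size=(n, 6))

resid = np.column_stack(
    [floor_outliers(residualize(raw[:, j], covs)) for j in range(raw.shape[1])]
)
pc1, share = first_pc(resid)
print(f"first PC explains {share:.1%} of the variance")
```

The sign of a PC is arbitrary, so in practice the composite would be oriented so that higher scores mean better reading and spelling.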
GWA results were generated for each of five variables (the reading and spelling PC, nonword reading, phonetic spelling, nonword repetition and self-reported reading impairment) using the Genome-wide Complex Trait Analysis (GCTA) software (Yang et al., 2011), which can account for family relatedness. Where both members of an MZ twin pair had been assessed, one member was selected at random for the analyses, giving final genetic association samples of 1425 for self-reported reading impairment, 1290 for the reading and spelling PC, 1293 for nonword reading and 1292 for phonetic spelling and nonword repetition.
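The random retention of one member per assessed MZ pair can be sketched as follows. This is a minimal illustration: the ID scheme and the `mz_pair` mapping are hypothetical, the seed is arbitrary, and other relatives (non-MZ siblings) are simply kept, on the assumption that their relatedness is handled in the GCTA model rather than by exclusion.

```python
import random
from collections import defaultdict
from typing import Dict, List


def drop_mz_cotwins(ids: List[str], mz_pair: Dict[str, str], seed: int = 1) -> List[str]:
    """Keep one randomly chosen assessed member of each MZ twin pair.

    mz_pair maps an individual ID to its MZ pair ID; individuals absent
    from the mapping (not MZ twins) are always kept.
    """
    rng = random.Random(seed)  # seeded so the selection is reproducible
    groups = defaultdict(list)
    kept = []
    for ind in ids:
        pair = mz_pair.get(ind)
        if pair is None:
            kept.append(ind)
        else:
            groups[pair].append(ind)
    for members in groups.values():
        # A pair with only one assessed member keeps that member.
        kept.append(rng.choice(members))
    return sorted(kept)


sample = drop_mz_cotwins(
    ["a1", "a2", "b1", "c1", "c2", "d1"],
    {"a1": "A", "a2": "A", "d1": "D"},
)
print(sample)
```

Here a1 and a2 form an assessed MZ pair, so exactly one of them is retained, while d1 (whose co-twin was not assessed) and the non-MZ individuals are all kept.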
Using the summary statistics from these analyses, gene-set analysis was performed for four gene sets: dyslexia candidate genes (N = 14), speech/language disorder candidate genes (N = 5), the axon guidance pathway (gene ontology (GO) term GO:0007411; 216 genes) and the neuron migration pathway (GO:0001764; 145 genes). Gene-based analysis was also performed for each gene within the dyslexia and speech/language disorder candidate sets. Analyses were performed using MAGMA (de Leeuw et al., 2015) to test for overrepresentation of significantly associated SNPs within each set and within each candidate gene. For the gene-based analyses, Bonferroni correction gave a critical p value of .003. For the gene-set analyses, standard Bonferroni correction was too conservative because the candidate gene sets wholly overlapped with the biological pathway gene sets; an effective number of two independent tests was therefore used, giving an adjusted critical p value of .025.
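The two thresholds can be reproduced with simple Bonferroni arithmetic. The text does not state the number of gene-based tests, so this sketch assumes it equals the 17 distinct genes across the two candidate sets (CMIP and CNTNAP2 appear in both); the effective number of two gene-set tests is taken from the text.

```python
DYSLEXIA_GENES = {
    "CMIP", "CNTNAP2", "CYP19A1", "DCDC2", "DIP2A", "DYX1C1", "GCFC2",
    "KIAA0319", "KIAA0319L", "MRPL19", "ROBO1", "PCNT", "PRMT2", "S100B",
}
LANGUAGE_GENES = {"ATP2C2", "CMIP", "CNTNAP2", "FOXP2", "TM4SF20"}
ALPHA = 0.05


def bonferroni(alpha: float, n_tests: float) -> float:
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests


unique_genes = DYSLEXIA_GENES | LANGUAGE_GENES  # assumption: one test per distinct gene
gene_alpha = bonferroni(ALPHA, len(unique_genes))
set_alpha = bonferroni(ALPHA, 2)  # effective number of independent gene-set tests
print(len(unique_genes), round(gene_alpha, 3), set_alpha)  # prints 17 0.003 0.025
```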
Within the candidate gene sets, 77 SNPs had previously been associated with reading or language ability or disability, or were variants identified through fluorescence in situ hybridization and SNP microarray analysis of a small deletion at 21q22.3 segregating with dyslexia in a family (see Supplementary Material). Taking linkage disequilibrium into account, matrix spectral decomposition (Nyholt, 2004) gave an effective number of 68.7 independent tests, and Bonferroni correction gave an adjusted α level of 7.28 × 10⁻⁴.
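Nyholt's (2004) spectral decomposition estimates the effective number of independent tests from the eigenvalues λ of the M × M SNP correlation (LD) matrix as M_eff = 1 + (M − 1)(1 − Var(λ)/M), where Var is the sample variance. The sketch below implements that formula with NumPy; the small correlation matrices are toy inputs, not the study's LD data.

```python
import numpy as np


def nyholt_meff(corr: np.ndarray) -> float:
    """Effective number of independent tests (Nyholt, 2004).

    corr: M x M matrix of pairwise SNP correlations (r).
    M_eff = 1 + (M - 1) * (1 - Var(lambda) / M), using the sample variance.
    """
    m = corr.shape[0]
    eigvals = np.linalg.eigvalsh(corr)  # corr is symmetric
    return 1 + (m - 1) * (1 - np.var(eigvals, ddof=1) / m)


def adjusted_alpha(meff: float, alpha: float = 0.05) -> float:
    """Bonferroni threshold using the effective rather than nominal test count."""
    return alpha / meff


# Toy example: SNPs 1 and 2 in strong LD (r = 0.8), SNP 3 independent.
toy = np.array([[1.0, 0.8, 0.0],
                [0.8, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
print(round(nyholt_meff(toy), 2))  # prints 2.57
print(f"{adjusted_alpha(68.7):.2e}")  # prints 7.28e-04
```

Independent SNPs (an identity matrix) give M_eff = M, and perfectly correlated SNPs give M_eff = 1, so a non-integer value such as 68.7 for 77 SNPs reflects partial LD among them.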
Results
Phenotypic Analyses
The distributions of the raw reading, spelling and language test scores were slightly negatively skewed. Multiple regression of each outcome measure on age, age squared, sex and hearing difficulties (Supplementary Table S1) yielded residual scores, used for genetic association analysis, that were normally distributed. Age squared was a significant predictor only for nonword repetition (β = −0.00, p = .042). Because males were coded positively, negative β values for sex indicate higher scores in females; females scored higher than males for regular word reading (β = −0.45, p = .006), nonword repetition (β = −0.96, p = .004), regular word spelling (β = −0.03, p < .001) and irregular word spelling (β = −0.61, p < .001). Hearing difficulties were associated with worse performance in irregular word reading (β = −1.36, p = .029), nonword reading (β = −3.23, p < .001), nonword repetition (β = −0.96, p = .004) and regular word spelling (β = −0.04, p = .023). Outlying low scores were set to a floor of −4 standard deviations.
Table 2 shows a correlation matrix of raw reading, spelling and language scores and covariates. Minimum and maximum values, means and standard deviations for each of the variables are in Table 3, while Table 4 gives the frequencies of discrete variables. Frequent book reading correlates with higher scores on reading and spelling tasks but not with nonword repetition. More years at school is correlated with higher scores in all reading, spelling and language tasks. Self-report of a reading impairment is associated with lower scores in reading and spelling tasks but bears no relationship with nonword repetition. Self-reported language impairments do not correlate with any task, including nonword repetition.
Table 2 note: Correlations are not adjusted for case nonindependence. PC = principal component. *p ≤ .05; sex (males), impairment and hearing difficulties are coded positively.
Table 3 note: For reading, spelling and language scores, summary statistics are calculated from the percentage of correct items.
In the PCA of reading and spelling scores, a scree plot of the eigenvalues shows the first PC is sufficient to explain the majority of variation (63.1%) in reading and spelling skills (Supplementary Figure S1).
Genetic Association Results
Quantile–quantile plots of observed against expected p values for SNPs within the dyslexia and speech/language disorder candidate gene sets (Supplementary Figure S2) show a slight upward deviation from the null distribution, suggestive of genetic signal, for phonetic spelling in the dyslexia candidate gene SNP subset (Supplementary Figure S2(b)) and for nonword repetition in both the dyslexia and speech/language disorder candidate gene SNP subsets (Supplementary Figure S2(e) and (f)). In gene-based analyses (Table 5), FOXP2 was associated with nonword repetition (p < .001), phonetic spelling (p = .002) and the reading and spelling composite score (p < .001), surviving the corrected α level of .003. For nonword repetition, FOXP2 was among the three most significantly associated genes.
Table 5 note: Bold type indicates nominal significance. Chr = chromosome.
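The quantile–quantile comparison described above can be sketched by pairing each sorted observed p value with its expected null quantile, (i − 0.5)/n for rank i, on the −log10 scale. This is a generic illustration of the technique, not the plotting code behind Supplementary Figure S2; points above the diagonal indicate smaller p values than expected under the null.

```python
import math
from typing import List, Tuple


def qq_points(pvalues: List[float]) -> List[Tuple[float, float]]:
    """(expected, observed) -log10 p pairs for a QQ plot.

    Under the null, p values are uniform; the expected value for the
    rank-i smallest of n p values is taken as (i - 0.5) / n.
    """
    obs = sorted(pvalues)
    n = len(obs)
    return [(-math.log10((i + 0.5) / n), -math.log10(p)) for i, p in enumerate(obs)]


# Illustrative p values only; the smallest one lies far above the diagonal.
for expected, observed in qq_points([1e-5, 0.02, 0.3, 0.5, 0.9]):
    print(f"{expected:.2f}\t{observed:.2f}")
```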
Gene-set analysis of the neuron migration pathway revealed a nominal association with the reading and spelling composite score (p = .037; Table 6), which did not survive correction for multiple testing, and gene-set analyses of 14 candidate dyslexia genes, five candidate speech/language disorder genes and the axon guidance pathway were also not significant.
Table 6 note: Bold type indicates nominal significance. PC = principal component, DLD = developmental language disorder.
Of the SNPs within the dyslexia and speech/language disorder candidate gene sets, 77 have previously been reported to be associated with reading or language ability or impairment (Supplementary Tables S2–S6). None came close to the corrected significance level of p < 7.28 × 10⁻⁴.
Discussion
In this study, we introduced a new population sample of previously genotyped adults for whom we have recently obtained reading and language measures. Our aim was to determine the validity of using unselected adults to identify genetic factors associated with reading and language abilities. We demonstrate the suitability of the reading and language measures to determine ability among unimpaired adults, and we confirm that age is not a confounder. Notably, there was no association between age and the most sensitive index of reading skill, namely phonological decoding (assessed through nonword reading). In our adult population, we observed associations at the gene-based level for candidate genes that have previously been implicated in dyslexia or speech/language disorders in children and adolescents; for example, finding that variation in FOXP2 (a gene implicated in a monogenic form of speech apraxia) was associated with nonword repetition. Further, in gene pathway analyses, we find some support for associations of genes involved in neuronal migration with reading skill, albeit at a nominal level of significance that does not survive multiple-testing adjustment.
Establishing sensitive measures of adult reading and language abilities is crucial because individuals with an impairment may develop coping strategies over the life course. We showed that scores on the CC2A reading task and our spelling task, which included nonword reading and phonetic spelling, respectively, correlate with how often individuals read books. Reading books, more so than other forms of print, is associated with higher literacy proficiency (Smith, 1996). Further, performance on the reading, spelling and language measures correlated with the number of years of schooling individuals completed, consistent with the known association between educational achievement and reading and language abilities (Garnier et al., 1997; Snowling et al., 2001). We also found that reading and spelling scores in our cohort correlated with self-reported reading impairment but not with self-reported language impairment. Unexpectedly, nonword repetition scores showed no relationship to self-reported language impairment, even though individuals with DLD are less able to acquire the phonological forms of new words (Gathercole, 2006; Newbury et al., 2005). Reports of language impairment were rare in our modestly sized cohort (1.9%), whereas the population frequency of language deficits not attributable to hearing impairment, low nonverbal intelligence or neurological damage is estimated to be closer to 7% (Leonard, 2014). We may therefore have been statistically underpowered to detect a relationship, and this result could be a type II error. Alternatively, it may reflect ascertainment bias, together with the unreliability of self-report for measuring the true frequency of learning disabilities, particularly given the historical context: the youngest members of this cohort were born in 1975, six years before a standard set of criteria for diagnosing DLD (formerly known as SLI) existed (Stark & Tallal, 1981).
In gene-based analyses of prior candidate genes from the dyslexia and speech/language disorder literature, we identified associations with several reading, spelling and language measures in our cohort of largely unimpaired adults. A discussion of the individual SNP results can be found in the Supplementary Material. Variation in FOXP2 was associated with nonword repetition, phonetic spelling and a reading/spelling composite score. FOXP2 (Forkhead Box P2) encodes a transcription factor involved in the development of the brain (among other tissues) and acts by regulating hundreds of genes (Fisher & Scharff, 2009). The gene was first identified through positional cloning studies of a severe speech and language disorder involving childhood apraxia of speech in a large multigenerational pedigree (Fisher et al., 1998). All affected cases in this family were found to carry a missense mutation in the DNA-binding domain of the encoded protein, and a translocation disrupting FOXP2 was discovered in an unrelated individual with a similar disorder (Lai et al., 2001). Subsequently, additional rare protein-coding changes (including both missense and nonsense mutations) have been identified as causes of developmental speech and language disorders in multiple independent families and cases (MacDermot et al., 2005; Morgan et al., 1993; Reuter et al., 2017).
Despite robust evidence from independent studies implicating rare disruptions of FOXP2 in severe speech and language deficits, the contribution of common variation in this gene to language-related phenotypes remains open to debate (see Uddén et al., 2019). For example, in one of the largest prior studies to assess this issue, Mueller et al. (2016) tested for relationships between 13 FOXP2 SNPs and language ability in a modestly sized population cohort of children (N = 812) and found no significant associations. Given that our findings contrast with these results, further investigations using robust measures in larger samples of adults and children are warranted to resolve this long-standing question. Of note, in a recent GWAS meta-analysis of >20k individuals diagnosed with attention deficit/hyperactivity disorder (ADHD) and >35k controls (Demontis et al., 2019), SNPs in FOXP2 were among the top genome-wide significant hits, which is intriguing in light of the known overlap between ADHD and reading disabilities.
ATP2C2 (ATPase secretory pathway Ca2+ transporting 2) catalyzes ATP hydrolysis coupled with calcium transport. The gene was identified as a candidate for involvement in DLD susceptibility by the SLI Consortium (2002), following an early linkage study of families with DLD probands that included nonword repetition as a quantitative measure. In targeted analyses of the linkage region, SNPs in ATP2C2 were found to be associated with both nonword repetition and reading measures in language-impaired individuals, but not in an unselected cohort (Newbury et al., 2011; Newbury et al., 2009). In the present study, we detected nominal associations of ATP2C2 with nonword repetition, phonetic spelling, nonword reading and the reading/spelling composite score, although these did not survive multiple-testing adjustment.
We also detected nominally significant associations of the dyslexia candidate genes MRPL19 (with phonetic spelling) and S100B (with nonword repetition). MRPL19 (mitochondrial ribosomal protein L19) encodes a ribosomal subunit and is involved in protein synthesis. A risk haplotype in a locus containing MRPL19 and C2orf3 was associated with dyslexia in Finnish families and replicated in a German sample (Anthoni et al., 2007). Heterozygous carriers of the risk haplotype had reduced expression of both genes. MRPL19 expression correlates with that of the dyslexia candidate genes DCDC2, DYX1C1, KIAA0319 and ROBO1; however, the NeuroDys study of 900 individuals with dyslexia across eight countries failed to replicate the effects of MRPL19 (Becker et al., 2014). S100B (S100 Calcium Binding Protein B) is involved in neurite outgrowth and neuronal migration (Huttunen et al., 2000; Poelmans et al., 2011) and was identified as one of four genes in a deleted region co-segregating with dyslexia in a family (Poelmans et al., 2009). A noncoding variant was later associated with spelling in German families (Matsson et al., 2015), but no other studies have linked the gene to language ability or impairment.
The nominal association between genes in the neuron migration pathway and the reading and spelling composite score is consistent with proposals from Galaburda et al. (2006), Paracchini et al. (2007) and Poelmans et al. (2011), who hypothesized that dyslexia candidate genes are part of a molecular network that regulates neuronal migration and neurite outgrowth. A more recent review by Guidi et al. (2018) critically evaluated this hypothesis and concluded that robust evidence supporting it is lacking. We did not find an association of the neuron migration pathway with any measure other than our composite score, nor was the axon guidance pathway significant in our study. However, the GO terms defining these pathways are incompletely annotated and continue to expand. At the time of a previous study (Luciano et al., 2018), the neuron migration pathway contained 103 genes and the axon guidance pathway contained 203 genes, compared with 145 and 216 genes, respectively, at present. In that study, no significant associations were found for either pathway, whereas here we detected an association with a reading and spelling score, albeit one not robust to correction for multiple testing. This highlights the potential value of continuing to probe these pathways for a possible link to dyslexia as they are annotated with increasing resolution.
Our failure to replicate previous genetic associations may reflect a lack of statistical power to detect genetic variants of small effect size, true null associations or false-positive findings in prior studies. Here, we had 78.43% power to detect a minimum effect size of 0.005 (calculated using the Genetic Power Calculator; Purcell et al., 2003). Further, our participants were recruited through a twin registry, which may be subject to sampling bias: frequencies of self-reported reading (5.26%) and language (1.90%) impairments were below the estimated population frequencies (10% and 7%, respectively). Variants may have stronger effects at the tail end of ability or in individuals with an impairment, in which case greater statistical power is required to replicate them in unselected populations than in case–control studies. Future meta-analyses and larger GWA studies of both selected and unselected cohorts of children and adults will enable stronger conclusions to be drawn about the genetic influences on reading acquisition, the continuity of reading skill over the life course and their relationship to reading disorder. In this study, we were unable to disentangle general cognitive ability from reading and language skills, which are highly correlated traits. Including an IQ test as a covariate in future studies would allow better isolation of specific reading and language abilities.
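To show how power depends on sample size and effect size, the sketch below uses a deliberately simplified model. It is not the Genetic Power Calculator's method, and it is not expected to reproduce the 78.43% figure. It assumes unrelated individuals, a single additive variant explaining a proportion q² of trait variance and a one-degree-of-freedom test with non-centrality parameter N·q²/(1 − q²); the α value used in the study is not stated, so the α below is an assumption.

```python
from statistics import NormalDist


def qtl_power(n: int, q2: float, alpha: float) -> float:
    """Approximate power of a two-sided 1-df association test.

    Simplified model (an assumption, not the Genetic Power Calculator):
    unrelated individuals, variance explained q2, NCP = n * q2 / (1 - q2).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = (n * q2 / (1 - q2)) ** 0.5  # mean of the test statistic Z
    # Reject when |Z| > z_crit, with Z ~ N(shift, 1).
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Illustrative values: q2 = 0.005 and the reading/spelling PC sample of 1290.
for alpha in (0.05, 0.025):
    print(alpha, round(qtl_power(1290, 0.005, alpha), 3))
```

Under these assumptions, power at α = .05 comes out at roughly 0.72, and it drops quickly as α is tightened, which is why the small effects expected for reading- and language-related variants call for much larger samples.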
This study introduces an important new population cohort of genotyped adults with validated measures of reading and language abilities. We also collected self-reported binary status for a range of conditions comorbid with dyslexia and DLD, including stuttering, autism spectrum disorder (ASD) and ADHD. We have shown that at least some candidate genes associated with dyslexia and speech/language disorders in children and adolescents may show effects in unselected adult populations, demonstrating the potential of such resources (when suitably scaled up) for discovering novel genetic variants associated with reading and language traits. Future studies should aim to conduct large-scale GWA analyses and meta-analyses of unselected adults to identify genetic variants associated with measures of reading and language abilities, accounting for general cognitive ability where possible. The same approach of analyzing relevant continuous traits in unselected populations could be extended to other learning impairments and neurological traits. Ultimately, uncovering the genetic etiology of developmental disorders may enable earlier diagnosis and appropriate intervention.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/thg.2020.7.
Acknowledgments
We would like to thank the research interviewers and the study participants. We are grateful to Marie-Christine Opitz for data cleaning assistance.
Financial support
S.E.F. is supported by the Max Planck Society. Data collection was funded in part by a Centre for Cognitive Ageing and Cognitive Epidemiology Pilot grant.
Conflict of interest
None.
Ethical standards
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.