1. Introduction
Suppose one were to ask an English speaker, “What English letter comes after F?” Chances are the English speaker would start reciting The ABC Song to arrive at the correct response, particularly if that was how they learned the English alphabet. This propensity to retrieve information from memory using songs suggests that information is better remembered and recalled when it is set to a musical melody. It is not surprising, then, that music is often used as a mnemonic device to facilitate learning and remembering information. The present study expands this common intuition by investigating the efficacy of music in language learning; specifically, whether music may be used to support word learning (i.e., the learning of labels for referents).
While anecdotal evidence points to a positive effect of music on learning and memory, empirical evidence for this is less straightforward. Music is typically operationalised in these empirical studies as background music (vs. in quiet) or as a song (vs. speech). Mixed results were obtained from studies comparing memory for items in the presence of background music and in quiet. Some reported a facilitative effect: for instance, learners remembered word lists and second-language vocabulary better when the items were presented with background music compared to those presented in silence (Bottiroli et al., 2014; de Groot, 2006; Kang & Williamson, 2013). Others found that the background music benefit was not seen in item memory (i.e., memory for specific items) but rather in source memory (i.e., memory for the context in which items occur), suggesting that background music provides a facilitative context for encoding information (Ferreri et al., 2014, 2015). On the other hand, some studies reported no difference in recall accuracy between items presented with background music and those presented in silence (de Groot & Smedinga, 2014a; Jäncke et al., 2014; Jäncke & Sandmann, 2010; Nguyen & Grahn, 2017; Reaves et al., 2016). Some even reported worse performance in the background music condition relative to silence, especially in cases where vocal music is used as background music (de Groot & Smedinga, 2014a; Salame & Baddeley, 1989).
A meta-analysis conducted to clarify the inconsistent findings revealed that background music appears to have a small detrimental effect on memory in general (Kämpfe et al., 2011). Taken together, the findings on this issue have been highly inconsistent, with the effect of background music on verbal memory ranging from positive to none to negative.
In contrast, studies that compared memory for items that were sung vs. spoken revealed that memory for sung items is typically better than memory for spoken items (Chazin & Neuschatz, 1990; Ratovohery et al., 2018; Thiessen & Saffran, 2009, but see Racette & Peretz, 2007, for conflicting results). For example, adult learners recalled foreign phrases more accurately when the phrases were sung to them compared to those that were spoken to them (Ludke et al., 2014). Infants, too, appear to remember more from song: 6.5- to 8-month-old infants remembered sung sequences more than spoken sequences as indexed by their ability to differentiate old vs. new sequences (Thiessen & Saffran, 2009). Moreover, the song advantage is reported to be long-lasting, with memory for sung items lasting up to 6 months (Good et al., 2015; Ratovohery et al., 2018). Based on these positive findings, some have explored whether this song advantage may be used with clinical populations such as patients with aphasia (Racette et al., 2006) and patients with mild Alzheimer’s disease (Moussard et al., 2012, 2014; Simmons-Stern et al., 2010), and the results have generally been positive with some qualification (e.g., the song advantage tends to be seen after a delay or when sung in unison with an auditory model).
While converging evidence suggests a song advantage relative to speech, the advantage appears to be due to certain characteristics of the song itself, such as familiarity of the melody, repetition of melodic structures, and tempo differences between song and speech. Indeed, when these characteristics are controlled, the song advantage tends to disappear (Calvert & Tart, 1993; Kilgour et al., 2000; Wallace, 1994).
Different accounts have been put forward to explain the disparate findings in the research on background music and song on verbal memory (Ferreri & Verga, 2016). One such account is the ‘limited resources’ account, which is based on the idea that attentional or cognitive resources are limited (Barrouillet & Camos, 2012; Kahneman, 1973). So, in a dual-task situation, such as performing a memory task while listening to or processing music, learners’ cognitive resources would be divided between the two tasks, leaving them with fewer available resources to encode and maintain the target items for the memory task efficiently. Another explanation is the ‘arousal/mood booster’ account (Thompson et al., 2001), in which it is argued that the facilitative effect of music on cognitive tasks is due to an improvement in one’s mood and an increase in one’s arousal level. This would essentially put one in a better state of mind to perform the task. Indeed, some have speculated that auditory stimulation in the background (e.g., white noise or music) may increase dopamine activity, leading to increased attention to perform the task (Angwin et al., 2017). A third account to explain the effect of music on memory is the ‘scaffold/template’ account, which posits that music helps one learn and remember items as it provides a frame to which items can be attached (Purnell-Webb & Speelman, 2008; Thaut et al., 2005).
Given that the memory representations of melody and text are integrated to some extent (Serafine et al., 1984), music would thus facilitate recall by ‘filling in the blanks’ if one should fail to retrieve portions of the verbal material (Ginsborg & Sloboda, 2007). Finally, the findings may also be understood from an informational masking perspective (Eskridge et al., 2012; Scharenborg & Larson, 2018), such that background music, especially music with vocals, may mask the speech and impair one’s ability to decode it, whereas this would not be the case for sung vocals, in which the speech and music signals are integrated as one.
While this area of research has been studied extensively, there remain several outstanding questions on the influence of the different forms of music on learning and memory, which motivate the present study. Whereas previous studies have compared spoken-in-quiet vs. background music and spoken vs. sung, there has not been any direct comparison between background music vs. sung. Thus, it is unclear whether the two forms of music may have a different effect on word learning. Based on previous findings (i.e., worse performance for background music vs. spoken, and positive or no difference between sung vs. spoken), it is likely that performance on learning sung materials would be superior to learning materials presented with background music. Previous studies have also only used grammatical, in-key music to examine the effect of music on learning and so it is unclear whether music needs to be grammatical to have an effect. With all other things being equal, on the one hand, music that is out-of-key may be a cognitive distraction, costing learners valuable processing resources needed for encoding target items (Slevc et al., 2009). On the other hand, an out-of-key melody may be considered to be a surprising albeit unpleasant event, especially if it occurs less often than in-key melodies, which may increase its salience and therefore its memorability (Foster & Keane, 2019).
Another outstanding question is what might explain the equivocal findings in the literature. Certainly, the mixed findings are partly attributed to methodological differences across different studies: for example, whether the music used was instrumental or had lyrics, and if so, the language of the lyrics (de Groot & Smedinga, 2014b; Salame & Baddeley, 1989); whether a recognition or a recall task was used (Nguyen & Grahn, 2017); and whether the acoustic differences between the different conditions were properly controlled (Kilgour et al., 2000). Another possibility is related to individual characteristics of the learner, since it appears that not all learners exhibit the same effect of music even within the same study (de Groot, 2006; Küssner et al., 2016). We argue that by not taking these individual characteristics into account, any effect of music on verbal memory may be ‘cancelled out’ when averaged across participants.
Since we conceptualised word learning as a memory task, that is, to commit associations between items (e.g., auditory words and visual forms) to long-term memory (LTM), our general hypothesis is that the influence of music on word learning (or verbal memory, in general) may be modulated by the learners’ cognitive resources and cognitive abilities, assuming all else being equal. That is, those with more resources and enhanced abilities may reap the benefit of music since they have sufficient resources to simultaneously process music and the target items efficiently, whereas those with fewer resources and lower abilities may show the opposite effect (e.g., due to music being distracting, leaving them with fewer resources to learn target items). In the present study, we operationalised one’s cognitive resources as working memory (WM). We propose that WM is a potentially important moderator of the relationship between music and word learning, given its implication in word learning (Baddeley et al., 1998; Martin & Ellis, 2012) and more generally its relation to LTM. For example, according to some WM models, LTM modulates the information held by WM (Jones et al., 2007), whereas according to others, WM is an activated part of the LTM (Cowan, 1999). Thus, it follows that any effect of music that is modulated by WM may also affect word learning.
In addition to WM, we also examined two other individual characteristics that have been shown to affect learners’ cognitive resources and/or cognitive abilities that might influence the effect of music on word learning: age and musical training. Older adults are reported to have fewer cognitive resources and poorer episodic memory than younger adults (Craik & Byrd, 1982; Nilsson, 2003; Rabinowitz et al., 1982) and as such, they tend to show poorer performance in paired-associates tasks and verbal learning (Korchin & Basowitz, 1957; Meijer et al., 2008; Service & Craik, 1993). In line with our hypothesis that age may moderate the effect of music on word learning, previous studies have demonstrated that older adults tend to be affected by music more so than young adults: whereas there was no effect of music among young adults on verbal memory, older adults tend to show a detrimental effect of background music relative to silence (Reaves et al., 2016) and a facilitative effect of positively valenced familiar song over speech (Ratovohery et al., 2018). Individuals with musical training are reported to have enhanced cognitive abilities, including working memory, attention, and executive functioning, relative to those without musical training (Schellenberg & Weiss, 2013; Talamini et al., 2017).
Moreover, they appear to show an aptitude for learning languages, often outperforming their non-musically trained counterparts in various linguistic tasks including word learning (Chobert & Besson, 2013; Dittinger et al., 2016; Kilgour et al., 2000). Thus, given the enhanced cognitive resources/abilities and linguistic abilities associated with musical training, it is possible that musical training may modulate the effect of music on word learning.
In summary, the present study seeks to clarify the mixed findings in the field by examining whether there is an effect of music (operationalised as background music and sung) on word learning relative to spoken-in-quiet. We also explored outstanding questions in the field such as whether there might be differences in word learning performance between items presented in the presence of background music and items that were sung as well as between items sung to melodies that are in-key and those that are out-of-key. Importantly, we test our proposal that the effect of music on word learning may be moderated by individual characteristics such as WM, age, and musical training. Our general hypothesis is that these moderators will influence the direction of the music effect, such that those with higher cognitive resources/abilities will benefit from the effect of music, whereas music will have a detrimental effect on those with lower cognitive resources/abilities.
2. Methods
2.1. Participants
There were two groups of participants: younger and older adults, all of whom were multilingual with English being one of their primary languages. The younger adults were 28 undergraduate students (17 females and 11 males) recruited from a local university. Their age ranged between 20 and 32 (M = 24.21, SD = 3.76). Some were musically trained (M = 3.69, SD = 4.82, Range = 0–15 years) and all reported having normal hearing and normal/corrected-to-normal vision. In addition, participants had their hearing assessed using a pure-tone audiometric screening, and all the younger participants could detect frequencies up to 4000 Hz in at least one ear at 25 dB. Participants were screened for their nonverbal intelligence and English receptive vocabulary using the Test of Nonverbal Intelligence (TONI, 4th Edition) and the Peabody Picture Vocabulary Test (PPVT, 4th Edition), respectively. Their standard scores for both tests were within the normal range (TONI: M = 104.57, SD = 8.37, Range = 88–124; PPVT: M = 100.93, SD = 11.33, Range = 81–128).
The older adults were 28 volunteers (18 females and 10 males) recruited from the community whose age ranged between 60 and 87 (M = 67.11, SD = 5.63). All reported having normal hearing as well as normal/corrected-to-normal vision and some reported having musical training experience (M = 1.39, SD = 2.95, Range = 0–10 years). The older adults’ hearing was also screened, and all could detect frequencies up to 4000 Hz in at least one ear at 40 dB. Participants scored at least 27 on the Mini Mental State Examination (MMSE, 2nd Edition), suggesting that they were cognitively healthy at the time of participating. Their nonverbal intelligence and English receptive vocabulary were within the normal range (TONI: M = 106.86, SD = 10.48, Range = 90–133; PPVT: M = 97.79, SD = 7.59, Range = 85–111). The older adults did not differ significantly from the younger adults in nonverbal intelligence (t(54) = 0.90, p = .371, d = 0.24) or English receptive vocabulary (t(54) = 1.22, p = .228, d = 0.33).
All participants provided their written informed consent prior to participating and they were reimbursed for their participation. The Nanyang Technological University Institutional Review Board approved the study protocol, and all methods were performed in accordance with the relevant guidelines and regulations.
2.2. Stimuli and tasks
2.2.1. Word learning task
The word learning task consisted of 24 word-object pairings. The objects were novel (i.e., they were assumed to be unfamiliar, concrete objects that were not readily named by perceivers) and were taken from The Novel Object and Unusual Name (NOUN) Database (Horst & Hout, 2016). The words were disyllabic pseudowords, which were presented auditorily (see Appendix Table A.1 for the list of pseudowords, and the two languages created for this experiment, i.e., the pseudoword-novel object pairings and the condition to which each pairing was assigned). All auditory stimuli were synthesised using Mac OS X Speech Service with a female voice. Materials for the word learning task can be found at the following link: https://osf.io/7uywe/.
The task, presented in PsychoPy (Peirce, 2007), consisted of a learning phase followed by a test phase. In each trial of the learning phase, an object was presented for 4.5 s in the middle of the screen. At 1.5 s after the object appeared, a pseudoword for the object was presented auditorily in the carrier phrase “This is a [pseudoword]”. Importantly, the carrier phrase and pseudowords were presented in four within-subject conditions: a spoken-in-quiet condition (Spoken) and three music conditions, namely spoken in the presence of background music (Bg), sung in-key (Sung-in), and sung out-of-key (Sung-out). The music conditions all used novel five-note melodies, and each condition consisted of six word-object pairings. In the Spoken condition, the carrier phrase and pseudoword were spoken with a natural prosody in quiet. In the Bg condition, each spoken sentence was accompanied by a soft sine-tone melody (with its amplitude peak scaled to 0.2, or approximately 77 dB SPL, in Praat; Boersma & Weenink, 2013) in which each tone was linked to a syllable and their durations matched. In the Sung conditions, the pitch of each spoken syllable of the carrier phrase was manipulated in Praat (Boersma & Weenink, 2013) such that it would be ‘sung’ to tones implying a major key (e.g., “This is a…” sung to E-G-C, which implies the key of C). Specifically, we levelled the pitch contour of each syllable and shifted its pitch to a particular frequency (tuned to A4 = 440 Hz). The pitch of the pseudowords was also manipulated in the same way, resulting in sung syllables that would either be in-key (Sung-in) or out-of-key (Sung-out) relative to the implied key of the carrier phrase. For example, relative to a carrier in the key of C (E-G-C), an in-key pseudoword ‘rin-ba’ would be sung to D-C (which is consistent with the key of C) whereas an out-of-key pseudoword ‘lu-gash’ would be sung to G#-F# (which implies the key of F#).
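The target frequencies implied by this pitch manipulation can be illustrated with a minimal sketch, assuming twelve-tone equal temperament tuned to A4 = 440 Hz. The function name `note_to_hz` and the choice of octave 4 are assumptions made for illustration only; the octave actually used for the stimuli is not stated here.

```python
# Semitone offset of each pitch class from C within one octave.
PITCH_CLASSES = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
                 "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}


def note_to_hz(name, octave, a4=440.0):
    """Equal-tempered frequency of a note, tuned so that A4 = a4 Hz."""
    # MIDI-style note number: C4 = 60, A4 = 69.
    midi = 12 * (octave + 1) + PITCH_CLASSES[name]
    return a4 * 2 ** ((midi - 69) / 12)


# Carrier phrase "This is a" sung to E-G-C (octave 4 is an assumption).
carrier = [("E", 4), ("G", 4), ("C", 4)]
carrier_hz = [round(note_to_hz(name, octave), 2) for name, octave in carrier]

# In-key (D-C) and out-of-key (G#-F#) targets for the pseudoword syllables.
in_key_hz = [round(note_to_hz(n, 4), 2) for n in ("D", "C")]
out_of_key_hz = [round(note_to_hz(n, 4), 2) for n in ("G#", "F#")]
```

Each syllable's levelled pitch contour would then be shifted to one of these values; a different octave simply halves or doubles every frequency.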
We defined ‘out-of-key’ as being the key directly opposite the implied key of the carrier phrase on the circle of fifths, a representation of the relationship between musical keys in music theory. This manipulation should theoretically be the most out-of-key possible, and thus the violation of the implied musical key should be obvious to adult listeners given that key membership is learned by the age of 5 years (Trainor & Hannon, 2013). To avoid potential confounding effects of pseudoword-object pairings and their allocation to a condition, we created two languages that contained the same stimuli but differed in how the pseudowords were randomly paired with the objects and in how the pairings were allocated to the four conditions (see Appendix Table A.1). Participants were randomly assigned to one of the two languages at the start of the experiment. We focused on pitch, rather than rhythm, manipulations to differentiate the conditions given the relatively short utterance on every trial (5 syllables/notes). The duration of the stimuli was similar between the conditions: all had the same duration of the carrier phrase (786 ms) plus the duration of the disyllabic word, which ranged between 349 and 669 ms (M = 508 ms, SD = 68 ms). A 2 (List, i.e., language) × 4 (Condition) ANOVA on the duration of the pseudowords revealed no significant main effects of List (F(1, 40) = 0.46, p = .503) or Condition (F(3, 40) = 0.32, p = .812), nor a significant interaction between the two (F(3, 40) = 0.26, p = .854). Each word-object pairing was presented twice during the learning phase (with an inter-trial interval of 500 ms) in a pseudorandomised order such that the conditions were not blocked and no two consecutive learning trials presented the same pairing.
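The definition of ‘out-of-key’ as the opposite key on the circle of fifths can be made concrete with a short sketch. It uses sharp spellings only (e.g., F rather than E#) for simplicity, and the helper names `opposite_key` and `major_scale` are illustrative assumptions, not part of the study materials.

```python
# Major keys in circle-of-fifths order (sharp spellings only).
CIRCLE_OF_FIFTHS = ["C", "G", "D", "A", "E", "B",
                    "F#", "C#", "G#", "D#", "A#", "F"]
CHROMATIC = ["C", "C#", "D", "D#", "E", "F",
             "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitones above the tonic


def opposite_key(key):
    """Key six positions away on the circle of fifths (a tritone apart)."""
    i = CIRCLE_OF_FIFTHS.index(key)
    return CIRCLE_OF_FIFTHS[(i + 6) % 12]


def major_scale(key):
    """Set of pitch classes belonging to the major scale on `key`."""
    tonic = CHROMATIC.index(key)
    return {CHROMATIC[(tonic + step) % 12] for step in MAJOR_STEPS}


carrier_key = "C"
out_key = opposite_key(carrier_key)       # key used for Sung-out tones
in_key_tones = {"D", "C"}                 # 'rin-ba' example
out_of_key_tones = {"G#", "F#"}           # 'lu-gash' example

in_key_ok = in_key_tones <= major_scale(carrier_key)
out_violates = not (out_of_key_tones & major_scale(carrier_key))
shared = major_scale(carrier_key) & major_scale(out_key)
```

With these spellings, the opposite of C is F#, the two example out-of-key tones fall outside C major, and the two scales share only two pitch classes, which is the sense in which the opposite key is maximally distant.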
Directly after the learning phase, participants completed a six-alternative forced-choice recognition test. On every test trial, participants heard a pseudoword (all of which were spoken) and they had to choose which of the six images on the screen corresponded to the pseudoword. The target image was presented with five distractor images, which were taken from the same condition (e.g., if the target image was from Sung-in condition, then all the distractor images were also from Sung-in condition). Each pairing was tested once in a randomised order.
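Because each condition contains six pairings, a target plus five same-condition distractors means that every test trial displays all six objects from the target's condition. The sketch below builds such a trial list under that reading. The stimulus labels are placeholders rather than the study's pseudowords, and shuffling the on-screen positions is an assumption.

```python
import random

CONDITIONS = ["Spoken", "Bg", "Sung-in", "Sung-out"]


def build_test_trials(pairings, rng):
    """pairings maps condition -> list of (pseudoword, object_id) tuples."""
    trials = []
    for condition, items in pairings.items():
        objects = [obj for _, obj in items]
        for word, target in items:
            options = objects[:]      # target + five same-condition distractors
            rng.shuffle(options)      # on-screen position (assumed random)
            trials.append({"word": word, "target": target,
                           "condition": condition, "options": options})
    rng.shuffle(trials)               # each pairing tested once, random order
    return trials


# Placeholder stimuli (hypothetical labels), six pairings per condition.
pairings = {c: [(f"{c}_word{i}", f"{c}_obj{i}") for i in range(6)]
            for c in CONDITIONS}
trials = build_test_trials(pairings, random.Random(0))
chance_level = 1 / 6                  # six response options per trial
```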
2.2.2. Auditory working memory task
To measure participants’ working memory, we used the Auditory Working Memory subtest from the Woodcock–Johnson Tests of Cognitive Abilities (3rd Edition). On each trial, participants heard a list of objects and digits in a random order. At the end of every list, participants had to first recall the objects in the order in which they were presented and then the digits in the order in which they were presented. The number of words in each list ranged between three and eight. Two points were awarded for a list when participants recalled both the objects and the digits in the correct order. One point was awarded when they recalled either the objects or the digits in the correct order. The maximum raw score for this task is 42.
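The scoring rule described above can be sketched as follows. This is an illustration of the rule as stated, not the published scoring procedure of the test; the example list and the function names are hypothetical, and digits are represented as strings.

```python
def split_list(presented):
    """Separate a mixed list into objects and digits, preserving order."""
    objects = [item for item in presented if not item.isdigit()]
    digits = [item for item in presented if item.isdigit()]
    return objects, digits


def score_list(presented, recalled_objects, recalled_digits):
    """2 points if both categories are recalled in order, 1 if only one is."""
    objects, digits = split_list(presented)
    return int(recalled_objects == objects) + int(recalled_digits == digits)


# Hypothetical four-item list: objects and digits interleaved.
presented = ["shoe", "8", "apple", "1"]

both_correct = score_list(presented, ["shoe", "apple"], ["8", "1"])
digits_swapped = score_list(presented, ["shoe", "apple"], ["1", "8"])
neither = score_list(presented, ["apple", "shoe"], ["1", "8"])
```

Under this rule, a maximum raw score of 42 would correspond to 21 lists at two points each.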
2.3. Procedure
Participants completed all the tasks in a single session in the following fixed order: word learning, TONI, PPVT, auditory working memory, and demographic questionnaire. Additionally, the older adults completed the MMSE at the start of the experiment. Prior to the word learning task, participants adjusted their headphone volume to a comfortable level to ensure they could hear the utterances clearly.
2.4. Data analysis
Independent samples t-tests were first conducted to examine whether younger and older adults differed in their working memory scores and musical training experience.
Data from the word learning task were fitted with mixed effects logistic regression using the lme4 package version 1.1.27.1 (Douglas et al., 2015) in R version 4.1.2 (R Core Team, 2021). Data and the analysis script for the experiments reported in this manuscript are available at the following link: https://osf.io/7uywe/. The analysis choice is motivated by the binary nature of the dependent variable, Accuracy (Jaeger, 2008). There were four predictors: Condition (Helmert coded to test three planned comparisons: (i) Spoken vs. the three music conditions collapsed; (ii) Bg vs. the two sung conditions collapsed; and (iii) Sung In-Key vs. Sung Out-of-Key); Age Group (Age; effect-coded: Younger vs. Older); Working Memory (WM; continuous variable centred by Age Group); and Years of Musical Training (Training; continuous variable centred by Age Group). We tested our a priori contrasts for Condition rather than an omnibus test of Condition given that the former approach is more powerful and appropriate (Schad et al., 2020). We modelled the data with all the predictors and the two-way interactions between Condition and Age, Condition and WM, and Condition and Training, to examine whether these factors modulate the effect of music on word learning. Pairwise comparisons, if any, were conducted using the emmeans package version 1.7.1.1. As random effects, we entered random by-subject and by-item intercepts and random by-subject slopes for Condition.
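The three planned comparisons for Condition can be written as a set of centred, mutually orthogonal contrast weights. The sketch below shows one common scaling, under which each coefficient equals the difference between the first group and the mean of the remaining groups. The exact scaling and sign used in the analysis script are not stated here, so the weights, like the coefficient values, are assumptions for illustration. Exact fractions are used so the identities can be checked without rounding.

```python
from fractions import Fraction as Fr

LEVELS = ["Spoken", "Bg", "Sung-in", "Sung-out"]

# One weight per level, in the order of LEVELS (assumed scaling).
CONTRASTS = [
    [Fr(3, 4), Fr(-1, 4), Fr(-1, 4), Fr(-1, 4)],  # (i) Spoken vs. music
    [Fr(0), Fr(2, 3), Fr(-1, 3), Fr(-1, 3)],      # (ii) Bg vs. sung
    [Fr(0), Fr(0), Fr(1, 2), Fr(-1, 2)],          # (iii) Sung-in vs. Sung-out
]


def mean(values):
    return sum(values) / len(values)


def cell_means(intercept, betas):
    """Predicted value per level: intercept + sum of beta_k * weight_k."""
    return [intercept + sum(b * c[i] for b, c in zip(betas, CONTRASTS))
            for i in range(len(LEVELS))]


# Hypothetical coefficients on the log-odds scale (not the study's estimates).
betas = [Fr(1, 5), Fr(-1, 10), Fr(3, 10)]
mu = cell_means(Fr(1), betas)

# Each coefficient is recovered as the corresponding planned comparison.
comparisons = [
    mu[0] - mean(mu[1:]),  # Spoken minus mean of Bg, Sung-in, Sung-out
    mu[1] - mean(mu[2:]),  # Bg minus mean of Sung-in, Sung-out
    mu[2] - mu[3],         # Sung-in minus Sung-out
]
```

Because every contrast sums to zero, the intercept in this coding is the unweighted mean of the four condition means.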
3. Results
Independent samples t-tests revealed that the older adults had lower auditory WM raw scores (t(54) = 8.26, p < .001, d = 2.21) and fewer years of musical training (t(54) = 2.15, p = .036, d = 0.57) than the younger adults. In the subsequent analysis, we included WM scores and years of musical training, both of which were centred by age group.
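As a consistency check, the reported effect sizes can be related to the t statistics. The sketch assumes that d is the pooled-standard-deviation Cohen's d for two independent groups of 28 participants each, in which case d = t × sqrt(1/n1 + 1/n2). The formula actually used to compute d is not stated here, so this is an assumption.

```python
import math


def d_from_t(t, n1, n2):
    """Cohen's d implied by an independent-samples t (pooled SD assumed)."""
    return t * math.sqrt(1 / n1 + 1 / n2)


n_younger, n_older = 28, 28
df = n_younger + n_older - 2            # degrees of freedom for the t-tests

d_wm = round(d_from_t(8.26, n_younger, n_older), 2)        # auditory WM
d_training = round(d_from_t(2.15, n_younger, n_older), 2)  # musical training
```

Under this assumption, the values computed from the t statistics agree with the reported d = 2.21 and d = 0.57.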
Descriptive statistics of the overall proportion of correct responses for the word learning task by age group and condition are displayed in Table 1. The overall performance for both age groups was relatively high, given that the 95% confidence intervals were well above the chance level for a 6-alternative forced-choice task (i.e., 1/6, or 0.167).
Note. Spoken = Spoken-in-quiet condition; Sung-in = Sung in-key condition; Sung-out = Sung out-of-key condition; Bg = Spoken in the presence of background music condition.
The model summary of the word learning data can be found in Appendix Table A.2. Findings from the model suggest that younger adults generally performed better than older adults (β = −0.27, SE = 0.14, z = 1.97, p = .048), and that there was a marginal positive relationship between word learning performance and years of musical training (β = 0.07, SE = 0.04, z = 1.93, p = .053). There was no influence of music on word learning generally, that is, the performance on each of the conditions was not significantly different from each other (see Fig. 1).
The model also examined whether the influence of music on word learning was modulated by individual characteristics (i.e., the two-way interactions). We found that while age and years of musical training did not modulate the effect of music (i.e., none of the interactions involving age and musical training were significant), working memory (WM) modulated the effect of the Spoken condition relative to the music conditions (i.e., Spoken vs. Music × WM interaction: β = −0.02, SE = 0.01, z = −2.13, p = .033; see Fig. 2). The estimated effect (i.e., the slope) of WM on the Spoken condition was significantly more negative than that on the music conditions (B diff = −0.266, SE = 0.12, z = −2.13, p = .033). Descriptively, those with lower WM performed worse on the music conditions relative to the Spoken condition whereas those with higher WM scored higher on the music conditions compared to the Spoken condition. No other interactions involving WM were significant.
4. Discussion
The present study clarifies what effect music has on memory, from a word-learning perspective, given mixed findings in previous studies. We compared word learning performance in three music conditions (background music, sung in-key, and sung out-of-key) with a control condition, that is, spoken in quiet. We also explored several outstanding questions in the field, such as whether there would be any difference in verbal memory for items presented with background music vs. sung items, and whether items sung in-key would be remembered differently from items sung out-of-key. Crucially, we investigated these questions from an individual differences perspective, by examining whether individual characteristics such as age, working memory (WM), and musical training may modulate the effects of music on word learning. We argue that these individual characteristics may partly contribute to the mixed findings in the literature, given that any effect of music (either positive or negative) may be ‘cancelled out’ by not taking them into consideration.
We found that in general, when individual characteristics were not taken into consideration, there was no effect of music on word learning relative to spoken-in-quiet and no difference in performance between background music and sung conditions and between sung in-key and sung out-of-key conditions. Extending previous studies that have typically compared either speech and background music or speech and song, our study found no evidence that the various music conditions differentially affect verbal memory. This is somewhat surprising, given that previous studies have found similar or worse performance for background music vs. spoken (e.g., Jäncke & Sandmann, 2010; Kämpfe et al., 2011; Reaves et al., 2016) while superior performance for sung vs. spoken has been reported (e.g., Ludke et al., 2014; Ratovohery et al., 2018). So, one might expect that performance would be better in the sung conditions than in the background music condition. We suspect that the lack of a difference between the music conditions may be due to our manipulation methods. We wanted the music conditions to be as similar as possible to one another and to the spoken condition. Thus, the typical advantages afforded by songs such as repetition of melodic structures and slower articulation or tempo were not present in our sung conditions. Moreover, our short five-note melodies may not have been ‘surprising’ or memorable enough when out-of-key for the learners to show differential performance between the two sung conditions.
Our results indicated that across conditions young adults performed better than older adults, unsurprisingly, given that older adults tend to have poorer declarative memory (Korchin & Basowitz, 1957; Meijer et al., 2008; Service & Craik, 1993). There was also a marginal positive relationship between word learning performance and years of musical training across conditions, which is also expected given that musical training appears to enhance general cognitive abilities and is related to higher linguistic aptitude (Chobert & Besson, 2013; Dittinger et al., 2016; Kilgour et al., 2000; Schellenberg & Weiss, 2013; Talamini et al., 2017). The older adults generally had less musical training experience than the younger adults in our sample, and so while we took that into consideration in our model (i.e., we mean-centred their musical experience by age group), it is possible that the effect of musical training may be stronger if there were more musically trained older adults in our sample.
Beyond the general effects of age and musical training on word learning, there was no evidence that the effect of music was modulated by those individual characteristics. Thus, our results do not appear to be in line with those that found an effect of music among older adults but not younger adults (Ratovohery et al., 2018; Reaves et al., 2016). The difference in findings is likely due to the type of music used: whereas previous studies typically employed rich, emotionally charged music, our study opted for shorter, more acoustically controlled melodies in order to be comparable to the control (spoken-in-quiet) condition. In contrast to age and musical training, we found evidence of WM modulating the effect of music on word learning. Specifically, words that were spoken in the presence of background music and words that were sung were learned better than words spoken in quiet by those with higher WM, whereas the opposite pattern was observed among those with lower WM. Based on the cognitive account discussed previously, we propose that music listening may involuntarily draw on one’s cognitive resources, leaving listeners with lower WM with insufficient resources to properly encode the word-referent mappings. On the other hand, those with higher WM would still have sufficient resources for efficient encoding of the word-referent mappings, which allows them to reap the benefits of music in learning.
Several limitations of the study are worth noting. Our sample size, though comparable to other studies (Jäncke & Sandmann, 2010; Ratovohery et al., 2018), is modest. With a larger sample size, it would be possible to examine whether different forms of musical training (e.g., vocalists vs. instrumental musicians) may affect performance in the various music conditions. Our measure of musical experience is relatively crude and could benefit from more sophisticated measures, such as a point system differentiating types of musical training (e.g., private vs. group; Russo et al., 2007) or standardised questionnaires such as the Goldsmiths Musical Sophistication Index (Gold-MSI; Müllensiefen et al., 2014). Finally, we did not screen our participants for amusia. Amusia is rare (occurring in about 1.5% of the general population; Peretz & Vuvan, 2017), and the selected out-of-key melodies were the most out-of-key according to the circle of fifths (i.e., we selected the key opposite the implied key). Nonetheless, it is possible that amusic participants, if there were any, were unable to detect the implied keys, which may have affected the results. This should be considered in future studies.
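As an illustration of what "the key opposite the implied key" means on the circle of fifths, the following is a minimal sketch under stated assumptions and not the procedure used to construct the stimuli. It assumes major keys and one particular enharmonic spelling of each key; the names `CIRCLE_OF_FIFTHS` and `opposite_key` are hypothetical. The opposite key lies six steps around the twelve-position circle, which corresponds to a tritone.

```python
# Major keys in circle-of-fifths order (enharmonic spellings are an assumption).
CIRCLE_OF_FIFTHS = ["C", "G", "D", "A", "E", "B",
                    "F#", "Db", "Ab", "Eb", "Bb", "F"]


def opposite_key(key):
    """Return the key diametrically opposite `key` on the circle of fifths."""
    position = CIRCLE_OF_FIFTHS.index(key)
    # Six steps around a twelve-position circle is the farthest point.
    return CIRCLE_OF_FIFTHS[(position + 6) % len(CIRCLE_OF_FIFTHS)]


# Example: a melody implying C major is maximally out-of-key in F# major.
farthest_from_c = opposite_key("C")
```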
This study provides evidence that a learner’s WM influences whether music exerts an effect on word learning. Given these findings, it appears that the effect of music on verbal memory is partly cognitive in nature. Moreover, we suggest that the equivocal results in previous studies may be partly due to not taking WM, and possibly other individual characteristics, into consideration, which would average out any potential effects of music at the group level. It should be noted that previous studies have used different methodologies (e.g., presenting target items in a different modality, such as printed words, or using background music that did not follow the presentation rate of the spoken material), which may involve different mechanisms and memory components from those in our current paradigm; this may partly explain why their findings contrast with those of the present study. Nonetheless, our study has shown that individual characteristics do seem to influence the effect of music, and taking them into account will provide a more nuanced understanding of how music may influence memory. Further work, which has clear implications for education, is needed to identify other characteristics that may similarly exert an influence (e.g., whether the effect is stronger among learners with tone language experience, given their extensive pitch experience) and to determine how these characteristics interact with various aspects of the music (e.g., familiarity with and likability of melodies) and methodology (e.g., having the same mode of presentation during learning and test). From a theoretical perspective, it would be interesting to examine whether music may exert an influence beyond word learning, such as on grammar learning. According to certain language models (Hamrick et al., 2018; Ullman, 2004), the same memory component (i.e., declarative memory) is implicated in vocabulary learning and in the initial stages of learning the grammar of a second language. Thus, if music affects the retrieval of information from declarative memory, then based on our present findings, music may similarly influence initial grammar learning.
5. Conclusions
In conclusion, we found that WM appears to modulate the effect of music on word learning, such that those with higher WM benefited from music whereas those with lower WM were disadvantaged by its presence. More generally, our results highlight the need to consider individual characteristics in determining the effect of music. Indeed, we propose that the mixed results seen in previous studies may be partly due to the effect of music being ‘averaged out’ when learners’ characteristics are not taken into account. To draw out pedagogical and educational implications, further studies are necessary to identify other characteristics that may similarly influence the effect of music, and to explore whether the effect of music extends beyond word learning to other aspects of language acquisition.
Acknowledgements
We would like to thank all the participants who volunteered their time and Cheryl Choo and Hannah Lim for assistance with data collection.
Funding
This work was supported by the Ministry of Education (Singapore) Social Science Research Thematic Grant (SSRTG) (grant number: MOE2019-SSRTG-016), Academic Research Fund (MoE AcRF) Tier 2 (grant number MOE2019-T2–1-125), and Tier 1 (grant number RG71/18) awarded to AHDC. JHO was supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 887283.
Data availability statement
The dataset supporting the conclusions of this article is available at https://osf.io/7uywe/.
Appendices
Note. Spoken = Spoken-in-quiet condition; Sung-in = Sung in-key condition; Sung out = Sung-out-of-key condition; Bg = Spoken in the presence of background music condition.
Note. CI = Confidence interval; SE = Standard Error; Spoken = Spoken-in-quiet condition; Music = the three music conditions collapsed; Bg = Spoken in the presence of background music condition; Sung = the two sung conditions collapsed; Sung-in = Sung in-key condition; Sung out = Sung-out-of-key condition; WM = Working memory; Age = Age Group; Training = Years of musical training.