Introduction
As the number of individuals over the age of 60 years is expected to double between 2000 and 2050(1), the projected incidence of age-related neurodegenerative diseases and associated health care costs is also set to rise significantly. In a recent report commissioned by the Alzheimer's Society, the current annual cost of dementia in the UK alone was estimated to be about £17·03 billion(Reference Knapp, Prince and Albanese2), with the total worldwide cost estimated to be US$315·4 billion annually(Reference Wimo, Winblad and Jönsson3). Moreover, given that individuals aged 65 years can now expect to live for at least another 20 years, there is an urgent need to identify means of mitigating age-related changes in healthy older adults. Diet is crucial in this respect as it is thought to reduce the impact of age-related cognitive decline, for instance, by combating oxidative stress, reducing LDL-cholesterol, and modulating neurological mechanisms such as cell-signalling pathways.
Over the last decade or so a significant, albeit mixed, body of evidence regarding the effects of diet on cognition has accumulated from human and animal work. For instance, in longitudinal and human observational studies, vitamin E intake has been associated with reduced age-related cognitive decline through its antioxidant properties(Reference Morris, Evans and Bienias4), and poorer memory performance has been linked to lower levels of serum vitamin E per unit of cholesterol(Reference Perkins, Hendrie and Callahan5). Evidence from The Rotterdam Study has shown an association between higher plasma folate and better cognitive function, in particular for tests measuring psychomotor processing speed(Reference de Lau, Refsum and Smith6), episodic memory and verbal ability(Reference Feng, Ng and Chuah7). In the PAQUID (Personnes Agées QUID, loosely translated to ‘What about the elderly?’) study, older adults with the highest dietary flavonoid intake showed significantly lower cognitive decline over 10 years than those with the lowest intake(Reference Letenneur, Proust-Lima and Le Gouge8). Studies using animal models have also demonstrated that certain groups of flavonoids may slow and even reverse the effects of ageing and dementia(Reference Galli, Shukitt-Hale and Youdim9–Reference Joseph, Shukitt-Hale and Denisova11). For example, memory deficits may be prevented by the consumption of foods rich in anthocyanins, a flavonoid subgroup(Reference Barros, Amaral and Izquierdo12–Reference Shukitt-Hale, Carey and Simon16).
Although this is still an emerging area, the available human randomised controlled trial (RCT) literature reveals rather more variable evidence for the beneficial effects of diet on cognition. For example, in a systematic review of B vitamin and antioxidant supplement studies, Jia et al. (Reference Jia, McNeill and Avenell17) found very little evidence for cognitive benefits from taking antioxidant supplements or B vitamins. A similar story is shown for flavonoid studies(Reference Macready, Kennedy and Ellis18), with reports of both significant(Reference File, Hartley and Elsabagh19–Reference Le Bars, Velasco and Ferguson23) and non-significant(Reference Ho, Chan and Ho24–Reference van Dongen, van Rossum and Kessels29) effects of supplementation.
Developing a better understanding of the conditions under which particular nutrients do or do not confer cognitive benefits represents a key challenge for research. However, one major problem facing researchers aiming to do this is that there is currently little consensus across studies in terms of either the cognitive domains to be explored or the specific tests to be used. Thus it is hard to determine whether a failure to reproduce a previously reported effect has established an important boundary condition for that nutrient (for example, supplementation with X is not effective for population Y) or, alternatively, is a reflection of the idiosyncrasies of the respective tasks employed across the two studies. For example, although it is often assumed that all tests of working memory performance reflect common mechanisms or processes, it is quite possible that different tests measure partially separate cognitive capacities(Reference Salthouse30) and that performance dissociates across different tests. Indeed, Waters & Caplan (2003)(Reference Waters and Caplan31) reported only moderate correlations between a series of seven different working memory measures. Thus simply assuming a one-to-one correspondence between two different cognitive measures purporting to measure the same domain (for example, working memory) is ill-advised. In the present paper we aim to establish the extent of this practice, as well as to make recommendations for future studies.
Cognitive domains and associated brain regions
For the purposes of structuring the present review we now briefly outline the major taxonomies within human cognition. Importantly, attempts to characterise the effects of dietary nutrients on human cognition need to utilise a wide range of tasks to fully assess cognitive ability. In so doing, two points should be borne in mind. Firstly, although a particular task might be identified as having a primary neuropsychological focus such as ‘executive function’ or ‘episodic memory’, such measures are not ‘task pure’(Reference Burgess and Rabbitt32). For example, a range of processes, such as memory, processing speed and motor function, may support a nominally ‘executive’ task. Secondly, in terms of the underlying brain regions supporting cognitive performance, it is important to recognise that any task is likely to recruit multiple neural regions. For example, functional neuroimaging studies have revealed activations in the prefrontal cortex, medial and lateral parietal cortex, as well as hippocampal/medial temporal lobe activations during episodic memory retrieval(Reference Rugg, Henson, Parker, Wilding and Bussey33). A thorough understanding of the brain regions underpinning performance on particular cognitive tests is important, especially when attempting to relate findings from human studies to animal work. We return to this point in the Discussion.
Executive function
Executive function is a complex term used to describe a number of distinct, specifiable ‘control’ functions that are distinguishable from processing speed, memory, and motor functions. Examples of executive functions include ‘switching’ or ‘shifting’ (for example, alternating between behaviours or information sources), ‘inhibition’ (the ability to suppress automatic and habitual responses or behaviours), ‘updating’ (the ability to discard and replace information(Reference Miyake, Friedman and Emerson34, Reference Rabbitt35)), ‘sustained attention’ (requiring sustained concentration and monitoring skills(Reference Rabbitt35, Reference Manly, Robertson and Rabbitt36)), ‘strategic memory search’ (conscious, controlled retrieval of structured information(Reference Burgess and Rabbitt32, Reference Rabbitt35–Reference Phillips and Rabbitt37)), and ‘planning’ (the ability to deal with novel information, generate goals and make decisions on a suitable course of action(Reference Burgess and Rabbitt32, Reference Rabbitt35)). Neuroimaging studies suggest that the prefrontal cortex and striatum interact to perform specific executive functions(Reference Robbins38), and that distinct brain regions are recruited for different executive functions. For instance, the left inferior frontal gyrus in the prefrontal cortex is recruited in verbal fluency tasks(Reference Costafreda, Fu and Lee39), whereas the right inferior frontal gyrus shows greater activation in tasks measuring both shifting and inhibition(Reference Robbins38).
Working memory
All of the above executive functions are dependent on ‘working memory’, a psychological construct used to describe a hypothetical system for the temporary maintenance and manipulation of speech-based and/or visuospatial information, requiring the control of attentional resources(Reference Burgess and Rabbitt32). Functional neuroimaging work shows that working memory is not a unitary or dedicated system, and is not localised to a single brain region(Reference D'Esposito40). D'Esposito described working memory as ‘an emergent property of functional interactions between the PFC [prefrontal cortex] and the rest of the brain’ (p. 769)(Reference D'Esposito40), and evidence suggests that the network of brain regions recruited for the active maintenance of task-relevant information will depend on the type of information being maintained(Reference D'Esposito40, Reference Curtis, Rao and D'Esposito41).
Memory
A number of key distinctions can be drawn between different types of memory. Specifically, researchers frequently distinguish between short-term memory (retrieval occurs within 30 s of stimulus presentation) v. long-term memory (retrieval occurs after 30 s); explicit memory (consciously and intentionally retrieved) v. implicit memory (unconsciously retrieved); episodic (memory for events) v. semantic (memory for meaning); retrospective (memory for past events) v. prospective memory (remembering to perform actions in the future); memory for skills (procedural memory) v. memory for facts (declarative memory); and verbal memory v. visual or visuospatial memory.
As might be expected, a wide range of brain regions are thought to be involved in supporting these various forms of memory. For instance, activation of left-lateralised posterior temporal regions, the supramarginal gyrus, dorsolateral premotor cortex and Broca's area have been associated with short-term memory(Reference Henson, Burgess and Frith42), in contrast to activation of bilateral ventrolateral prefrontal regions and dorsolateral prefrontal regions during encoding and recognition in long-term episodic memory(Reference Ranganath, Johnson and D'Esposito43). In terms of the neural substrate of explicit and implicit memory tasks(Reference Voss and Paller44), explicit memory has been linked to left frontal, and bilateral hippocampal, parahippocampal and parietal activation(Reference Wagner, Shannon and Kahn45), whereas implicit memory is primarily associated with reduced left fusiform gyrus and bilateral frontal and occipital activity(Reference Voss, Reber and Mesulam46, Reference Schott, Henson and Richardson-Klavehn47). The hippocampus, parahippocampus and parietal areas are typically implicated in spatial memory tasks(Reference Spiers and Maguire48), whereas the anterior prefrontal cortex has been shown to be actively involved in prospective memory tasks(Reference Simons, Schölvinck and Gilbert49).
Motor function, perception and intelligence quotient
Motor function may be measured with or without a cognitive component, and encompasses a range of measures from psychomotor processing speed to planning of movement. Voluntary movement is controlled by the basal ganglia system, which includes the striatum and substantia nigra, by enabling required motor mechanisms and inhibiting competing mechanisms(Reference Mink50). Various brain regions are thought to be involved during motor skill acquisition: prefrontal regions are recruited initially, with a subsequent shift to posterior regions, for example, premotor, posterior parietal and cerebellar cortex structures, as the task becomes more automatic(Reference Shadmehr and Holcomb51).
Visual perception relies on visual acuity, field of view and contrast sensitivity, abilities that are reduced with age(Reference Attebo, Mitchell and Smith52, Reference Ivers, Cumming and Mitchell53) and which underpin any cognitive function with a visual component. Visual perception is associated with a wide range of brain regions in neuroimaging studies, namely the striate cortex and other occipital areas, parietal, temporal and prefrontal regions(Reference Ganis, Thompson and Kosslyn54).
Intelligence tasks may be sub-divided into crystallised intelligence (measuring acquired knowledge) and fluid intelligence (measuring non-verbal ability, problem-solving and pattern recognition independently of acquired knowledge)(Reference Cattell55). General intelligence, or Spearman's g, is associated most closely with fluid intelligence and activation of the lateral frontal(Reference Duncan, Seitz and Kolodny56) or prefrontal cortex and parietal areas(Reference Gray and Thompson57).
Research aims and questions
The primary aim of the present paper is to review the cognitive methods used in existing RCT studies that have explored the effects of nutrition on human cognition, with a view to identifying domains (for example, executive function, episodic memory) and individual tasks within those domains (for example, category fluency task, common objects recall task) that have shown greatest sensitivity to chronic supplementation (that is, supplementation of a nutrient over a number of days, weeks or months, as opposed to an acute intake of a nutrient on a single experimental day). A related aim is to catalogue the cognitive tasks used in existing chronic RCT studies within a single framework, enabling researchers to better choose suitable tasks, as well as to identify potential gaps in terms of the domains measured.
It should be noted here that significant outcomes for cognitive testing in dietary intervention studies rely on two things: (1) the potential for cognitive change as a result of direct dietary intervention with respect to dose and duration in the cognitive domain or cognitive aspect being measured and (2) cognitive methodologies sensitive enough to measure such cognitive change. The most important consideration in setting up a suitable framework for measuring human cognitive function in nutritional research is to determine methods that are sensitive to dietary changes and repeatable over time, are simple to interpret, and specific to cognitive domains. In this respect, brief measures such as the Mini-Mental State Examination (MMSE)(Reference Folstein, Folstein and McHugh58) and the Alzheimer's Disease Assessment Scale Cognitive Subscale (ADAS-Cog)(Reference Rosen, Mohs and Davis59) are suitable for cognitive screening of dementia and mild cognitive impairment, a term generally used to describe the level of cognitive impairment found in the intermediate stage between normal ageing and fully developed dementia(Reference Petersen, Smith and Waring60). Both the MMSE and the ADAS-Cog consist of items covering a broad range of cognitive functions: orientation, attention and calculation, memory, language, and motor skills, but they cannot truly be said to measure ‘global cognitive function’, as the individual test items do not measure the full range of cognitive functions. The term ‘general cognitive function’ is therefore preferred here.
In terms of examining changes in cognitive performance over time, the MMSE and ADAS-Cog may be useful for the measurement of widespread, gross cognitive changes in longitudinal studies(Reference Letenneur, Proust-Lima and Le Gouge8); in Alzheimer's disease research, where the fastest rate of deterioration over time is likely to be seen, the MMSE has shown an overall progression rate of 0·24 points per month, although this was moderated by education duration, sex, disease incidence and drug therapy(Reference Roselli, Tartaglione and Federico61). However, such measures are unlikely to be sensitive to smaller changes over shorter time periods in healthier individuals at pre-dementia stages, and will indeed show ceiling effects in young and/or cognitively healthy adult populations.
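To illustrate why such a measure is unlikely to detect change over a typical intervention period, the reported progression rate can be extrapolated to shorter durations. The following is a minimal sketch only: it assumes a constant (linear) rate of decline, which the source does not claim, and the trial durations are hypothetical. Because the MMSE is scored in whole points out of 30, an expected change of less than one point is effectively below the resolution of the instrument, even in the patient group showing the fastest deterioration.

```python
# Progression rate reported in the text for Alzheimer's disease (MMSE points per month).
MMSE_DECLINE_PER_MONTH = 0.24


def expected_decline(months, rate=MMSE_DECLINE_PER_MONTH):
    """Expected MMSE change over `months`, ASSUMING a constant linear rate."""
    return rate * months


# Hypothetical intervention durations, chosen for illustration only.
for months in (3, 6, 12):
    print(f"{months:>2} months: {expected_decline(months):.2f} points")
```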
Overall we address four main questions:
(1) What proportion of chronic dietary interventions has reported significant benefits to cognition and in which domains?
(2) How much consistency is there across studies in terms of cognitive domains measured and tasks employed?
(3) Are there any cognitive domains that are under-represented in existing intervention studies?
(4) What are the implications for future chronic dietary intervention studies?
Methods
A search of five databases (PUBMED, Web of Knowledge, PsychINFO, CINAHL and the Cochrane Central Register of Controlled Trials) was carried out for micronutrient or phytochemical adult human randomised controlled intervention trials exploring cognitive function as the primary or secondary outcome. The following search terms were used:
(1) Cognitive tasks: cogniti*, executive function, switching, shifting, updating, inhibition, vigilance, attention, memory, episodic, semantic, implicit, explicit, spatial, visuospatial, prospective, declarative, procedural, processing speed, psychomotor, reaction time, accuracy.
(2) Nutrient: vitamin*, thiamin*, riboflavin, niacin, nicotinamide, pantothenic, pyridox*, biotin, cobalamin*, folic acid, folate, ascorbic, tocopherol, iron, copper, zinc, magnesium, manganese, selenium, flavan*, flavon*, isoflavone, caroten*.
(3) Population sample: human adult [not] adolescent, child, infant, maternal, rat, mouse, mice, rodent, dog, monkey.
(4) Experimental design: randomi*, controlled trial, RCT.
(5) Type of article: journal article, peer-reviewed.
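The term groups above lend themselves to a Boolean query in which terms within a group are combined with OR, groups are combined with AND, and unwanted populations are excluded with NOT. The sketch below shows one way such a query string could be assembled. It is an assumption for illustration: the source does not state how the terms were combined, each database has its own query syntax, and the helper names `or_block` and `build_query` are hypothetical. Only a subset of the listed terms is used to keep the output short.

```python
def or_block(terms):
    """Join terms with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(include_groups, exclude_terms):
    """AND together one OR-block per group, then exclude terms with NOT."""
    query = " AND ".join(or_block(group) for group in include_groups)
    if exclude_terms:
        query += " NOT " + or_block(exclude_terms)
    return query


# Subsets of the term lists given in the text.
cognitive = ["cogniti*", "executive function", "memory"]
nutrient = ["vitamin*", "folate", "flavon*"]
design = ["randomi*", "controlled trial", "RCT"]
excluded = ["child", "infant", "rat"]

query = build_query([cognitive, nutrient, design], excluded)
print(query)
```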
All studies identified through the literature search were evaluated according to the eligibility criteria by one reviewer and independently verified by a second reviewer. The search was limited to the previous 10 years (that is, 1 January 1999 to 31 August 2009).
Chronic studies that specifically focused on the cognitive effects of the target nutrients, both single and combined, were included. Studies exploring phytochemicals administered in extract rather than whole-food form (for example, Ginkgo biloba) were included. Studies on populations suffering from age-related cognitive impairment or dementia were included, except in the case of traumatic head injury or specific neurological disorders, whose findings may not be generalisable to a normal human adult population.
Acute interventions, non-randomised studies, RCT with sample sizes of fewer than twenty participants (or fewer than ten for cross-over designs), and studies that did not include a proper baseline, or control and/or placebo group were excluded. Studies using whole foods or treatments combined only with macronutrients or drug therapy were also excluded, as were drug therapy studies using micronutrients as a placebo control. As the present review focuses on micronutrient or phytochemical strategies for the attenuation and prevention of cognitive decline throughout the adult lifespan, cognitive development studies (for example, maternal, infant and child) were excluded. As we were primarily interested in comparing uniquely human, language-based cognitive paradigms across studies, animal studies were excluded. Also excluded were studies of specific hospital-based patient groups (for example, CHD, stroke or diabetes), as were studies comparing pre- and post-operative cognitive performance. The literature search and screening process is shown in Fig. 1.
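The design-related exclusion criteria can be expressed as a simple screening rule. The sketch below is illustrative only and covers just the criteria that reduce to a yes/no or numeric check (randomisation, chronic v. acute design, sample size by design type, and presence of a control); the `Study` record, its field names and the example records are hypothetical, and the remaining criteria (for example, patient group or treatment type) require reviewer judgement.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical screening record; field names are illustrative."""
    randomised: bool
    chronic: bool          # False for acute (single-day) interventions
    crossover: bool
    n_participants: int
    has_control: bool      # proper baseline and control and/or placebo group


def exclusion_reasons(study):
    """Return the design-related reasons for exclusion (empty list if none)."""
    reasons = []
    if not study.randomised:
        reasons.append("non-randomised")
    if not study.chronic:
        reasons.append("acute intervention")
    # Minimum sample size stated in the text: 20, or 10 for cross-over designs.
    minimum = 10 if study.crossover else 20
    if study.n_participants < minimum:
        reasons.append(f"fewer than {minimum} participants")
    if not study.has_control:
        reasons.append("no proper baseline or control/placebo group")
    return reasons


# Hypothetical examples: the same sample size passes only in a cross-over design.
print(exclusion_reasons(Study(True, True, True, 16, True)))
print(exclusion_reasons(Study(True, True, False, 16, True)))
```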
Outcomes were changes in cognitive performance. The first author categorised the tasks and the second author resolved any inconsistencies encountered during classification. Tasks were initially categorised by their primary neuropsychological focus, which was determined by the descriptions provided in the selected papers. Where disagreement occurred, the authors used the task descriptions cited in the majority of the included studies and checked these, where possible, against Lezak et al. (Reference Lezak, Howieson and Loring62), a comprehensive sourcebook of neuropsychological assessment.
All included studies were rated using a three-category quality assessment grading system (A, B, C), based on a method successfully trialled elsewhere(Reference Balk, Chung and Raman63, Reference Levey, Coresh and Balk64), to identify any methodological shortcomings which might affect interpretation of the results. Briefly, category A studies employ the best designs: they are sufficiently powered with less than 20 % drop-out, their methods (double-blind, intervention, comparator, outcome measures and statistical tests) are appropriate, and their results are clearly reported and assessed as valid. Category B studies may contain some weaknesses, but show no major problems and are still considered valid. Category C studies show significant methodological difficulties which may invalidate results, with flawed designs and analysis, missing information, greater than 20 % drop-out, randomisation issues (for example, unequal between-group baseline scores), reporting discrepancies and low power. Grading was carried out and agreed by the first and second authors.
Studies were examined for cognitive outcome status in order to see if the RCT was designed primarily to test cognitive function, or if cognition was only a secondary outcome measure, as this again may affect the validity of the results. Cognitive outcome status was then designated as ‘primary’ or ‘secondary’ for all RCT.
Results
Assessment of cognitive performance in existing studies
Thirty-nine studies met the inclusion criteria (see Table 1). Five used multivitamins(Reference Cockle, Haller and Kimber65–Reference Wolters, Hickstein and Flintermann69), and two of these also included minerals(Reference McNeill, Avenell and Campbell67, Reference Wouters-Wesseling, Wagenaar and Rozendaal68). Ten studies examined vitamin B treatments(Reference Hvas, Juul and Lauritzen70–Reference Bryan, Calvaresi and Hughes79) and three looked at specific minerals: Zn(Reference Maylor, Simpson and Secker80), Fe(Reference Murray-Kolb and Beard81) and Cu(Reference Kessler, Pajonk and Bach82). One trialled β-carotene with vitamins C and E(Reference Smith, Clark and Nutt83). Of the studies targeting individual micronutrients or phytochemicals, twenty looked at flavonoids(Reference File, Hartley and Elsabagh19–Reference van Dongen, van Rossum and Kessels29, Reference Casini, Marelli and Papaleo84–Reference Stough, Clarke and Lloyd92). Twelve of these were isoflavone RCT(Reference File, Hartley and Elsabagh19–Reference Duffy, Wiseman and File22, Reference Ho, Chan and Ho24, Reference Basaria, Wisniewski and Dupree25, Reference Fournier, Ryan and Robison27, Reference Kreijkamp-Kaspers, Kok and Grobbee28, Reference Casini, Marelli and Papaleo84–Reference Kritz-Silverstein, Von Muhlen and Barrett-Connor86, Reference Woo, Lau and Ho89), six used G. biloba extracts containing 24 or 25 % flavonoids and 6 % terpenes(Reference Le Bars, Velasco and Ferguson23, Reference Elsabagh, Hartley and Ali26, Reference van Dongen, van Rossum and Kessels29, Reference Mix and Crews88, Reference Santos, Galduróz and Barbieri91, Reference Stough, Clarke and Lloyd92), one used pine bark(Reference Ryan, Croft and Mori90) and one used cocoa flavanols(Reference Francis, Head and Morris87). No other intervention met the study criteria within the time frame of the review.
Abbreviations used in Table 1: MCI, mild cognitive impairment; DSM, Diagnostic and Statistical Manual of Mental Disorders; MMSE, Mini-Mental State Examination; T, treatment; C, control; ADAS-Cog, Cognitive element of the Alzheimer's Disease Assessment Scale; n/a, not applicable; IQ, intelligence quotient; ITT, intention-to-treat; D/O, dropped out; RDA, recommended daily allowance; WAIS, Wechsler Adult Intelligence Scale (all versions); HRNTB, Halstead–Reitan Neuropsychological Test Battery; CAMCOG, Cambridge Cognitive Examination; WMS, Wechsler Memory Scale (all versions); mITT, modified intention-to-treat; TICS, Telephone Interview for Cognitive Status; NINDS-ADRDA, National Institute of Neurological Disorders and Stroke and Alzheimer's Disease and Related Disorders Association; CAT, Cognitive Abilities Test; CANTAB, Cambridge Neuropsychological Test Automated Battery; IDED, Intra Dimensional/Extra Dimensional Set Shifting Task; fMRI, functional magnetic resonance imaging; BOLD, blood oxygenation level-dependent; CDR, cognitive drug research; TP, Toulouse–Pieron Test; CBT, Cognometer Battery of Tests; SCOLP, Speed and Capacity of Language-Processing Test; HRT, hormone replacement therapy.
* Mean treatment group scores were significantly better than those of the control group: * P < 0·05, ** P < 0·01, *** P < 0·005.
† Mean treatment group scores were significantly worse than those of the control group: † P < 0·05, †† P < 0·01, ††† P < 0·005.
‡ Study designed with cognitive function as ‘primary’ or ‘secondary’ target outcome measure.
§ Study graded for quality: category A, highest quality, no bias; category B, medium quality, some bias but results are deemed valid; category C, poor quality, significant bias that may invalidate the results.
∥ 1 IU vitamin A = 0·3 μg retinol or 0·6 μg β-carotene.
Fifteen studies (38 %) were graded as category C and judged to contain significant bias that may invalidate the results, mostly as a result of lack of any quantitative cognitive screening, missing information and reporting errors. Eighteen studies (46 %) were classed as category B and judged to be susceptible to bias, but not sufficiently so to invalidate the results. Only six studies (15 %) met the more rigorous criteria for category A, as described earlier (see Table 1). In the assessment of cognitive outcome status, 69 % of the RCT were found to be specifically designed to measure cognitive function as the primary outcome (see Table 1).
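The grading proportions above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the counts are taken from the text, whereas the number of studies with cognition as the primary outcome is inferred from the reported 69 % and is not stated in the source. Note that the three rounded percentages sum to 99 rather than 100 because each is rounded independently.

```python
TOTAL_STUDIES = 39

# Counts per quality category, as reported in the text.
grade_counts = {"A": 6, "B": 18, "C": 15}
assert sum(grade_counts.values()) == TOTAL_STUDIES


def percentage(count, total=TOTAL_STUDIES):
    """Percentage of `total`, rounded to the nearest whole number."""
    return round(100 * count / total)


for grade, count in grade_counts.items():
    print(f"Category {grade}: {count}/{TOTAL_STUDIES} = {percentage(count)} %")

# INFERRED, not reported: the whole number of studies closest to 69 % of 39.
primary_outcome_studies = round(0.69 * TOTAL_STUDIES)
print(primary_outcome_studies, percentage(primary_outcome_studies))
```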
Only seventeen studies (44 %) reported benefits of treatment on cognitive function in the expected direction(Reference File, Hartley and Elsabagh19–Reference Le Bars, Velasco and Ferguson23, Reference Cockle, Haller and Kimber65, Reference Wouters-Wesseling, Wagenaar and Rozendaal68, Reference van Uffelen, Chinapaw and van Mechelen75, Reference Durga, van Boxtel and Schouten77, Reference Bryan, Calvaresi and Hughes79, Reference Maylor, Simpson and Secker80, Reference Casini, Marelli and Papaleo84, Reference Kritz-Silverstein, Von Muhlen and Barrett-Connor86, Reference Woo, Lau and Ho89–Reference Stough, Clarke and Lloyd92), of which two were graded as category A(Reference Durga, van Boxtel and Schouten77, Reference Maylor, Simpson and Secker80), seven were category B(Reference File, Jarrett and Fluck21–Reference Le Bars, Velasco and Ferguson23, Reference Cockle, Haller and Kimber65, Reference Bryan, Calvaresi and Hughes79, Reference Kritz-Silverstein, Von Muhlen and Barrett-Connor86, Reference Ryan, Croft and Mori90) and the rest were category C. Twelve of the seventeen RCT were flavonoid(Reference File, Hartley and Elsabagh19–Reference Le Bars, Velasco and Ferguson23, Reference Casini, Marelli and Papaleo84, Reference Kritz-Silverstein, Von Muhlen and Barrett-Connor86, Reference Mix and Crews88–Reference Stough, Clarke and Lloyd92) including seven isoflavone studies(Reference File, Hartley and Elsabagh19–Reference Duffy, Wiseman and File22, Reference Casini, Marelli and Papaleo84, Reference Kritz-Silverstein, Von Muhlen and Barrett-Connor86, Reference Woo, Lau and Ho89) and four G. biloba interventions(Reference Le Bars, Velasco and Ferguson23, Reference Mix and Crews88, Reference Santos, Galduróz and Barbieri91, Reference Stough, Clarke and Lloyd92).
In evaluating these effects of treatment, there was found to be considerable variability in the statistical rigour employed in individual studies. Gleason et al. (Reference Gleason, Carlsson and Barnet20) reported both positive and negative effects with a small sample size of thirty, but did not appear to have accounted for the possibility of type I error. Mix & Crews(Reference Mix and Crews88) reported a small significant effect of treatment on a single outcome measure using a one-tailed t test, a result unlikely to survive cut-off if an arguably more appropriate two-tailed convention had been employed. Additionally, both Casini et al. (Reference Casini, Marelli and Papaleo84) and Santos et al. (Reference Santos, Galduróz and Barbieri91) reported multiple t tests without any apparent correction for type I error. Howes et al. (Reference Howes, Bray and Lorenz85) also carried out a large number of tests on a small sample (n 30) initially reporting a series of significant effects. However, as an illustration of good practice, these disappeared after the authors statistically accounted for type I error. Finally, Stough et al. (Reference Stough, Clarke and Lloyd92) provided no descriptive statistics at all, making it impossible to evaluate the quality or rigour of their experimental design and analysis.
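The two statistical concerns raised above, uncorrected multiple testing and one-tailed tests, can be made concrete with a short example. The sketch below uses the Bonferroni procedure as one common correction; the source does not say which correction any study used or should have used, and all P values shown are hypothetical. It also assumes independent tests when computing the family-wise error rate, and a symmetric test statistic with the effect in the predicted direction when converting a one-tailed P value to its two-tailed equivalent.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni(p_values, alpha=0.05):
    """Flag each P value as significant against the threshold alpha / m."""
    threshold = alpha / len(p_values)
    return [(p, p <= threshold) for p in p_values]


def two_tailed_from_one_tailed(p_one_tailed):
    """Two-tailed P value, ASSUMING a symmetric test statistic."""
    return min(1.0, 2 * p_one_tailed)


# With ten uncorrected tests at alpha = 0.05, the chance of a spurious
# 'significant' result is about 40 %.
print(round(familywise_error_rate(10), 2))

# Hypothetical P values from five outcome measures in one study.
hypothetical_p = [0.004, 0.02, 0.03, 0.04, 0.20]
for p, significant in bonferroni(hypothetical_p):
    print(p, "significant" if significant else "not significant")

# A hypothetical one-tailed P of 0.04 no longer meets P < 0.05 when two-tailed.
print(two_tailed_from_one_tailed(0.04))
```

In this hypothetical case four of the five P values fall below 0·05 before correction, but only one survives it.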
Of the micronutrient studies, three vitamin B studies reported some positive effects of treatment(Reference van Uffelen, Chinapaw and van Mechelen75, Reference Durga, van Boxtel and Schouten77, Reference Bryan, Calvaresi and Hughes79), although Bryan et al. (Reference Bryan, Calvaresi and Hughes79) also reported negative effects. Two multivitamin interventions reported benefits(Reference Cockle, Haller and Kimber65, Reference Wouters-Wesseling, Wagenaar and Rozendaal68), and Maylor et al. (Reference Maylor, Simpson and Secker80) found both positive and negative effects of Zn treatment on cognitive function.
Interestingly, four studies (10 %) showed only null and negative effects of treatment on cognitive function: three vitamin B studies(Reference McMahon, Green and Skeaff72, Reference Pathansali, Mangoni and Creagh-Brown73, Reference Lewerin, Matousek and Steen78) and one flavonoid intervention(Reference Fournier, Ryan and Robison27). The vitamin B studies were carried out on older populations and appear to have used t tests on multiple tasks with no correction for type I error.
Of the thirty-nine studies included in the present review, the size of study populations ranged from sixteen in a functional magnetic resonance imaging cross-over study(Reference Francis, Head and Morris87) to 818(Reference Durga, van Boxtel and Schouten77). Seventeen studies had fewer than 100 participants, and five studies had forty participants or fewer(Reference Gleason, Carlsson and Barnet20–Reference Duffy, Wiseman and File22, Reference Elsabagh, Hartley and Ali26, Reference Howes, Bray and Lorenz85). Power calculations were carried out in only four RCT, all researching vitamin B, with populations of 179 or more(Reference Eussen, de Groot and Joosten71, Reference McMahon, Green and Skeaff72, Reference van Uffelen, Chinapaw and van Mechelen75, Reference Aisen, Schneider and Sano76), but these were based largely on expected changes in physiological rather than cognitive markers. In other RCT, group size does not appear to have been driven by effect sizes found in previous studies and varies considerably across nutrient studies, ranging, for example, from groups of twelve participants in Pathansali et al. (Reference Pathansali, Mangoni and Creagh-Brown73) to groups of over 400 in Durga et al. (Reference Durga, van Boxtel and Schouten77).
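To show how an expected effect size could drive group size, the sketch below applies the standard normal-approximation formula for a two-group comparison of means, n = 2(z₁₋α/₂ + z₁₋β)²/d² per group, where d is the standardised effect size. This is an assumption for illustration, not the method used in any of the included studies; the effect sizes are hypothetical conventional values (small, medium, large), and an exact t-based calculation would give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-tailed, two-group
    comparison of means (normal approximation, equal group sizes)."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_beta = standard_normal.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Hypothetical standardised effect sizes: small, medium and large.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: {n_per_group(d)} per group")
```

Under these assumptions a small effect requires several hundred participants per group, whereas groups of around a dozen would be adequately powered only for very large effects.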
Participant ages were highly variable, ranging from 18 years to over 80 years of age, with twenty-nine studies (74 %) carried out on participants over the age of 50 years, including nine studies specifically carried out on adults of 65 years or more. Three further RCT included young and older adult populations(Reference Le Bars, Velasco and Ferguson23, Reference Basaria, Wisniewski and Dupree25, Reference Bryan, Calvaresi and Hughes79), two more studies focused on the 40–65 years age range(Reference Fournier, Ryan and Robison27, Reference Casini, Marelli and Papaleo84), and five others were conducted on 18- to 40-year-olds(Reference File, Jarrett and Fluck21, Reference Elsabagh, Hartley and Ali26, Reference Murray-Kolb and Beard81, Reference Francis, Head and Morris87, Reference Stough, Clarke and Lloyd92).
Single v. multiple cognitive domains
Among the thirty-nine RCT included in the present review, a variety of approaches was used to measure cognitive performance, mostly targeting multiple cognitive domains, with the exception of a flavonoid brain imaging study which used a single executive function ‘switching’ (or ‘shifting’) task(Reference Francis, Head and Morris87). Of the RCT testing multiple cognitive domains, the majority examined a range of specific memory processes and executive functions. The remaining RCT targeted general cognition rather than any specific domain (see Table 1).
Rationale for choice of cognitive tests
Twenty-one RCT based their choice of cognitive test(s) on findings from previous studies. Eleven cited sensitivity to the class of nutrient under investigation, such as B vitamins, flavonoids, other types of dietary manipulation, hormone replacement therapy or oestrogen(Reference File, Hartley and Elsabagh19, Reference File, Jarrett and Fluck21, Reference Duffy, Wiseman and File22, Reference Fournier, Ryan and Robison27, Reference Kreijkamp-Kaspers, Kok and Grobbee28, Reference Cockle, Haller and Kimber65, Reference Wouters-Wesseling, Wagenaar and Rozendaal68, Reference Eussen, de Groot and Joosten71, Reference Pathansali, Mangoni and Creagh-Brown73, Reference Bryan, Calvaresi and Hughes79, Reference Smith, Clark and Nutt83). Seven studies instead used tasks sensitive to ageing(Reference van Dongen, van Rossum and Kessels29, Reference Durga, van Boxtel and Schouten77, Reference Howes, Bray and Lorenz85), brain disorders and pharmacological interventions(Reference Le Bars, Velasco and Ferguson23, Reference Aisen, Schneider and Sano76, Reference Maylor, Simpson and Secker80), or changes in functional magnetic resonance imaging blood oxygenation level-dependent (BOLD) signal(Reference Francis, Head and Morris87). Three RCT selected tasks from established computerised psychometric test series: Ryan et al. (Reference Ryan, Croft and Mori90) used the Cognitive Drug Research® battery(Reference Wesnes, Simpson and Christmas93); Murray-Kolb & Beard(Reference Murray-Kolb and Beard81) the Cognitive Abilities Test battery(Reference Detterman94); and Mix & Crews(Reference Mix and Crews88) selected the Trail-Making Test on the basis that it appeared to be ‘one of the best measures of general cognitive functioning’ (p. 223) according to Reitan(Reference Reitan95). These tests were developed for use with multiple populations and settings; none was specifically designed with reference to micronutrient or phytochemical interventions.
Revealingly, in the remaining eighteen studies(Reference Gleason, Carlsson and Barnet20, Reference Ho, Chan and Ho24–Reference Elsabagh, Hartley and Ali26, Reference Clarke, Harrison and Richards66, Reference McNeill, Avenell and Campbell67, Reference Wolters, Hickstein and Flintermann69, Reference Hvas, Juul and Lauritzen70, Reference McMahon, Green and Skeaff72, Reference Seal, Metz and Flicker74, Reference van Uffelen, Chinapaw and van Mechelen75, Reference Lewerin, Matousek and Steen78, Reference Kessler, Pajonk and Bach82, Reference Casini, Marelli and Papaleo84, Reference Kritz-Silverstein, Von Muhlen and Barrett-Connor86, Reference Woo, Lau and Ho89, Reference Santos, Galduróz and Barbieri91, Reference Stough, Clarke and Lloyd92), no rationale for task choice was given, although four of these included dementia patients, so task selection was naturally restricted to measures appropriate to these populations(Reference Clarke, Harrison and Richards66, Reference Hvas, Juul and Lauritzen70, Reference Seal, Metz and Flicker74, Reference Kessler, Pajonk and Bach82).
Range of cognitive measures used
Across the thirty-nine RCT under investigation, 121 cognitive tasks were identified (see Table 2). Classified by the primary neuropsychological focus of each measure, these comprised thirty-seven memory tasks (for example, episodic, semantic and short-term), twenty-six executive function tasks, fourteen working memory tasks, nineteen psychomotor processing speed tasks, nine general or ‘global’ tasks, thirteen intelligence quotient (IQ) tasks (mostly to measure baseline between-group differences), two motor function tasks and one perception measure.
Abbreviations used in Table 2: IQ, intelligence quotient; Y, yes; WMS, Wechsler Memory Scale; WAIS, Wechsler Adult Intelligence Scale; CAT, Cognitive Abilities Test; Exec Fn, executive function; IDED, Intra Dimensional/Extra Dimensional Set Shifting Task; CANTAB, Cambridge Neuropsychological Test Automated Battery; CDR, Cognitive Drug Research; HRNTB, Halstead–Reitan Neuropsychological Test Battery; CBT, Cognometer Battery of Tests; SCOLP, Speed and Capacity of Language-Processing Test; TP, Toulouse–Pieron Test.
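The breakdown of the 121 tasks by primary neuropsychological focus can be tallied as in the short sketch below. The counts are those reported in the text; the dictionary keys and variable names are our own labels, and the percentage shares are derived here for illustration rather than taken from the review.

```python
# Task counts by primary neuropsychological focus, as reported in the text.
task_counts = {
    "memory": 37,
    "executive function": 26,
    "working memory": 14,
    "psychomotor processing speed": 19,
    "global": 9,
    "IQ": 13,
    "motor function": 2,
    "perception": 1,
}

total_tasks = sum(task_counts.values())

# Share of all tasks falling in each domain, as a percentage (one decimal).
domain_share = {
    domain: round(100 * n / total_tasks, 1) for domain, n in task_counts.items()
}

# Print the domains from most to least frequently assessed.
for domain, n in sorted(task_counts.items(), key=lambda item: -item[1]):
    print(f"{domain:<30} {n:>3}  {domain_share[domain]:>5.1f} %")
print(f"{'total':<30} {total_tasks:>3}")
```

The tally confirms that the eight categories account for all 121 tasks, with memory and executive function measures together making up roughly half of them.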
Generally, there was little correspondence in measures between studies, with occasional notable exceptions. For instance, researchers from King's College London(Reference File, Hartley and Elsabagh19, Reference File, Jarrett and Fluck21, Reference Duffy, Wiseman and File22) used the same seven executive function and memory tasks in their flavonoid intervention studies on older populations as had previously been used in a group of 22- to 30-year-old subjects(Reference File, Hartley and Elsabagh19). Two tasks which were non-significant in the earlier study were excluded. While they found the Common Objects Recall Test and the Cambridge Neuropsychological Test Automated Battery (CANTAB) Intra Dimensional/Extra Dimensional Set Shifting Task Rule Learning and Reversal tests to be sensitive to flavonoid treatment in all three studies, these tasks appear to have been rarely employed elsewhere (see Table 2).
Executive function
Among the executive function tasks, one measure was primarily categorised as ‘focused attention’ and five as ‘sustained attention’. Four measured ‘switching’ or ‘shifting’, and one measured ‘inhibition’. None focused specifically on ‘updating’, although two tasks described here as measuring the more generalised ‘frontal function’ may include varying degrees of ‘updating’ as well as ‘switching’ and ‘inhibition’. In addition, there were five measures of ‘verbal fluency’, four ‘visual search’, two ‘decision-making’ and one ‘planning’ task (see Table 2).
Verbal fluency tasks were used in sixteen studies, and some positive effects were shown in four, covering flavonoid, B vitamin, and multivitamin and mineral interventions(Reference Gleason, Carlsson and Barnet20, Reference Wouters-Wesseling, Wagenaar and Rozendaal68, Reference Bryan, Calvaresi and Hughes79, Reference Kritz-Silverstein, Von Muhlen and Barrett-Connor86). Verbal fluency tasks included category generation, where participants are asked to name as many category members as possible (for example, animals, transportation, etc.), and initial-letter verbal fluency, where participants generate words beginning with a particular letter in a given time (for example, F, A or S). Category fluency tasks, which are also thought to involve an element of semantic memory(Reference Lezak, Howieson and Loring62), were significant in the two flavonoid studies(Reference Gleason, Carlsson and Barnet20, Reference Kritz-Silverstein, Von Muhlen and Barrett-Connor86) and the multivitamin and minerals study(Reference Wouters-Wesseling, Wagenaar and Rozendaal68), although Wouters-Wesseling et al. (Reference Wouters-Wesseling, Wagenaar and Rozendaal68) reported significant baseline differences on this task, suggesting potential randomisation issues in this study. The less semantically demanding initial-letter fluency tasks were generally non-significant, with the exception of Bryan et al. (Reference Bryan, Calvaresi and Hughes79) who found significantly better fluency scores for vitamin B6 and placebo than for folate or vitamin B12.
The Stroop Colour–Word Task, which measures inhibition, was used in seven studies, and showed positive effects of flavonoid treatment in two(Reference Gleason, Carlsson and Barnet20, Reference Mix and Crews88). Conspicuously, the Trail-Making Task, which featured in thirteen studies(Reference Gleason, Carlsson and Barnet20, Reference Ho, Chan and Ho24, Reference Basaria, Wisniewski and Dupree25, Reference Kreijkamp-Kaspers, Kok and Grobbee28, Reference van Dongen, van Rossum and Kessels29, Reference Eussen, de Groot and Joosten71, Reference McMahon, Green and Skeaff72, Reference Bryan, Calvaresi and Hughes79, Reference Howes, Bray and Lorenz85, Reference Kritz-Silverstein, Von Muhlen and Barrett-Connor86, Reference Mix and Crews88, Reference Woo, Lau and Ho89, Reference Stough, Clarke and Lloyd92) showed only a negative effect of treatment for vitamin B(Reference McMahon, Green and Skeaff72) and flavonoids(Reference Gleason, Carlsson and Barnet20). Although a common measure of ‘switching’, it is possible that the pen and paper nature of the Trail-Making Task limits its sensitivity. However, none of the four ‘switching’ tasks reviewed here has shown a positive treatment effect, perhaps suggestive of a more general limiting condition.
The Intra Dimensional/Extra Dimensional Set Shifting Task Rule Learning and Reversal task which measures frontal function, involving aspects of ‘switching’ or ‘shifting’, ‘updating’ and ‘inhibition’, and the Stockings of Cambridge planning task showed some sensitivity to flavonoid treatment for both young(Reference File, Hartley and Elsabagh19) and older(Reference File, Jarrett and Fluck21, Reference Duffy, Wiseman and File22) adults, for doses of 60–100 mg total isoflavones per d, over periods ranging from 6 to 12 weeks (see Tables 1 and 2). However, both tests failed to show any effect in a 6-week G. biloba trial in young adults(Reference Elsabagh, Hartley and Ali26).
Working memory
Eight measures of visuospatial or spatial working memory were identified, along with three numerical measures, one verbal measure, and one test of working memory span (for example, the Digit Span Backwards task which measures working memory capacity), which was used in nine studies (see Table 2). The eight visuospatial measures, used in eleven different studies, showed significant effects in four studies, with respect to flavonoid treatments(Reference Gleason, Carlsson and Barnet20, Reference Ryan, Croft and Mori90, Reference Santos, Galduróz and Barbieri91) and Zn(Reference Maylor, Simpson and Secker80), although Santos et al. (Reference Santos, Galduróz and Barbieri91) (n 48) and Gleason et al. (Reference Gleason, Carlsson and Barnet20) (n 30) had relatively small sample sizes, and the randomisation process adopted by Santos et al. (Reference Santos, Galduróz and Barbieri91) may not have been reliable as they reported baseline differences in IQ between groups.
Two computerised tasks described as measuring ‘spatial working memory’ appeared sensitive to Zn(Reference Maylor, Simpson and Secker80) and flavonoid(Reference Ryan, Croft and Mori90) treatment. Encouragingly, while they were described as measuring the same cognitive function, the tasks themselves varied considerably, suggesting that the observed effects were specific to domain rather than task. For instance, the Zn study Cambridge Neuropsychological Test Automated Battery (CANTAB) task required participants to search for, and remember, the locations of blue tokens inside a set of red boxes. In the Ryan et al. (Reference Ryan, Croft and Mori90) study, participants were shown a picture of a house and were asked to memorise the locations of nine lit windows; subsequently, they were shown individually lit windows and asked to decide if they had been lit earlier. The Rey–Osterrieth Complex Figure Test, used in four studies, showed significant positive treatment effects in two(Reference Gleason, Carlsson and Barnet20, Reference Santos, Galduróz and Barbieri91).
The working memory span task (the Digit Span Backward test) showed significant effects in only two of eight studies, for flavonoids(Reference Casini, Marelli and Papaleo84, Reference Stough, Clarke and Lloyd92), although as mentioned earlier, it is difficult to judge the quality of these results due to the lack of statistical rigour to which the data were subjected, and/or to lack of information provided.
Memory
In terms of the other memory processes under investigation, thirty-one studies investigated episodic memory, or memory for events. Episodic memory tests included both verbal and visual tasks, such as word learning, paragraph recall, picture recall, recall of common objects and delayed matching to sample. Tasks showed significant treatment effects in a quarter of the studies: five flavonoid RCT(Reference File, Hartley and Elsabagh19, Reference File, Jarrett and Fluck21, Reference Duffy, Wiseman and File22, Reference Santos, Galduróz and Barbieri91, Reference Stough, Clarke and Lloyd92), two B vitamin RCT(Reference Durga, van Boxtel and Schouten77, Reference Bryan, Calvaresi and Hughes79) and one multivitamin and mineral RCT(Reference Wouters-Wesseling, Wagenaar and Rozendaal68). A negative effect of treatment was found for B vitamins(Reference McMahon, Green and Skeaff72).
As can be seen from Table 2, verbal episodic memory tasks such as the Rey Auditory Verbal Learning Test and other word learning tests showed some sensitivity to B vitamin, multivitamin and mineral, and flavonoid treatments, whereas visual or visuospatial episodic tasks were more sensitive to flavonoid treatment.
Only six of the episodic memory tests are described as such in five of the RCT under review(Reference File, Hartley and Elsabagh19, Reference File, Jarrett and Fluck21, Reference Duffy, Wiseman and File22, Reference Elsabagh, Hartley and Ali26, Reference Kreijkamp-Kaspers, Kok and Grobbee28) (see Table 2). Nineteen other episodic memory tasks were defined instead as testing verbal memory. Twelve further tasks were defined as visual, spatial, or visuospatial memory tasks. In total, therefore, thirty-seven of the reviewed tasks could be described as measuring episodic memory. This includes any verbal or visuospatial memory tasks without a major working memory element, but excludes memory span measures, which are judged to measure short-term memory.
There was a wide range of tasks designed to measure verbal episodic memory, including immediate and delayed recall and/or recognition of between ten and thirty words (see ‘Word-Learning Tests’ in Table 2), immediate and delayed paragraph recall, and paired-associates cued-recall, all of which may impose different levels of cognitive demand, and access different types of cognitive processes and domains. Visuospatial episodic measures encompassed an equally wide range of tasks, ranging from picture recall and reproduction, for example, the Benton Visual Retention Test, to memory for faces, for example, the Wechsler Memory Scale(Reference Wechsler96) Faces I and Faces II subscale. Immediate and delayed recall or recognition for the same task are counted here as a single memory task, for example, the Wechsler Memory Scale Logical Memory Test I and II, Memory 1 and 2 Tests, Verbal Memory 1 and 2 Tests and the Wechsler Adult Intelligence Scale(Reference Wechsler97) Visual Memory 1 and 2 Tests.
Eight studies measured semantic memory(Reference Gleason, Carlsson and Barnet20, Reference Ho, Chan and Ho24, Reference Kreijkamp-Kaspers, Kok and Grobbee28, Reference Eussen, de Groot and Joosten71, Reference Lewerin, Matousek and Steen78, Reference Bryan, Calvaresi and Hughes79, Reference Howes, Bray and Lorenz85, Reference Woo, Lau and Ho89), but only one vitamin B RCT showed any positive effect of treatment(Reference Lewerin, Matousek and Steen78). Again, there was a wide variation between the tasks: the Spot The Word Vocabulary Test requires participants to differentiate between words and non-words(Reference Bryan, Calvaresi and Hughes79), whereas the Wechsler Adult Intelligence Scale Similarities Test(Reference Eussen, de Groot and Joosten71, Reference Howes, Bray and Lorenz85, Reference Santos, Galduróz and Barbieri91) requires participants to describe similarities between sets of nouns. The Boston Naming Test measures the ability to name picture objects(Reference Gleason, Carlsson and Barnet20, Reference Ho, Chan and Ho24, Reference Kreijkamp-Kaspers, Kok and Grobbee28, Reference Howes, Bray and Lorenz85, Reference Woo, Lau and Ho89), and the Synonyms task measures the ability to select correct synonyms for given words(Reference Lewerin, Matousek and Steen78).
Short-term memory, as measured by the digit span or recall tasks and the Sternberg Memory Scanning Task, was assessed in thirteen studies. While a decline was found in one flavonoid study(Reference Howes, Bray and Lorenz85), this effect disappeared after the authors corrected for type I error. No other effects of treatment were found for short-term memory measures.
As can be seen from Table 2, cognitive domains which have yet to be explored in micronutrient human RCT include procedural, implicit and prospective memory; it is therefore not possible to provide a more comprehensive assessment of the relationship between these dietary components and the full range of memory processes at this stage.
Psychomotor/motor function
Psychomotor processing speed was measured in twenty-three studies, using a total of nineteen psychomotor tasks. The most popular type of test, used in eleven studies, was the Wechsler Adult Intelligence Scale Digit Symbol Substitution or Digit Symbol Coding Test, designed as a measure of psychomotor processing speed(Reference Lezak, Howieson and Loring62) but often used as a measure of working memory. These tasks were shown to be sensitive to flavonoid(Reference Casini, Marelli and Papaleo84, Reference Mix and Crews88, Reference Santos, Galduróz and Barbieri91, Reference Stough, Clarke and Lloyd92) and vitamin B(Reference Pathansali, Mangoni and Creagh-Brown73, Reference van Uffelen, Chinapaw and van Mechelen75) treatments. Three other psychomotor processing speed tasks were also sensitive to vitamin B treatment: the Boxes Test(Reference Bryan, Calvaresi and Hughes79), Identical Forms Task(Reference Lewerin, Matousek and Steen78) and Letter Digit Substitution Test(Reference Durga, van Boxtel and Schouten77). The next most popular paradigm was choice reaction time (CRT), a complex form of the simple reaction time (SRT) paradigm used in four RCT. The CRT task also has a decision-making component and was sensitive to a multivitamin intervention(Reference Cockle, Haller and Kimber65), unlike the SRT which was non-significant across four studies(Reference Maylor, Simpson and Secker80, Reference Smith, Clark and Nutt83, Reference Ryan, Croft and Mori90, Reference Stough, Clarke and Lloyd92).
Multiple cognitive measures
Within a domain, the number of measures used varied widely from study to study. Whereas the majority selected one or two measures to represent a single cognitive domain, Santos et al. (Reference Santos, Galduróz and Barbieri91) used twenty-two measures overall, including ten executive function tasks, six memory tests and four IQ performance measures. Bryan et al. (Reference Bryan, Calvaresi and Hughes79) used eight measures of executive function and working memory, and five other memory tasks. Gleason et al. (Reference Gleason, Carlsson and Barnet20) used six executive function tasks and four memory measures and Howes et al. (Reference Howes, Bray and Lorenz85) used four executive function tasks and six memory measures.
One advantage of the multiple measures approach is that it provides an opportunity for researchers to compute a composite measure of a particular domain, thus controlling in part for the idiosyncrasies of individual tasks and the potential for type I error. Murray-Kolb & Beard(Reference Murray-Kolb and Beard81) and Ryan et al. (Reference Ryan, Croft and Mori90) used factor analysis to group ‘families’ of tasks to reduce the number of cognitive variables. Neither of these studies found significant results using these composite scores. Notwithstanding this, the approach has some promise, although caution is needed when deriving composite measures, as cognitive tasks are likely to be correlated and therefore some statistical techniques may not be warranted.
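As a simple illustration of how a composite domain score might be derived, the sketch below standardises each task to z-scores and averages them per participant (a unit-weighted composite). This is a simpler technique than the factor analysis used by Murray-Kolb & Beard and Ryan et al., and is not their method. The task names, the scores and the function names are all hypothetical, and the sketch assumes that higher scores indicate better performance on every task.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise raw scores to mean 0 and sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite_scores(task_scores):
    """Unit-weighted composite: mean z-score across tasks for each participant.

    task_scores maps a task name to a list of raw scores, one per participant,
    with participants in the same order for every task.
    """
    standardised = [z_scores(scores) for scores in task_scores.values()]
    # zip(*...) regroups the per-task lists into per-participant tuples.
    return [mean(per_participant) for per_participant in zip(*standardised)]


# Hypothetical raw scores for four participants on three episodic memory tasks.
memory_tasks = {
    "word_list_recall": [8, 10, 12, 14],
    "paragraph_recall": [20, 26, 24, 30],
    "paired_associates": [5, 6, 8, 9],
}

composite = composite_scores(memory_tasks)
for participant, score in enumerate(composite, start=1):
    print(f"participant {participant}: composite z = {score:+.2f}")
```

Analysing one composite per domain in place of each task separately reduces the number of statistical tests, although equal weighting ignores the correlations between tasks noted above.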
As the number of cognitive tasks increases, so does the risk of type I error during analysis. This issue was dealt with in two cases(Reference Duffy, Wiseman and File22, Reference Fournier, Ryan and Robison27) by using multiple ANOVA, which reduces the overall type I error rate.
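The way in which type I error accumulates with the number of tasks can be sketched as follows. The calculation assumes that each task is tested separately at the same alpha and that the tests are independent. Cognitive tasks are usually correlated, so the figures should be read as an upper-end illustration and not as estimates for any reviewed study. The Bonferroni adjustment is shown only as one common correction, not as the method used in the studies cited, and both function names are our own.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# 22 is the number of measures used by Santos et al.; other values are arbitrary.
for n_tests in (1, 5, 10, 22):
    fwer = familywise_error_rate(n_tests)
    threshold = bonferroni_alpha(n_tests)
    print(f"{n_tests:>2} tests: family-wise error = {fwer:.2f}, "
          f"Bonferroni per-test alpha = {threshold:.4f}")
```

Under the independence assumption, a study analysing twenty-two measures separately at alpha = 0.05 would have roughly a two-in-three chance of at least one spurious significant result.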
Global cognitive measures
Five studies targeted general rather than specific cognition, using only general or ‘global’ measures of cognitive functions such as the MMSE, the ADAS-Cog and Hasegawa's Dementia Rating Scale(Reference Le Bars, Velasco and Ferguson23, Reference Clarke, Harrison and Richards66, Reference Seal, Metz and Flicker74, Reference Aisen, Schneider and Sano76, Reference Kessler, Pajonk and Bach82). Two RCT relied solely on the MMSE or the ADAS-Cog for measurement of cognitive function: one of these studied the effects of flavonoids on Alzheimer's disease patients, showing a significant positive effect of flavonoid treatment over 12 months(Reference Le Bars, Velasco and Ferguson23). The other found no effect of vitamin B treatment in older adults with a MMSE score ranging from 6 to 28 out of a possible 30(Reference Seal, Metz and Flicker74). Overall, the MMSE was used as a performance measure in fourteen RCT(Reference Ho, Chan and Ho24, Reference Kreijkamp-Kaspers, Kok and Grobbee28, Reference Cockle, Haller and Kimber65, Reference Clarke, Harrison and Richards66, Reference Wouters-Wesseling, Wagenaar and Rozendaal68, Reference Hvas, Juul and Lauritzen70–Reference Seal, Metz and Flicker74, Reference Aisen, Schneider and Sano76, Reference Kessler, Pajonk and Bach82, Reference Kritz-Silverstein, Von Muhlen and Barrett-Connor86, Reference Woo, Lau and Ho89) but only showed a significant effect of treatment in one(Reference Woo, Lau and Ho89), although this may have been due to the inclusion of cognitively impaired controls with a baseline MMSE score of less than 24 out of 30. 
The ADAS-Cog was used in four studies(Reference Le Bars, Velasco and Ferguson23, Reference Clarke, Harrison and Richards66, Reference Aisen, Schneider and Sano76, Reference Kessler, Pajonk and Bach82), and showed significant effects of treatment for patients with a baseline MMSE score of less than 24 in one 12-month flavonoid Alzheimer's disease intervention(Reference Le Bars, Velasco and Ferguson23), with the placebo group showing a significantly greater decline on the ADAS-Cog than the treatment group.
Cognitive screening and baseline population measures
Twelve studies (31 %) included participants with MMSE scores of less than 24 out of 30, which is suggestive of cognitive impairment or dementia(Reference Le Bars, Velasco and Ferguson23, Reference Ho, Chan and Ho24, Reference van Dongen, van Rossum and Kessels29, Reference Clarke, Harrison and Richards66, Reference Wouters-Wesseling, Wagenaar and Rozendaal68, Reference Hvas, Juul and Lauritzen70, Reference Eussen, de Groot and Joosten71, Reference Seal, Metz and Flicker74, Reference Aisen, Schneider and Sano76, Reference Kessler, Pajonk and Bach82, Reference Smith, Clark and Nutt83, Reference Santos, Galduróz and Barbieri91). Eighteen studies (46 %) did not screen for cognitive impairment; of these, thirteen included participants over the age of 50 years(Reference File, Hartley and Elsabagh19, Reference Duffy, Wiseman and File22, Reference Basaria, Wisniewski and Dupree25, Reference Fournier, Ryan and Robison27, Reference Kreijkamp-Kaspers, Kok and Grobbee28, Reference Cockle, Haller and Kimber65, Reference Wolters, Hickstein and Flintermann69, Reference Lewerin, Matousek and Steen78, Reference Bryan, Calvaresi and Hughes79, Reference Casini, Marelli and Papaleo84, Reference Kritz-Silverstein, Von Muhlen and Barrett-Connor86, Reference Woo, Lau and Ho89, Reference Ryan, Croft and Mori90) and may also, therefore, have unwittingly included cognitively impaired participants. Three studies reported baseline statistical differences in cognitive performance(Reference Wouters-Wesseling, Wagenaar and Rozendaal68, Reference Wolters, Hickstein and Flintermann69, Reference Santos, Galduróz and Barbieri91). 
Eleven studies did not report statistical baseline comparisons for any cognitive performance measures, making it difficult to assess the efficacy of their randomisation process(Reference File, Hartley and Elsabagh19, Reference Gleason, Carlsson and Barnet20, Reference Elsabagh, Hartley and Ali26, Reference Fournier, Ryan and Robison27, Reference Murray-Kolb and Beard81–Reference Casini, Marelli and Papaleo84, Reference Mix and Crews88, Reference Woo, Lau and Ho89, Reference Stough, Clarke and Lloyd92), and Bryan et al. (Reference Bryan, Calvaresi and Hughes79) only provided this information for some of their cognitive tasks. Casini et al. (Reference Casini, Marelli and Papaleo84) collected cognitive data at the end of each treatment arm in their cross-over design, thereby providing no real baseline cognitive data.
In general, IQ tests were used to provide measures of possible baseline differences between treatment groups, although Santos et al. (Reference Santos, Galduróz and Barbieri91) used four crystallised intelligence IQ tasks as performance measures and, unusually, found significant differences after 8 months of treatment with G. biloba. Examination of means shows that the treatment group had generally lower baseline IQ scores but similar endpoint scores on these measures, suggesting a measurement correction rather than a treatment effect.
Reasons given for null results
Fourteen RCT were considered by their authors to be underpowered with too-small sample sizes. Sixteen suggested that the duration of the treatment was too short, in interventions that ranged from 4 weeks(Reference Pathansali, Mangoni and Creagh-Brown73, Reference Seal, Metz and Flicker74) to 2 years(Reference McMahon, Green and Skeaff72). Two suggested that the doses used were inadequate(Reference Ho, Chan and Ho24, Reference McMahon, Green and Skeaff72). Authors of six RCT warned that the results may not be generalisable across other populations(Reference Basaria, Wisniewski and Dupree25, Reference McMahon, Green and Skeaff72, Reference Aisen, Schneider and Sano76, Reference Durga, van Boxtel and Schouten77, Reference Maylor, Simpson and Secker80, Reference Stough, Clarke and Lloyd92); for instance, as the authors pointed out, vitamin B trials may have different results when carried out in countries where fortification with folic acid is mandatory(Reference Aisen, Schneider and Sano76, Reference Durga, van Boxtel and Schouten77). With a few exceptions(Reference Cockle, Haller and Kimber65, Reference Francis, Head and Morris87, Reference Mix and Crews88), researchers have not focused on the sensitivity of the cognitive measures when explaining null results.
Discussion
Fewer than half of the studies reviewed showed any positive treatment effects of target nutrients. At the level of individual cognitive tasks, only thirty-eight out of 121 (31 %) displayed some sensitivity to chronic supplementation. This is consistent with findings from other reviews of nutrition and prevention of cognitive decline, such as Hoyland et al. (Reference Hoyland, Lawton and Dye98), Jia et al. (Reference Jia, McNeill and Avenell17) and Macready et al. (Reference Macready, Kennedy and Ellis18). Hoyland et al. (Reference Hoyland, Lawton and Dye98), for instance, found that in macronutrient intervention studies, the most significant performance differences occurred with the most demanding tasks and with delayed memory performance. It is possible that the tasks showing the greatest sensitivity in micronutrient and flavonoid interventions may also impose the greatest cognitive demands and/or memory performance stimulus–response delays; this is certainly worth considering for systematic investigation in the future, but it has not yet been explored in the field of micronutrient or flavonoid human chronic RCT research.
The quality-grading exercise revealed only six studies that could be considered as examples of best practice: three vitamin B studies(Reference Hvas, Juul and Lauritzen70, Reference Eussen, de Groot and Joosten71, Reference Durga, van Boxtel and Schouten77), two flavonoid studies(Reference Ho, Chan and Ho24, Reference Kreijkamp-Kaspers, Kok and Grobbee28) and a Zn RCT(Reference Maylor, Simpson and Secker80). The remainder failed to meet one or more of the criteria set out in the Methods section for category A, including: inappropriate comparator, outcome measure, statistical method and/or reporting; insufficient power; reporting errors; and unclear description of the population, setting or reporting of drop-outs. Ideally, there should be less than 20 % drop-out and RCT should be double-blinded where possible. While findings were mixed, and interpretation of the findings is made more challenging by methodological shortcomings in individual studies, some cognitive domains did show more sensitivity to nutritional supplementation than others. Specifically, there were reports of positive treatment effects on tasks with a spatial memory component with Zn and flavonoids in two RCT, one of which was graded category A(Reference Maylor, Simpson and Secker80). Vitamin B and Fe showed benefits for psychomotor processing speed, with the vitamin B study also classified as a category A RCT(Reference Durga, van Boxtel and Schouten77), although this was not supported by findings from the two other category A vitamin B studies(Reference Hvas, Juul and Lauritzen70, Reference Eussen, de Groot and Joosten71). 
Flavonoids also showed positive effects for executive function, in particular frontal function (Intra Dimensional/Extra Dimensional Set Shifting Task Rule Learning and Reversal Task), inhibition (Stroop Colour–Word Task), planning (Stockings of Cambridge Task), sustained attention (Paced Auditory Serial Addition Test), and category fluency tasks, although some of these findings are from category C studies and should therefore be interpreted with a degree of caution. In addition, Zn generated an improvement on a visual search task (Match to Sample)(Reference Maylor, Simpson and Secker80). Finally, there were also some nutrient effects on various episodic memory tasks, in particular the Common Objects Recall Test, the Rey Auditory-Verbal Learning Test, a short story recall task and various word learning tests, and the Delayed Matching To Sample Test, a measure of visuospatial episodic memory(Reference File, Hartley and Elsabagh19, Reference File, Jarrett and Fluck21, Reference Duffy, Wiseman and File22, Reference Wouters-Wesseling, Wagenaar and Rozendaal68, Reference Durga, van Boxtel and Schouten77, Reference Bryan, Calvaresi and Hughes79, Reference Stough, Clarke and Lloyd92). While some of these findings are from category C studies, the majority are from higher-graded studies(Reference File, Jarrett and Fluck21, Reference Duffy, Wiseman and File22, Reference Durga, van Boxtel and Schouten77, Reference Bryan, Calvaresi and Hughes79).
Interestingly, some of the domain-specific nutrient effects shown here in human RCT studies overlap quite closely with findings reported in animal studies. For example, in rat studies, positive effects of flavonoid intake have been shown on spatial memory tasks(Reference Joseph, Shukitt-Hale and Denisova10, Reference Williams, El Mohsen and Vauzour99) and psychomotor performance(Reference Shukitt-Hale, Cheng and Joseph100). In animal models, these tasks are thought to involve brain regions such as the hippocampus, which is associated with place learning(Reference Devan, Goad and Petri101), the basal ganglia and striatum which are linked to cue and response learning(Reference McDonald and White102), and the prefrontal cortex, which is central to rule acquisition in procedural learning(Reference Zyzak, Otto and Eichenbaum103). In rats, these same regions have been shown to be responsive to treatment by flavonoids, for instance via increased striatal muscarinic receptor sensitivity(Reference Joseph, Shukitt-Hale and Denisova10, Reference Shukitt-Hale, Carey and Simon16, Reference Williams, El Mohsen and Vauzour99, Reference Andres-Lacueva, Shukitt-Hale and Galli104, Reference Youdim, Shukitt-Hale and Joseph105). In terms of mechanisms, there is some suggestion that flavonoids may improve cell signalling which in turn may enhance neuroprotection through their antioxidant and anti-inflammatory properties, and may also promote neurogenesis. According to neuroimaging studies, similar brain regions are involved in human tasks. For example, areas including the hippocampal formation have been implicated in spatial memory(Reference Spiers and Maguire48), whereas psychomotor processing speed may include a striatal component(Reference Mink50). Whether or not the mechanisms of action are the same in both animals and humans remains to be seen.
Despite strong evidence from animal studies of a positive impact of certain nutrients on cognitive function, these important findings do not appear to be driving RCT research in human studies. In human flavonoid research, for instance, there appears to be little explicit reference to animal work in the search for ‘primary dependent measures’ to use in human flavonoid intervention studies. As an example, spatial memory was explored in only 65 % of flavonoid studies, despite evidence from animal work that spatial processing may be the cognitive domain which is most sensitive to flavonoid intervention. While only 20 % of those studies that included a task with a spatial memory component showed positive results, it is noteworthy that those specifically designed to measure ‘spatial working memory’ did show some nutrient-driven improvements (see Table 2, under ‘Working memory: visual/visuospatial’). This may have been due to the level of cognitive demand, or to the number and complexity of cognitive functions or processes required to perform the task. One key difference between animal and human studies, though, is that spatial tasks used in animal studies involve the rodent subject moving within a three-dimensional space with ever-changing environmental cues. In human studies, spatial tasks tend to be presented in a two-dimensional environment, arguably involving very different kinds of cognitive functioning to those employed in animal studies. While a number of flavonoid studies used memory tasks with a visual or visuospatial element, few have attempted to measure spatial working memory in a similar way to that in which it is assessed in animal paradigms; this is particularly true of cognitive processes and functions that are normally employed in three-dimensional maze task environments such as navigation and orientation.
Better analogues of animal tasks or more similarly demanding tasks should be developed, perhaps using new technologies such as virtual reality (see, for example, Astur et al. (Reference Astur, Taylor and Mamelak106)).
While some preliminary conclusions can be drawn from the available data, the ability to do so is hindered by an inconsistent approach to cognitive testing in nutrition supplementation studies, particularly in terms of the cognitive domains measured and tasks employed. Some researchers targeted specific domains such as memory or executive function, whereas others measured multiple domains. Both single and multiple domain studies included different numbers of measures within each cognitive domain, varying from single measures per domain to multiple measures, and researchers treating cognitively impaired populations were limited to using general measures of cognitive function for ethical and/or practical reasons. There is therefore little correspondence between the approaches taken in the measurement of cognitive function within or across nutrients, making it difficult to build a comprehensive picture of the relationship between nutrition and cognitive function.
It is also clear from looking at Table 2 that while some cognitive domains have been covered fairly extensively, for instance, episodic memory, others, such as semantic memory, have not. There are also a number of key gaps in the domains measured where no RCT research has been carried out at all, namely implicit, procedural, and prospective memory. Collectively, procedural memory and implicit memory are involved in skill learning(Reference Schacter107, Reference Anderson108). They are therefore important to study from a practical perspective, particularly for everyday cognitive functioning in older human adults, and there is strong evidence that the striatum, previously shown to be sensitive to nutrient supplementation in animal models(Reference Joseph, Shukitt-Hale and Denisova10), is involved in procedural memory(Reference Poldrack and Packard109). Furthermore, implicit memory tasks, compared with explicit ones, are thought to make minimal demands on hippocampal-based brain regions(Reference Buckner, Petersen and Ojemann110); as many animal studies have focused on the effects of nutrients on cognition using tasks with a significant hippocampal component, the inclusion of implicit memory tasks would provide a good empirical test of the degree to which benefits observed for particular nutrients are brain-region specific. Prospective memory declines in older adulthood, especially when the task makes demands on the provision and allocation of attentional/executive resources(Reference McDaniel, Einstein, Rendell, Kliegel, McDaniel and Einstein111), as it recruits areas of the prefrontal cortex and parietal cortex(Reference Burgess, Quayle and Frith112), and is also therefore worthy of exploration. Overall, distinctions between implicit v. explicit memory, procedural v. declarative memory, and prospective v. retrospective memory represent major taxonomic divisions in human memory and cannot simply be ignored.
Another source of inconsistency comes from the wide range of measures used across studies, in particular those exploring the same nutrient, making comparability between studies and within domains difficult. Not only has a large number of tasks been used within a single domain, but the variability between those tasks is also a concern. For instance, the verbal memory tasks varied enormously, from recall (immediate and delayed) and recognition of word lists, paragraphs and stories to paired-associates recall; the word lists in the memory tasks ranged from ten to thirty words; semantic memory tasks measured (1) naming of objects, (2) differentiation of words and non-words, (3) similarities between nouns and (4) synonyms for given words; and two spatial working memory tasks differed in the type of stimuli shown and the level of complexity of the task. So while all these tasks may be measuring cognitive function within the same primary domain, they may also impose different levels of cognitive demand, and involve different processes or even different secondary cognitive domains. One sensible approach may therefore be to include at least two measures of a cognitive domain of interest where possible.
One of the more concerning findings in the present review is that researchers continue to use cognitive tests that seemingly display little sensitivity to nutritional supplementation. The MMSE, recommended for use as a screening tool for cognitive impairment, was used as a performance measure in 38 % of the studies but failed to show significance in all but one, suggesting either that the nutrient in question has no effect on general cognitive function, or that the MMSE is not sensitive enough to measure short-term micronutrient- or flavonoid-driven cognitive change. While Woo et al. (Reference Woo, Lau and Ho89) did find significant positive effects of flavonoid treatment in postmenopausal women on three cognitive tasks including the MMSE (which may have been due to differences in baseline scores), Cockle et al. (Reference Cockle, Haller and Kimber65) found no change using the MMSE but a positive effect of treatment using a choice reaction time task, which suggests that the issue here may be one of test sensitivity rather than lack of treatment efficacy. The MMSE may simply not be sensitive enough to capture the subtle cognitive changes associated with dietary treatments, particularly with younger and/or cognitively healthy populations who are likely to perform at ceiling. Over half the reviewed studies which used the MMSE as a performance measure had normal populations without dementia or mild cognitive impairment, which may also have contributed to the large number of null findings. Similarly, the Trail-Making Task was significant in only two out of thirteen studies, showing a negative effect of treatment in both cases. The Boston Naming Test was used as a measure of semantic memory in six studies, including two category A RCT(Reference Ho, Chan and Ho24, Reference Kreijkamp-Kaspers, Kok and Grobbee28), but was not significant in any.
Of course, such consistently negative results may suggest that nutritional supplementation has no effect, or even a detrimental effect, on the specific cognitive function(s) measured by these tasks. As positive treatment effects have been observed with other, similar tasks though, it is more likely a question of task sensitivity and/or intensity of task demand. The possibility of type I error and lack of power in relation to effect size should also be considered. Therefore, while replication of cognitive methodologies is important, it would be good practice to move away from measures that repeatedly show null results and to identify tasks that are more consistently sensitive to treatment in the target populations.
There is also considerable confusion in the literature concerning the terms used to describe the same task. As one example, the Trail-Making Task Parts A and B have been referred to by different researchers as measuring visuomotor tracking and attention(Reference Kritz-Silverstein, Von Muhlen and Barrett-Connor86); speed for attention, sequencing, mental flexibility, visual search and motor function(Reference Ho, Chan and Ho24); information processing and prefrontal lobe function(Reference Howes, Bray and Lorenz85); and sequencing and shifting perceptual sets, concentration/vigilance and visuomotor scanning/tracking speed(Reference Mix and Crews88). While all these terms may have some validity, this inconsistency may lead to confusion for future researchers, and may hinder systematic comparison and interpretation of the specific tasks across studies. There were also discrepancies in how the same task was categorised. For instance, the Similarities Test was categorised by Eussen et al. (Reference Eussen, de Groot and Joosten71) as executive function, by Howes et al. (Reference Howes, Bray and Lorenz85) as verbal reasoning and by Santos et al. (Reference Santos, Galduróz and Barbieri91) as an aspect of general intelligence. In a number of cases, the tasks have been listed with no description and the reader is simply referred to the manual. This is especially true of Wechsler Adult Intelligence Scale and Wechsler Memory Scale tasks, which have undergone a number of updates over the years, and researchers may not have access to particular editions. Without access to the manuals it is often difficult to assess the suitability and sensitivity of tasks for future research.
Much of the research reviewed here currently centres on identifying the correct dose and duration of treatment to bring about improvements in cognitive function, and on adequately powering the study to enable significant differences to be shown. Authors of approximately a third of the studies reviewed suggested that their sample sizes were too small to provide adequate statistical power, or that their significant findings were a result of type I error. Those who had carried out power calculations often had to base their calculations on previous research in another area, or on projected changes in biomarkers rather than increments in cognitive change, due to the lack of availability of more appropriate data. Some authors warned that their results may not be generalisable to other populations. While there clearly are inconsistencies of approach in terms of intervention dose and duration, and issues about statistical power, sample size and type I error, it is our view that definitive conclusions regarding the efficacy of the dose or duration of a particular nutrient cannot be made until the sensitivity of the cognitive measures used has been established for the type of nutrient under investigation. This may be achievable through, for instance, the systematic manipulation of cognitive demand in an acute design, as seen in the glucose RCT study literature, where glucose was shown to be of greater benefit on more cognitively demanding tasks(Reference Hoyland, Lawton and Dye98, Reference Meikle, Riby and Stollery113).
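To illustrate why the choice of projected effect size matters so much in the power calculations mentioned above, the following is a minimal sketch using the standard normal-approximation formula for a two-group comparison of means with equal group sizes. It is not taken from any of the reviewed studies; the function name, the default significance level (0·05) and power (0·80), and the example effect sizes are all illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-sided,
    two-group comparison of means (normal approximation).

    effect_size is the standardised mean difference (Cohen's d).
    All defaults are illustrative assumptions, not values from the review.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)             # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Hypothetical effect sizes: a medium effect v. a small, subtle effect
for d in (0.5, 0.2):
    print(d, n_per_group(d))
```

Under these assumptions, a small projected effect requires roughly six times as many participants per group as a medium one, so a calculation borrowed from another research area or based on biomarker change could leave a cognitive outcome badly underpowered.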
Further consideration of individual differences such as age needs to be given when selecting cognitive tests. For example, as can be seen from Table 1, fewer than a quarter of the studies were carried out on younger adults. As it is possible that the optimal window of opportunity for slowing or reversal of age-related declines by dietary means may occur much earlier in the life cycle, younger populations should be given greater consideration. Clearly tests such as the MMSE are unlikely to be sensitive enough for younger, healthier populations, but tasks that are more challenging and sensitive for younger participants will also be informative about older cognitively intact and/or at-risk populations, allow better comparability between young and older age groups, and enable a greater understanding of diet-related cognitive evolution throughout the lifespan. Screening is important, as there is a danger of including participants with mild cognitive impairment within a healthy study sample, which makes interpretation difficult, as results may not be transferable to cognitively healthy populations. Finally, reporting of statistical differences in cognitive scores at baseline is essential, as this enables an informed assessment of parity between treatment groups based on initial cognitive performance.
Conclusion and recommendations
Research on the relationship between nutrition and cognitive function clearly has a long way to go, and an increasingly complex picture is emerging. Findings from epidemiological, longitudinal, observational and animal studies suggest that certain nutrients may offer great potential in the treatment of age-related cognitive disorders, and excitingly, certain nutrients may have specific roles to play in improving specific cognitive functions. However, this potential is currently hampered by methodological difficulties, and by the unsystematic, ‘scattergun’ approach being adopted across studies, which reduces their comparability and makes interpretation of findings difficult. The above findings demonstrate the necessity for more standardised, sensitive, and theory-derived sets of cognitive tasks in future clinical and dietary intervention studies.
The present review offers a number of implications for future chronic dietary studies. Firstly, there is a clear need to pay closer attention to animal studies and to previous human work when identifying appropriate cognitive tests, both within and across nutrients. Where reliable and nutritionally sensitive cognitive tasks are identified, researchers should endeavour to incorporate them into subsequent test batteries so that patterns relating to issues such as dose–response effects and nutrient type sensitivity can more easily emerge. Secondly, tasks need to be appropriate to the target population, be sufficiently sensitive to the nutrient under investigation, be sufficiently demanding to discriminate between good and poor performers, and be capable of avoiding ceiling (and floor) effects. Thirdly, greater care should be taken to avoid type I errors and statistical artifacts likely to bring about null findings, such as lack of power. Finally, including more than a single task within a domain (for example, two executive function tasks) would greatly help to determine whether a null effect for a particular nutrient is a real finding or reflects a lack of task sensitivity to the nutrient in question. Adopting these simple guidelines will bring much needed clarity and methodological rigour and, in the longer term, permit researchers to make much clearer policy recommendations for dietary intake in the general public.
Acknowledgements
J. P. E. S., L. T. B. and C. M. W. are sponsored by the Biotechnology and Biological Sciences Research Council (BB/F008953/1). J. P. E. S. is also funded by BBSRC grants (BB/E023185/1; BB/G005702/1; BB/C518222/1; BB/E023185/1) and the Medical Research Council (G0400278/NI02). J. P. E. S. and O. B. K. are funded by the Food Standards Agency (FSA; grant N02039). This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
A. L. M. and L. T. B. were responsible for paper conceptualisation and manuscript preparation; O. B. K., J. A. E., C. M. W. and J. P. E. S. contributed to paper conceptualisation and manuscript editing.
The authors are not aware of any conflicts of interest relating to this paper.