Dietary intake is among the top risk factors for chronic diseases(Reference Afshin, Sur and Fay1,Reference English, Ard and Bailey2) . Research examining dietary intake has historically focused on single foods, nutrients or other dietary constituents(Reference Mozaffarian, Rosenberg and Uauy3). As the focus of public health nutrition shifted from the prevention of deficiency to the prevention of chronic diseases, research likewise shifted towards the examination of dietary patterns, aiming to capture how foods and beverages are consumed in real life(Reference Mozaffarian, Rosenberg and Uauy3–Reference Schulz, Oluwagbemigun and Nöthlings5). Humans typically do not consume foods or nutrients on their own, but in the context of a broader dietary pattern(Reference Mozaffarian, Rosenberg and Uauy3,Reference Reedy, Subar and George4) . Accordingly, food-based dietary guidelines are now typically focused on patterns of intake rather than single dietary components(Reference Herforth, Arimond and Álvarez-Sánchez6). It is likely the synergistic and antagonistic relationships among the multiple foods, beverages and other dietary components that humans consume that influence health rather than individual components(Reference Reedy, Subar and George4). In addition to this multidimensionality, dietary patterns are dynamic, changing from meal to meal, day to day and across the life course(Reference Reedy, Subar and George4,7) . Further, dietary patterns are shaped by culture, social position and other contextual factors(Reference Imamura, Micha and Khatibzadeh8,Reference Delormier, Frohlich and Potvin9) . However, incorporating the domains of multidimensionality, dynamism and contextual factors into dietary patterns analysis is a difficult task.
Traditional approaches to identify dietary patterns, including ‘a priori’ and ‘a posteriori’ approaches, are useful for understanding overall dietary patterns or the diet quality of populations and population subgroups(Reference Ocké10). For example, ‘a priori’ methods like the Healthy Eating Index-2020 and the Healthy Eating Food Index-2019 are generally investigator driven(Reference Shams-White, Pannucci and Lerman11,Reference Brassard, Munene and Pierre12) and consider multiple components such as fruits and vegetables and whole grains as inputs, but typically compress the multidimensional construct of dietary patterns to a single unidimensional score reflecting overall diet quality(Reference Kirkpatrick, Reedy and Krebs-Smith13,Reference Bodnar, Cartus and Kirkpatrick14) . ‘A posteriori’ approaches are data-driven and have also been widely used to identify dietary patterns. Commonly applied data-driven approaches include clustering methods (e.g., k-means, Ward’s method), principal component analysis and factor analysis, providing opportunities to identify dietary patterns through statistical modelling or clustering algorithms rather than relying on researcher hypotheses(Reference Hu15). These approaches compress dietary components to key food groupings typically expressed as single scores(Reference Ocké10,Reference Michels and Schulze16) . By reducing the dimensionality of dietary patterns, these methods are limited in their ability to explain the wide variation in dietary intakes(Reference Reedy, Subar and George4). Methods employed to traditionally characterise dietary patterns using ‘a priori’ and ‘a posteriori’ approaches thus address multidimensionality to some extent, but do not allow for explorations of dietary patterns in their totality because they miss potential synergistic or antagonistic associations among dietary components(Reference Reedy, Subar and George4,Reference Bodnar, Cartus and Kirkpatrick14,Reference Reedy, Krebs-Smith and Hammond17) .
Novel methods that have not traditionally been used to identify dietary patterns, such as probabilistic graphical modelling, latent class analysis and machine learning algorithms (e.g., random forest, neural networks), may capture complexities like dietary synergy. There is no clear delineation between traditional and novel methods, and specifically defining what is novel is challenging given it naturally implies an evolution of methods. Nonetheless, there is a growing interest among nutrition researchers in the application of methods that have not typically been used to capture dietary complexity, with these methods often centred in machine learning(18). To date, there have been perspectives and narrative reviews on the application of machine learning in nutrition(Reference Kirk, Kok and Tufano19–Reference Côté and Lamarche21), and a recent systematic review of studies that applied machine learning approaches to assess food consumption(Reference Oliveira Chaves, Gomes Domingos and Louzada Fernandes22). However, there has not been an assessment of studies applying novel methods to characterise dietary patterns. Given the rapid adoption of these methods within the field of health(Reference Le Glaz, Haralambous and Kim-Dufor23–Reference Morgenstern, Buajitti and O’Neill26), it is increasingly important for researchers to have a basic understanding of available methods and how they are being applied in the field. This will facilitate the synthesis of evidence from a range of methodological inputs to inform food-based dietary guidelines and other policies and programs that promote health. The objective of this scoping review was therefore to describe the use of novel methods not traditionally used to characterise dietary patterns in the published literature.
Methods
The review was conducted in accordance with the JBI Manual for Evidence Synthesis(27), which was developed using the Arksey and O’Malley framework(Reference Arksey and O’Malley28). Reporting follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews(Reference Tricco, Lillie and Zarin29).
Defining novel methods
The novel methods considered were based on a preliminary search of the literature and the expertise of the research team and included systems methods (e.g., agent-based modelling, system dynamics), least absolute shrinkage and selection operator, machine learning algorithms, copulas and data-driven statistical modelling approaches (e.g., treelet transformations, principal balances and coordinates). Novel methods could also include those that have been used previously in nutrition research if applied in new ways to characterise dietary patterns (e.g., linear programming used to model a modified dietary pattern rather than to test scenarios). Methods that were not considered to be novel were those that have been applied to assess dietary patterns in numerous studies and have been considered by prior reviews and commentaries(Reference English, Ard and Bailey2,Reference Ocké10,Reference Newby and Tucker30) , including regression, ‘a priori’ approaches such as investigator-driven indices, and routinely used data-driven approaches, including factor analysis and cluster analysis(Reference Ocké10,Reference Krebs-Smith, Subar and Reedy31) .
Identifying relevant studies
Articles were eligible for inclusion if they were: a primary research article; focused on dietary intake as an exposure or outcome, including examination of dietary patterns (i.e., multiple dietary components in combination rather than single nutrients, foods or other dietary components); used at least one or more novel methods, as described above, to characterise dietary patterns; were published in English; and focused on humans. Ineligible studies included those focused on individual foods or human milk rather than dietary patterns and commentaries and reviews.
Searches of three research databases, MEDLINE (via PubMed), the Cumulative Index to Nursing and Allied Health Literature and Scopus, were conducted in March 2022. These health-focused, specialised and multidisciplinary databases were selected based on consultation with a research librarian (JS) to ensure a range of possibly relevant study types were included. The search strategies were developed in consultation with the research librarian using keywords and subject headings to capture diet-related constructs (e.g., dietary intake, patterns, recommendations, feeding behaviour, food habits) and novel methods to characterise dietary patterns (e.g., machine learning, network science and system dynamics model). No date limits were applied to the searches, and articles were included until the end of the search in March 2022. The search strategies for MEDLINE, CINAHL and Scopus are available in online Supplementary File 1.
Study selection
Two independent reviewers (two of AP, AR, SH, SIK) screened each record at the title and abstract and full-text screening stages using Covidence(32), with one consistent reviewer (AP) participating throughout the entire process. At the title and abstract screening stage, an initial pilot screening (twenty-five records) generated 100 % agreement (AR and AP) and 92 % agreement (AP and SH). A second pilot screening (100 records) generated 91 % agreement (AR and AP) and 93 % agreement (AP and SH). When applicable, discrepancies were discussed by reviewers and if needed deferred to a third reviewer (SIK) for decision. Following pilot screening, the reviewers independently reviewed the remaining articles (96 % agreement, Kappa = 0·83).
The reviewers were intentionally liberal during the title and abstract screening stage because of the breadth of possible novel methods. This required iteratively revisiting the inclusion criteria. For example, reduced rank regression was initially considered to be novel but was found to be prevalent in the literature based on title and abstract screening and was excluded during full-text review. Further, articles that used ‘a posteriori’ methods to identify dietary intake but did not specify the exact method in the title or abstract were included for full-text review.
Pilot screening of full-text reviews (fifty records) generated 82 % agreement (AR and AP) and 96 % agreement (AP and SIK); after discrepancies were discussed, two reviewers independently screened the remaining full-text articles (93 % agreement, Kappa = 0·60). The high agreement between reviewers but relatively low Cohen’s Kappa is described as Cohen’s paradox, with a larger number of studies excluded than included(Reference Gwet33–Reference Belur, Tompson and Thornton35).
Data extraction
Data extraction was completed by JMH and TEW using a pre-specified Excel template, with all extracted data subsequently verified by LA. Data extraction fields (online Supplementary File 2) included information pertaining to authorship, study title, journal, year of publication, funding source, contextual details (e.g., study location), sample size and participant characteristics (e.g., age). Details relating to study methods (e.g., analysis input variables, measurement of dietary intake and analytic approaches) and results (e.g., findings related to dietary patterns and if applicable, health risk and outcomes) were also extracted.
Results
Summary of search
A total of 5274 unique articles were identified after removing duplicates. Of these, 436 were identified as potentially relevant based on the title and abstract review and underwent full-text screening (Figure 1). Studies excluded during full-text screening included those that did not include methods defined as novel, those that did not focus on dietary patterns, commentaries, narrative reviews, systematic reviews, studies that were not published in English, studies that were not conducted with humans and theses/dissertations. A final pool of twenty-four articles describing twenty-four unique studies met the inclusion criteria.

Figure 1. PRISMA diagram illustrating the screening process for a scoping review exploring innovative methods for the analysis of dietary intake data and characterisation of dietary patterns. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.
Characteristics of included studies
Across the twenty-four included studies, data from seventeen countries were represented (Table 1). Half of the studies were published between 2005 and 2019(Reference Biesbroek, van der A and Brosens36–Reference Solans, Coenders and Marcos-Gragera47), and the remaining twelve were published between 2020 and March 2022(Reference Dalmartello, Decarli and Ferraroni48–Reference Schwedhelm, Lipsky and Shearrer59). Three studies used data from subsets of the European Prospective Investigation into Cancer and Nutrition(Reference Biesbroek, van der A and Brosens36,Reference Iqbal, Buijsse and Wirth45,Reference Schwedhelm, Knüppel and Schwingshackl46) , two studies used waves of data from the National Health and Nutrition Examination Survey(Reference Wright, McKenna and Nugent54,Reference Farmer, Lee and Powell-Wiley58) and two studies used data from the ELSA-Brasil cohort study (Table 2)(Reference Bezerra, Bahamonde and Marchioni42,Reference de Almeida Alves, Molina and da Fonseca57) . Sample sizes ranged from 250 to over 73 000 participants. Nineteen studies were conducted using data from cohort or cross-sectional studies, and five studies applied a case–control design.
Table 1. Study characteristics across included studies applying novel methods to characterise dietary patterns

* Some studies included more than one country.
Table 2. Characteristics of studies (n 24) identified in a scoping review of novel analytic methods to characterise dietary patterns

† Etude Epidémiologique auprès de femmes de la Mutuelle Générale de l’Education Nationale.
‡ FFQ.
§ Brazilian Longitudinal Study of Adult Health.
|| European Prospective Investigation into Cancer and Nutrition.
¶ National Health and Nutrition Examination Survey.
** 24 h dietary recall.
†† Protection Against Allergy: Study in Rural Environments.
‡‡ Finnish, rural-suburban birth cohort.
§§ Portuguese Elderly and Nutritional Status Surveillance System.
|||| Early Life Exposure in Mexico to Environmental Toxicants.
¶¶ Coronary Artery Risk Development in Young Adults.
The majority (n 15) of studies used FFQ to assess dietary intake(Reference Biesbroek, van der A and Brosens36,Reference Fonseca, Gaio and Lopes37,Reference Oliveira, Rodríguez-Artalejo and Gaio39–Reference Harrington, Dahly and Fitzgerald43,Reference Iqbal, Buijsse and Wirth45,Reference Solans, Coenders and Marcos-Gragera47–Reference Hoang, Lee and Kim49,Reference Samieri, Sonawane and Lefèvre-Arbogast52,Reference Xia, Zhao and Zhang55–Reference de Almeida Alves, Molina and da Fonseca57) . Six studies used 24-h recalls(Reference Schwedhelm, Knüppel and Schwingshackl46,Reference Madeira, Severo and Oliveira51,Reference Shang, Li and Xu53,Reference Wright, McKenna and Nugent54,Reference Farmer, Lee and Powell-Wiley58,Reference Schwedhelm, Lipsky and Shearrer59) , two studies used food records/diaries(Reference Hearty and Gibney44,Reference Hose, Pagani and Karvonen50) and one study used a FFQ and a 24-h recall(Reference Jamiołkowski, Szpak and Pawłowska38). Among the studies using 24-h recalls and records/diaries, one used data from a single recall that was combined with data from a FFQ(Reference Jamiołkowski, Szpak and Pawłowska38). The remaining studies including records or recalls averaged or combined data from two or more days of intake. Dietary input variables were created by selecting specific items of interest from questionnaires or condensing foods into groupings, ranging from nine to sixty-two food groupings(Reference Biesbroek, van der A and Brosens36–Reference Schwedhelm, Lipsky and Shearrer59). Apart from averaging recalls or records, none of the included studies applied substantial efforts to mitigate measurement error present in dietary intake data. Several studies noted potential misreporting as a limitation, and five studies specifically noted that findings may have been influenced by measurement error present in self-reported dietary assessment instruments(Reference Wu, Sánchez and Goodrich40,Reference Harrington, Dahly and Fitzgerald43,Reference Madeira, Severo and Oliveira51,Reference Samieri, Sonawane and Lefèvre-Arbogast52,Reference Schwedhelm, Lipsky and Shearrer59) .
Novel methods applied to identify dietary patterns
The types of methods used and how they were implemented to identify dietary patterns varied widely (Table 3). Nine studies applied approaches that have applications in machine learning, including classification models, neural networks and probabilistic graphical models (Table 4)(Reference Biesbroek, van der A and Brosens36,Reference Jamiołkowski, Szpak and Pawłowska38,Reference Hearty and Gibney44–Reference Schwedhelm, Knüppel and Schwingshackl46,Reference Hoang, Lee and Kim49,Reference Shang, Li and Xu53,Reference Zhao, Naumova and Bobb56,Reference Schwedhelm, Lipsky and Shearrer59) . The earliest study included in this review was published in 2005 and applied neural networks to characterise dietary patterns(Reference Jamiołkowski, Szpak and Pawłowska38). Fifteen studies applied other novel methods, including latent class analysis, mutual information and treelet transform(Reference Fonseca, Gaio and Lopes37,Reference Oliveira, Rodríguez-Artalejo and Gaio39–Reference Harrington, Dahly and Fitzgerald43,Reference Solans, Coenders and Marcos-Gragera47,Reference Dalmartello, Decarli and Ferraroni48,Reference Hose, Pagani and Karvonen50–Reference Samieri, Sonawane and Lefèvre-Arbogast52,Reference Wright, McKenna and Nugent54,Reference Xia, Zhao and Zhang55,Reference de Almeida Alves, Molina and da Fonseca57,Reference Farmer, Lee and Powell-Wiley58) . Two studies identified dietary patterns using more than one novel method(Reference Hearty and Gibney44,Reference Shang, Li and Xu53) . Five studies included comparisons of different novel methods, though these were typically versions of the same model(Reference Hearty and Gibney44,Reference Iqbal, Buijsse and Wirth45,Reference Solans, Coenders and Marcos-Gragera47,Reference Hoang, Lee and Kim49,Reference Shang, Li and Xu53) . For example, Solans et al. compared three models for compositional data analysis and reported that the best-performing model incorporated both investigator- and data-driven methods(Reference Solans, Coenders and Marcos-Gragera47).
Table 3. Description of dietary patterns (n 24) identified in a scoping review of novel analytic methods to characterise dietary patterns

* We considered socio-demographic characteristics that are related to social position or are indicators of equity including age, sex, gender, race/ethnicity, marital status, education, employment status, smoking status as examples. We did not include physical activity, BMI or alcohol consumption.
Table 4. Novel methods applied to identify dietary patterns across included studies *

* Studies may have used more than one novel approach to characterise dietary patterns.
In twelve studies, two to eight distinct dietary patterns, such as the ‘prudent’ pattern or ‘Western’ pattern, were identified using methods such as latent class analysis, treelet transform, random forest with classification tree analysis and multivariate finite mixture models(Reference Biesbroek, van der A and Brosens36–Reference Jamiołkowski, Szpak and Pawłowska38,Reference Affret, Severi and Dow41–Reference Harrington, Dahly and Fitzgerald43,Reference Dalmartello, Decarli and Ferraroni48,Reference Hose, Pagani and Karvonen50,Reference Madeira, Severo and Oliveira51,Reference Wright, McKenna and Nugent54,Reference de Almeida Alves, Molina and da Fonseca57,Reference Farmer, Lee and Powell-Wiley58) . Six studies applied network methods, including probabilistic graphical models and mutual information, to identify networks of dietary patterns among populations(Reference Iqbal, Buijsse and Wirth45,Reference Schwedhelm, Knüppel and Schwingshackl46,Reference Hoang, Lee and Kim49,Reference Samieri, Sonawane and Lefèvre-Arbogast52,Reference Xia, Zhao and Zhang55,Reference Schwedhelm, Lipsky and Shearrer59) .
Dynamism, or how dietary patterns vary across time, was incorporated into four studies’ characterisation or analysis of dietary patterns. Three studies incorporated stratification by meals to consider dynamism(Reference Hearty and Gibney44,Reference Schwedhelm, Knüppel and Schwingshackl46,Reference Schwedhelm, Lipsky and Shearrer59) . In two studies using graphical models, separate networks were created for each meal to provide insights into how patterns of intake vary throughout the day(Reference Schwedhelm, Knüppel and Schwingshackl46,Reference Schwedhelm, Lipsky and Shearrer59) . Hearty and Gibney used decision trees and neural networks and ran models by meals based on sixty-two food groups to predict diet quality(Reference Hearty and Gibney44). Additionally, one study considered dynamism by using ANOVA and chi-square tests to descriptively show how a variety of characteristics were associated with stable or changing dietary patterns characterised using latent class analysis(Reference Harrington, Dahly and Fitzgerald43).
Fourteen studies examined relationships between dietary patterns characterised using novel methods and variables indicative of health risk or outcomes, such as periodontitis, cardiovascular disease and metabolic syndrome (Table 3) (Reference Biesbroek, van der A and Brosens36–Reference Wu, Sánchez and Goodrich40,Reference Dalmartello, Decarli and Ferraroni48–Reference Zhao, Naumova and Bobb56) . Six studies included longitudinal analysis of the relationship between dietary patterns and health outcomes(Reference Biesbroek, van der A and Brosens36,Reference Jamiołkowski, Szpak and Pawłowska38,Reference Wu, Sánchez and Goodrich40,Reference Hose, Pagani and Karvonen50,Reference Samieri, Sonawane and Lefèvre-Arbogast52,Reference Shang, Li and Xu53) . Most studies that examined health risk or outcomes first identified dietary patterns using a novel method and then investigated relationships with health outcomes using regression models(Reference Biesbroek, van der A and Brosens36–Reference Wu, Sánchez and Goodrich40,Reference Dalmartello, Decarli and Ferraroni48,Reference Hose, Pagani and Karvonen50,Reference Madeira, Severo and Oliveira51,Reference Wright, McKenna and Nugent54) . In contrast, some studies incorporated variables indicative of health outcomes or risk directly into the machine learning models(Reference Hoang, Lee and Kim49,Reference Zhao, Naumova and Bobb56) . For example, Zhao et al. (Reference Zhao, Naumova and Bobb56) applied Bayesian kernel machine regression, a machine learning model designed to incorporate high-dimensional data, to jointly model the relationship between several dietary components and cardiovascular disease risk. Similarly, Hoang et al. (Reference Hoang, Lee and Kim49) included health variables within mixed graphical models, though directionality of diet-health relationships could not be ascertained given the cross-sectional nature of the data. In two case–control studies, dietary patterns were identified using mutual information to estimate dietary pattern networks, with stratification by health outcomes(Reference Samieri, Sonawane and Lefèvre-Arbogast52,Reference Xia, Zhao and Zhang55) .
Nineteen studies considered socio-demographic characteristics, such as sex, age, race/ethnicity, education and income(Reference Biesbroek, van der A and Brosens36,Reference Fonseca, Gaio and Lopes37,Reference Oliveira, Rodríguez-Artalejo and Gaio39–Reference Harrington, Dahly and Fitzgerald43,Reference Iqbal, Buijsse and Wirth45,Reference Dalmartello, Decarli and Ferraroni48–Reference Farmer, Lee and Powell-Wiley58) . In one case, socio-demographic characteristics were included in models used to characterise dietary patterns(Reference Hoang, Lee and Kim49). Two studies stratified by socio-demographic characteristics, examining dietary patterns by sex(Reference Iqbal, Buijsse and Wirth45) or age groups(Reference Bezerra, Bahamonde and Marchioni42). Studies that used case–control designs typically considered socio-demographic characteristics through matching(Reference Samieri, Sonawane and Lefèvre-Arbogast52,Reference Xia, Zhao and Zhang55) . In the remaining studies that considered socio-demographic characteristics, these were incorporated in regression models to explore how dietary patterns characterised using novel methods were associated with health and other characteristics.
Two studies included comparisons of novel methods and traditional statistical approaches(Reference Biesbroek, van der A and Brosens36,Reference de Almeida Alves, Molina and da Fonseca57) . For instance, Biesbroek et al. (Reference Biesbroek, van der A and Brosens36) found that dietary patterns identified through reduced rank regression were more strongly associated with coronary artery disease compared with those identified through random forest with classification tree analysis.
Discussion
The application of novel methods to dietary pattern research is rapidly expanding, with the aim of better understanding their complexity and how they are related to health and other factors. Many studies used methods that characterise distinct dietary patterns based on the population being studied, such as the ‘prudent’ pattern or the ‘Western’ pattern. Most studies used cross-sectional data, limiting opportunities to examine the effect of dietary patterns on health.
Methods newly being applied in this field offer promising capacity to better understand the totality of dietary patterns and synergistic relationships among dietary components when compared with traditional approaches that do not assume synergy(Reference Reedy, Subar and George4,Reference Bodnar, Cartus and Kirkpatrick14) . Given the large variation in how dietary patterns were characterised using novel methods, multidimensionality and potential synergistic relationships between dietary components were considered and presented in a range of ways, from latent classes to networks. Several studies incorporated dynamism into their consideration of dietary patterns, though in most cases this was through stratification, for example, by meal, rather than through direct use of novel methods(Reference Harrington, Dahly and Fitzgerald43,Reference Hearty and Gibney44,Reference Schwedhelm, Knüppel and Schwingshackl46,Reference Schwedhelm, Lipsky and Shearrer59) . In these cases, it was a combination of input variables, stratification by time and the novel method that enabled explorations of dynamism.
The methods highlighted have a range of strengths and limitations for the characterisation of dietary patterns. Methods that focused on the classification of distinct patterns allowed for the assessment of relationships between these patterns and health outcomes or other indicators of interest but explored the interrelationships between dietary components to a lesser degree(Reference Biesbroek, van der A and Brosens36–Reference Jamiołkowski, Szpak and Pawłowska38,Reference Affret, Severi and Dow41–Reference Harrington, Dahly and Fitzgerald43,Reference Dalmartello, Decarli and Ferraroni48,Reference Hose, Pagani and Karvonen50,Reference Madeira, Severo and Oliveira51,Reference Wright, McKenna and Nugent54,Reference de Almeida Alves, Molina and da Fonseca57,Reference Farmer, Lee and Powell-Wiley58) . Other methods such as compositional data analysis, mutual information and probabilistic graphical models can be used to consider joint relationships among dietary components to better understand multidimensionality(Reference Iqbal, Buijsse and Wirth45–Reference Solans, Coenders and Marcos-Gragera47,Reference Hoang, Lee and Kim49,Reference Samieri, Sonawane and Lefèvre-Arbogast52,Reference Xia, Zhao and Zhang55,Reference Schwedhelm, Lipsky and Shearrer59) . For example, Gaussian graphical models provide the opportunity to visualise the dietary pattern through a network of dietary components, with relationships between variables indicating conditional dependencies(Reference Iqbal, Buijsse and Wirth45,Reference Hoang, Lee and Kim49,Reference Schwedhelm, Lipsky and Shearrer59) . However, studies making use of these methods included further analyses, such as the development of a score from dietary pattern networks, to assess relationships with health outcomes.
There are trade-offs between novel and traditional methods that should be considered when contemplating the most appropriate methods for a given study. Though potential benefits such as a greater ability to discern multidimensionality may be desirable, these must be weighed against the implications for interpretability and computational costs. The application of novel methods may not always yield insights beyond those gained from traditional approaches. For example, Biesbroek et al. (Reference Biesbroek, van der A and Brosens36) found that random forest models did not outperform reduced rank regression when examining associations of dietary patterns with coronary artery disease. Conversely, a study that was not included in this review because it first identified dietary patterns using a traditional method – principal component analysis – found that machine learning algorithms were better able to classify the identified dietary patterns according to cardiometabolic risk compared to traditional approaches(Reference Panaretos, Koloverou and Dimopoulos60).
Several socio-demographic characteristics are indicators of systemic health inequity and have been shown to be associated with dietary patterns among populations(Reference Hanson and Connor61–Reference Hiza, Casavale and Guenther63). The degree to which studies incorporated socio-demographic characteristics into their consideration of dietary patterns or relationships between dietary patterns and health varied, with adjusted regression models applied after dietary patterns were characterised as the most common approach. Consistent with nutrition research more broadly(Reference Hiza, Casavale and Guenther63–Reference Brassard, Munene and Pierre65), there was little consideration of possible interactions among socio-demographic characteristics in relation to dietary patterns. Methods particularly suited to pattern recognition and complexity could be leveraged to simultaneously explore potential joint relationships among facets of social identity and dietary patterns(Reference Wang, Ramaswamy and Russakovsky66) and advance our understanding of how broader systems of oppression and intersecting characteristics contribute to dietary patterns(Reference Doan, Olstad and Vanderlee62).
Beyond the inclusion of socio-demographic characteristics in models, considering equity from the beginning of study design is a critical consideration given potential bias in data and algorithms that can have immense implications for those who already experience inequities because of factors such as structural racism(Reference Robinson, Renson and Naimi67–Reference Rajkomar, Hardt and Howell69). The included studies did not explicitly discuss the incorporation of equity into study design, and many conducted secondary analyses of existing datasets. The use of directed acyclic graphs has been identified as a potential solution to mitigate some possible issues with bias through careful model design(Reference Robinson, Renson and Naimi67) and has been applied in other domains of nutrition research using novel methods(Reference Bodnar, Cartus and Kirkpatrick14). Engaging individuals with lived experience and the integration of interdisciplinary teams with broad expertise that can combine content knowledge with data-driven approaches can help to mitigate potential bias in algorithms(Reference Wang, Ramaswamy and Russakovsky66).
The level of description of methods varied, and it was sometimes challenging to decipher the specifics of how novel methods were applied. Although the Strengthening the Reporting of Observational Studies in Epidemiology—Nutritional Epidemiology reporting guidelines provide guidance for transparently reporting nutritional epidemiology and dietary assessment research(Reference Lachat, Hawwash and Ocké70), it was not designed specifically for the methods used in the studies considered in this review and the ways in which they are being applied in dietary patterns research. Other reporting guidelines, such as the Consolidated Standards of Reporting Trials, have been extended to consider the application of artificial intelligence (AI) (Reference Liu, Cruz Rivera and Moher71). Motivations related to the extension of Consolidated Standards of Reporting Trials included inadequate reporting of studies using AI and the lack of full consideration of potential sources of bias specific to AI within existing reporting guidelines(Reference Liu, Cruz Rivera and Moher71). Relevant items added to Consolidated Standards of Reporting Trials-AI pertain to the role of AI in the study, the nature of the data used in AI systems and how humans interacted with AI systems, for example(Reference Liu, Cruz Rivera and Moher71). The extension of reporting guidelines such as STROBE-nut to consider applications of AI, including machine learning, and other methods that are becoming more commonly used, may facilitate consistent and complete reporting and improved comparability of studies. Reporting guidelines should continue to emphasise strategies applied to mitigate measurement error in dietary intake data(Reference Lachat, Hawwash and Ocké70), as studies using novel methods are not immune to the effects of error on findings(Reference Spicker, Nazemi and Hutchinson72). Along with reporting guidelines, the development of tailored quality appraisal tools may facilitate synthesis of high-quality evidence to inform recommendations about dietary patterns and health.
This review provides a snapshot of a rapidly evolving field(Reference Lampignano, Tatoli and Donghia73,Reference Slurink, Corpeleijn and Bakker74) , with the involvement of an interdisciplinary team of researchers lending to a robust consideration of emerging methods in dietary patterns research. While prior reviews have provided perspectives on the potential applications of machine learning within the field of nutrition(Reference Kirk, Kok and Tufano19–Reference Côté and Lamarche21), this review considered dietary patterns in particular, as well as considering approaches beyond machine learning that have not traditionally been used in this area, broadening the scope compared to prior reviews(Reference Oliveira Chaves, Gomes Domingos and Louzada Fernandes22,Reference Kirk, Catal and Tekinerdogan75) . The search terms were informed by preliminary searching, though it is unlikely that all relevant articles applying novel methods to characterise dietary patterns were captured, partially driven by the wide range of descriptors used for these methods and the lack of reporting standards. As well, determining whether a method is novel is somewhat subjective. Methods such as factor analysis and principal component analysis once revolutionised dietary pattern analysis, providing data-driven approaches to identify patterns(Reference Hu15). Now, they are widely applied and recognised as limited in their capabilities to capture complexity compared to some newer approaches. Further, the search terms skewed toward multidimensionality v. dynamism, potentially overlooking some studies focusing on variation of dietary patterns over time or across eating occasions. Nonetheless, this review documents an acceleration of the application of a range of novel methods to dietary patterns research and captures a broad scope of methods being used to characterise these patterns, highlighting the need for researchers to develop the lexicon and knowledge needed to interpret the emerging literature.
Conclusion
The findings of this review indicate a strong motivation to apply novel methods, including but not limited to machine learning, to improve understanding of dietary patterns and how they relate to health and other factors. The application of these methods may help us to learn about complex relationships that may not be possible to discern through traditional approaches. However, these methods may not be suitable for every question and do not necessarily overcome the limitations of more traditional approaches.
Given the proliferation of these methods, it is becoming increasingly worthwhile for nutrition researchers to have at least a basic understanding of novel methods such as machine learning and latent class analysis, so they can interpret the results of emerging studies. The development and implementation of reporting guidelines and quality appraisal mechanisms for studies that apply novel methods may improve the capacity for synthesis of evidence generated to inform strategies that promote improved population health and well-being.
Acknowledgements
We thank research librarian Jackie Stapleton (JS) of the University of Waterloo for support with the search strategy.
This review was funded by the Canadian Institutes of Health Research, a University of Waterloo Research Incentive Fund award, an Ontario Ministry of Research and Innovation Early Researcher Award held by S. I. K. and Microsoft AI for Good. J. M. H. was funded by a Vanier Canada Graduate Scholarship. R. M. L. is funded by a National Health and Medical Research Council Emerging Leadership Fellowship (APP1175250). L. M. B. was funded by the National Institutes of Health (R01 HD102313, MPI Bodnar LM, Naimi AI).
S. I. K. conceived of the review and planned it with the co-authors; A. R., A. P., S. H., S. I. K. conducted the search and screening; J. M. H. and T. E. W. conducted extraction; L. A. conducted verification; J. M. H. wrote the first draft of the manuscript with support from A. P. to write the methods; all co-authors provided critical input to the manuscript and all co-authors read and approved the final manuscript.
R. M. L. is a statistical editor for the British Journal of Nutrition. Other authors declare none.
Posted as a preprint: https://doi.org/10.1101/2024.06.20.24309251
Supplementary material
For supplementary material/s referred to in this article, please visit https://doi.org/10.1017/S0007114524002587