Investigating the role of diet in health outcomes is an ongoing challenge in nutritional epidemiology and applied research(Reference Boeing1). To achieve this goal, it is necessary to get reliable data on food intake to obtain the most accurate estimates of nutrient intake and dietary patterns(Reference Schulz, Oluwagbemigun and Nöthlings2). Several methods for assessing food intake have been validated in different individuals and populations. Generally, traditional methods are based on participants’ short- and medium-term memory, highlighting food records, 24-h recalls, FFQ and dietary history(Reference Probst and Zammit3,Reference Shim, Oh and Kim4) . The choice of dietary intake method depends on the research question, study design, sample characteristics and reference timeframe(Reference Bailey5). However, an important consideration of these methods is that they are susceptible to random and systematic measurement errors that affect the reliability and accuracy of the obtained dietary information(Reference Shim, Oh and Kim4). For example, assessment methods based on subjective evaluation, such as the 24-h recall method and FFQ, are susceptible to recall bias and researcher bias in previous discussions(Reference Archer, Marlow and Lavie6). To address this limitation, Prentice et al. (Reference Prentice, Mossavar-Rahmani and Huang7) conducted a study and reported that nutritional intake biomarkers could be a new approach to enhance the reliability of food records and FFQ. The cost and feasibility of these methodological approaches are barriers that researchers and health care practitioners must consider.
Rapidly evolving technologies can help to reduce the difficulties described above. Some mobile applications and web applications are becoming more prevalent in research owing to their cost-effectiveness, speed and accuracy in collecting dietary information. A review published in 2017 showed that image-assisted methods can improve the accuracy of information collection compared with conventional methods, specifically adding more detail to dietary records and being more dynamic(Reference Boushey, Spoden and Zhu8). Artificial intelligence (AI) has recently presented significant growth opportunities in medicine and nutrition(Reference Acharjee and Choudhury9–Reference López-Cortés, Matamala and Venegas11). AI-based training algorithms can support accurately predicting complex food intake interactions by integrating and organising large amounts of data(Reference Miyazawa, Hiratsuka and Toda12). Approaches to developing AI in nutrition include techniques such as machine learning (ML), deep learning (DL) and data mining. In turn, it can be trained in a supervised, semi-supervised or unsupervised manner(Reference Limketkai, Mauldin and Manitius10,Reference Oliveira Chaves, Gomes Domingos and Louzada Fernandes13) . These technologies are programmed to extract information from sources such as social networks, devices and mobile applications, depending on the validation context (e.g. preclinical or clinical)(Reference Côté and Lamarche14). Currently, no systematic review has comprehensively and critically analysed this issue. Therefore, the present systematic review aimed to assess the validity and accuracy of AI-based dietary intake assessment methods (AI-DIA) available in the biomedical literature.
Methods
This systematic review was designed and conducted following the Preferred Reporting Items for Systematic Reviews and Meta-analysis guidelines(Reference Moher, Liberati and Tetzlaff15). The protocol was registered in the Open Science Framework database (https://osf.io/gqw6s).
Eligibility criteria
The PECOS (P-Population; E-Exposure, C-Comparison, O-Outcome, S-Study design) framework for the search planning was considered. The inclusion criteria were as follows: (1) human population data; (2) articles that assess dietary intake methods based on AI: 24-h recalls, FFQ, weighed food records, food records or other methods, such as image-based applications or software-based records. Each dietary intake assessment method should incorporate data processing techniques based on AI, such as DL, ML and data mining and (3) articles that report reliability properties: internal consistency, measurement error, test-retest reliability, interrater reliability, correlation coefficients and validity measures (content validity and face validity). In addition, articles report AI-metrics: accuracy, precision, regression (mean absolute error, mean squared error and root mean squared error, R2), ROC curve and others; (4) study designs by purpose (comparative, validation or analytical studies), by temporality (cross-sectional, prospective or retrospective studies), by researcher involvement (controlled clinical trials and quasi-experimental studies, observational) and others designs by stage of the study development (pilot studies and feasibility studies) and (5) only original articles in English language were included. The exclusion criteria were as follows: (1) studies in animal models; (2) other dietary intake assessment methods that are not based on AI; (3) study designs including ecological studies or case studies and (4) other types of articles, such as, letters to the editor, narrative reviews and conference papers.
Search strategy and information sources
Systematic searches were performed in the Embase, PubMed, Web of Science and Scopus databases from inception to 1 December 2024 by two independent reviewers (C.S. and G.Q.). The search strategy was adapted for each database and information source according to the descriptors in the Medical Subject Headings Section section and free terms. Specifically, we used the following terms: ‘diet’, ‘dietary assessment’, ‘food intake’, ‘food records’, ‘food frequency questionnaire’, ‘24-hour recall’, ‘weighed food records’, ‘artificial intelligence’, ‘data mining’, ‘deep learning’, ‘machine learning’, ‘artificial neural networks’, ‘validity’, ‘reliability’, ‘accuracy’. All keywords were combined with Boolean operators such as OR and AND (online Supplementary 1). Records and duplicates were analysed using the Rayyan(Reference Ouzzani, Hammady and Fedorowicz16) platform.
Selection process and data extraction
Articles were selected, and two independent reviewers (C.S. and G.Q.) extracted their data to ensure blinding during screening. The first selection was made by assessing the relevance of the titles and abstracts identified in the search strategy and checking whether they met the eligibility criteria. In the event of disagreement during the selection process, a third reviewer (S.C.) resolved the dispute. After this stage, we analysed each selected article in the full text and removed any articles that did not meet the objective of the present review. The extraction process and analysis of the results were conducted by three researchers (C.S., G. Q. and X.A.L.C.). They entered the data into a descriptive matrix, reporting the main characteristics of the studies: (1) main author /year /country, (2) objective, (3) study design, (4) sample/ origin, (5) setting, (6) name technology (i.e. commercial or patented name), (7) dietary components evaluated, (8) traditional method of assessing dietary intake used as reference, (9) description of dietary assessment method, (10) type of AI technique used, (11) statistical method applied, (12) outcomes and (13) main findings.
Quality assessment
Two researchers (G.Q. and S.C.) used Risk of Bias in Non-randomised Studies of Interventions, a tool developed specifically for non-randomised trials(Reference Sterne, Hernán and Reeves17) that assesses the risk of bias. Seven types of biases were assessed: confounding, selection of participants, classification, deviations from interventions, missing data, measurement of outcomes and reporting of results. Likewise, it assigns a rating of ‘low’, ‘moderate’, ‘serious’, ‘critical’ or ‘no information’ depending on the integration of the above domains. Additionally, a visualisation tool was used to plot the risk of bias by domain in each study.
Results
Our research team identified 1679 articles through a systematic search. Subsequently, 612 duplicates were removed, and 1067 titles and abstracts were screened. A total of forty-three articles were analysed, of which thirty were discarded owing to non-compliance with one or more previously defined eligibility criteria. After an exhaustive review, thirteen articles were selected for inclusion in this study (Figure 1).

Figure 1. PRISMA 2020 flow diagram of process studies selection.
Study characteristics
In reporting the geographical distribution of the research, four studies were conducted in North America(Reference Ji, Plourde and Bouzo18–Reference Moyen, Rappaport and Fleurent-Grégoire21), five in Asia(Reference Chotwanvirat, Hnoohom and Rojroongwasinkul22–Reference Tagi, Hamada and Shan26), three in Europe(Reference Mezgec and Seljak27–Reference Papathanail, Brühlmann and Vasiloglou29) and only one in Africa(Reference Folson, Bannerman and Atadze30). By the year of publication, these ranged from 2017 to 2024, mostly concentrated in 2022 (n 7). Most of the designs focused on validation studies(Reference Kusuma, Yang and Yang19,Reference Yang, Yang and Kusuma20,Reference Chotwanvirat, Hnoohom and Rojroongwasinkul22–Reference Lee, Yen and Wu24,Reference Tagi, Hamada and Shan26,Reference Papathanail, Vasiloglou and Stathopoulou28) (n 7), followed by randomised controlled trials(Reference Ji, Plourde and Bouzo18,Reference Moyen, Rappaport and Fleurent-Grégoire21) (n 2), non-randomised controlled trials(Reference Nguyen, Tran and Hoang25,Reference Folson, Bannerman and Atadze30) (n 2), pilot study(Reference Papathanail, Brühlmann and Vasiloglou29) (n 1) and comparative studies(Reference Mezgec and Seljak27)(n 1). The population sizes ranged from 36(Reference Nguyen, Tran and Hoang25) to 136(Reference Moyen, Rappaport and Fleurent-Grégoire21) participants, while the images collected varied from 576(Reference Tagi, Tajiri and Hamada23) to 130 517(Reference Mezgec and Seljak27). In line with the research context, eight studies were conducted in preclinical settings, whereas four were conducted in clinical settings, as per the available information. The following AI-DIA are distinguished by their names: Food Recognition Assistance and Nudging Insights(Reference Nguyen, Tran and Hoang25,Reference Folson, Bannerman and Atadze30) , Kenooa(Reference Ji, Plourde and Bouzo18,Reference Moyen, Rappaport and Fleurent-Grégoire21) , GB HealthWatch(Reference Kusuma, Yang and Yang19,Reference Yang, Yang and Kusuma20) , mediPIATTO(Reference Papathanail, Vasiloglou and Stathopoulou28), NutriNet(Reference Mezgec and Seljak27) and Automated Carbohydrate Estimation System(Reference Chotwanvirat, Hnoohom and Rojroongwasinkul22). The main characteristics of the AI-DIA methods are summarised in Table 1.
Table 1. Main characteristics of the included studies (n 13)

Dietary components assessed by artificial intelligence-based dietary intake assessment methods
It is worth highlighting that a significant number of methods evaluated calories, protein, fat and carbohydrates as dietary components(Reference Ji, Plourde and Bouzo18–Reference Moyen, Rappaport and Fleurent-Grégoire21,Reference Lee, Yen and Wu24,Reference Nguyen, Tran and Hoang25,Reference Papathanail, Brühlmann and Vasiloglou29,Reference Folson, Bannerman and Atadze30) . Seven studies estimated vitamins, namely, A, D, E, C, B1, B2, B6 and B12 (Reference Ji, Plourde and Bouzo18–Reference Moyen, Rappaport and Fleurent-Grégoire21,Reference Lee, Yen and Wu24,Reference Nguyen, Tran and Hoang25,Reference Folson, Bannerman and Atadze30) . In the same way, minerals such as Ca, Fe, Zn, Na, potassium, phosphorus and Mg were estimated in some studies(Reference Moyen, Rappaport and Fleurent-Grégoire21,Reference Lee, Yen and Wu24,Reference Nguyen, Tran and Hoang25,Reference Folson, Bannerman and Atadze30) . Dietary fibre was estimated and calculated in six studies(Reference Ji, Plourde and Bouzo18–Reference Moyen, Rappaport and Fleurent-Grégoire21,Reference Lee, Yen and Wu24,Reference Nguyen, Tran and Hoang25,Reference Folson, Bannerman and Atadze30) and water consumption in two studies(Reference Tagi, Tajiri and Hamada23,Reference Lee, Yen and Wu24) . Other dietary components, including food, beverages(Reference Mezgec and Seljak27), meals, liquids and fermented foods(Reference Tagi, Tajiri and Hamada23), were estimated using various AI technologies, emphasising the heterogeneity of these factors in dietary intake. Only one study by Papathanail et al. (Reference Papathanail, Vasiloglou and Stathopoulou28) evaluated adherence to a dietary pattern, specifically the Mediterranean diet.
Validity of artificial intelligence-based dietary intake assessment methods
A wide range of AI-DIA methods were examined and compared with the traditional methods of dietary intake assessment. Three studies utilised food weighing records, two food frequency questionnaires, two 24-hour dietary recalls, two visual estimations by dietitians, two database records with food and nutrient information and only one study employed daily food records (Table 1). Regarding the reliability of the technologies, the correlation coefficients, Pearson’s correlation and Spearman’s correlation were used. In this line, correlation coefficients between the energy estimation of AI technology and the traditional method contrasted, a variation from 0·20 in Ji et al. (Reference Ji, Plourde and Bouzo18) study to 0·97 in the Papathanail et al. (Reference Papathanail, Brühlmann and Vasiloglou29) study was observed. Moreover, the correlation coefficients for macronutrients ranged from 0·38 in the Ji et al. (Reference Ji, Plourde and Bouzo18) study to 0·98 in the Papathanail et al. (Reference Papathanail, Brühlmann and Vasiloglou29) study. Micronutrients showed a range of variation from 0·3 in the Folson et al. (Reference Folson, Bannerman and Atadze30) article to 0·84 in the study by Kusuma et al. (Reference Kusuma, Yang and Yang19)
Bland–Altman analyses were performed using graphical plots in 66·6 % of the AI-DIA. In the studies by Chotwanvirat et al. (Reference Chotwanvirat, Hnoohom and Rojroongwasinkul22), Moyen et al. (Reference Moyen, Rappaport and Fleurent-Grégoire21), Nguyen et al. (Reference Nguyen, Tran and Hoang25) and Folson et al. (Reference Folson, Bannerman and Atadze30), a high degree of agreement was observed for the nutrients analysed. Kappa test showed moderate agreement for micronutrients and macronutrients in the article of Nguyen et al. (Reference Nguyen, Tran and Hoang25) Ji et al. (Reference Ji, Plourde and Bouzo18) showed a moderate degree of agreement for fibre and certain micronutrients (vitamin A, B1, Mg and P), but a low degree of agreement for energy and macronutrients, also using the kappa test as a statistic. Most of the investigations employed methods for calculating the percentage of estimated differences between AI-based technology and the reference method for assessing dietary intake. These results are shown in Table 2.
Table 2. Summary of principal artificial intelligence (AI) and statistical techniques employed in the validation process (n 13)

Artificial intelligence techniques applied in artificial intelligence-based dietary intake assessment methods
Regarding the types of AI techniques employed by the various technologies under review, DL was identified as a method of processing dietary data in five studies(Reference Chotwanvirat, Hnoohom and Rojroongwasinkul22,Reference Tagi, Tajiri and Hamada23,Reference Mezgec and Seljak27–Reference Papathanail, Brühlmann and Vasiloglou29) . In two articles, the use of ML was reported(Reference Kusuma, Yang and Yang19,Reference Yang, Yang and Kusuma20) . Conversely, four authors only reported the use of an AI-algorithm and did not provide information about the type of analysis technique used for its validation(Reference Ji, Plourde and Bouzo18,Reference Moyen, Rappaport and Fleurent-Grégoire21,Reference Nguyen, Tran and Hoang25,Reference Folson, Bannerman and Atadze30) . Chotwanvirat et al. (Reference Chotwanvirat, Hnoohom and Rojroongwasinkul22) employed DL techniques for food recognition and food weight estimation, particularly a neural network-based regression model. For his part, Lee et al. (Reference Lee, Yen and Wu24) used the technique known as natural language processing to analyse their data. In turn, two studies described the stages of training, validation and testing of the AI models. Mezgec et al. (Reference Mezgec and Seljak27) for the DL model divided the dataset into 70 % training, 10 % validation and 20 % testing. In the case of Kusuma et al. (Reference Kusuma, Yang and Yang19), in their ML model, data processing was divided into two stages: 80 % training and 20 % testing. (Table 2). Also, Tagi et al. (Reference Tagi, Hamada and Shan26) used convolutional neural networks as the technique and training of their model to analyze liquids estimation from images.
Accuracy of artificial intelligence-based dietary intake assessment methods
The accuracy of the AI models was estimated in seven investigations. Mezgec et al. (Reference Mezgec and Seljak27) obtained 94·47 % in a food and drink detection model for a set of images captured by the study participants. Papathanail et al. (Reference Papathanail, Brühlmann and Vasiloglou29) achieved 61·8 % in their model for recognising meal images and 54·5 % for serving size estimation. In another study by his authorship, he achieved 84·1 % meal segmentation in the model(Reference Papathanail, Vasiloglou and Stathopoulou28). In contrast, Tagi et al. (Reference Tagi, Tajiri and Hamada23) evaluated the accuracy of their algorithm for estimating leftover food and obtained an accuracy of 99 % for thin rice gruel, 63 % for fermented dairy, 25 % for peach juice and 85 % for all foods.
Kusuma et al. (Reference Kusuma, Yang and Yang19) developed a predictive model to account for the observed differences between nutrient estimations from the application and the FFQ. An AUC of 0·995 was obtained to explain the differences in calorie estimates based on carbohydrate and protein differences. In a related study, Yang et al. (Reference Yang, Yang and Kusuma20) reported an AUC of 0·91 to account for differences in estimated calories between an internet-based application and a national database recording participants’ intake information (Table 2).
Risk of bias
A quality assessment using the ROBINS tool revealed that 58·3 % of the studies were at moderate risk of bias (online Supplementary 2). In contrast, 25 % showed a low risk of bias, most of which were experimental. Overall, confounding bias was the most frequently reported in the present analysis for moderate risk assessment. Intervention intention bias, missing data bias and outcome measurement bias at the judgement of the reviewers were presented with a lower risk of bias. These results are shown in Figure 2.

Figure 2. Risk of bias (ROB) in selected studies.
Discussion
This is the first systematic review to examine the validity and accuracy of dietary intake assessment methods that employ AI techniques in dietary information processing. The selected studies suggest the promising potential of AI-DIA, indicating a possible evolution in comparison with traditional dietary assessment methods. However, further research is needed to confirm and quantify its actual impact.
Consequently, the increasing number of published articles demonstrates a growing interest in research on the use of technologies based on AI, particularly in the most recent period, between 2022 and 2024. Furthermore, there is a high concentration of research in North America(Reference Ji, Plourde and Bouzo18–Reference Moyen, Rappaport and Fleurent-Grégoire21) and Asia(Reference Chotwanvirat, Hnoohom and Rojroongwasinkul22–Reference Nguyen, Tran and Hoang25), which poses a challenge to the generalisability of results. Our research findings indicate that traditional methods of intake assessment, particularly food weight record and 24-h dietary recall, are more prevalent than other methods to contrast the validity of AI-DIA(Reference Yang, Yang and Kusuma20–Reference Chotwanvirat, Hnoohom and Rojroongwasinkul22,Reference Lee, Yen and Wu24–Reference Tagi, Hamada and Shan26,Reference Folson, Bannerman and Atadze30) .
Our systematic review highlights the analysis of the main AI and statistical techniques used in the validation of dietary intake assessment methods, reporting that most of the studies focus on app uses that have behind them the use of AI, either focused on ML(Reference Kusuma, Yang and Yang19,Reference Yang, Yang and Kusuma20) or DL(Reference Chotwanvirat, Hnoohom and Rojroongwasinkul22,Reference Tagi, Tajiri and Hamada23,Reference Mezgec and Seljak27–Reference Papathanail, Brühlmann and Vasiloglou29) . The implementation of advanced algorithms, including convolutional neural networks and ML, which have revolutionised the accuracy and efficiency in estimating dietary and nutritional intake from images captured by mobile devices(Reference Kusuma, Yang and Yang19,Reference Mezgec and Seljak27) .Others studies reported in detail procedures to detect and classify images of the evaluated algorithms, for instance in the studies Mezgec et al. (Reference Mezgec and Seljak27) and Yang et al. (Reference Yang, Yang and Kusuma20) showed high levels of accuracy in food classification and detection, comparable or superior to traditional methods. These findings are consistent with those discussed in the systematic review conducted by Ho et al. (Reference Ho, Tseng and Wu31), where he points out that app-based photographic capture methods are highly accurate, even superior to dietician-directed methods, allowing for adequate classification and quantification of food portion sizes.
Concerning to the procedures necessary for validation of AI-DIA methods, our research suggests that this can be based on classifying foods, quantifying portions and estimating calories, macronutrients and micronutrients. For calorie estimation, four studies showed correlation coefficients (CC) higher than 0·8, with the research by Papathanail et al. (Reference Papathanail, Brühlmann and Vasiloglou29) standing out, whose identification system achieved a CC of 0·97 compared to the visual estimation developed by dieticians. In this research, an adequate amount of food components and plate types (n 4) seems to be beneficial for model training, increasing accuracy, the authors conclude. In terms of macronutrient estimation, four studies reported CC above 0·8, showing heterogeneity in the indices, for instance, Moyen et al. (Reference Moyen, Rappaport and Fleurent-Grégoire21) reported a higher level of correlation in the estimation of protein and carbohydrates in men than in women. Furthermore, the findings of Kusuma et al. (Reference Kusuma, Yang and Yang19) indicate a high CC (0·85–0·87) for all macronutrients, when the mobile app was compared with the food frequency questionnaire, using ML techniques for the analysis. Similarly, Yang et al. (Reference Yang, Yang and Kusuma20) compared an internet-based application and the USDA Nutrition Data System for Research showing a CC 0·85 and thus confirming the validity of ML-based technology. These findings indicate that the use of AI-DIA has an adequate correlation and reliability for the estimation of macronutrients. In the case of micronutrients, there is a high variation in the estimates, making it difficult to generalise results. To demonstrate this issue, Kusuma et al. (Reference Kusuma, Yang and Yang19) reported a CC of 0·84, highlighting the estimates of vitamin D (0·90), Fe (0·88) and niacin (0·88). In contrast, Ji et al. (Reference Ji, Plourde and Bouzo18) reported low CC (< 0·20) especially in micronutrients such as vitamin B12, vitamin C, vitamin D, Ca, Mg, potassium and Na when they compared Kenooa-dietitian v. 3-day food diary records. Likewise, Folson et al. (Reference Folson, Bannerman and Atadze30) obtained low CC niacin (0·42), riboflavin (0·51), vitamin A (0·47) and vitamin C (0·3) when they compared FRANI-app v. Weighed Records. These differences emphasise the difficulty in estimating micronutrients from the AI-DIA compared to the traditional method, posing challenges in standardising methods that allow contrast of technologies, and the use of homogeneous databases, even knowing that there are variations between populations in the quantification of intakes. Capling et al. (Reference Capling, Beck and Gifford32) previously discussed this problem, pointing out that dietary intake assessment methods that use long recall periods, namely, FFQ, may have low accuracy for micronutrient assessment and have low CC. Another issue in micronutrient analysis is the ability to determine the amount of minerals and vitamins in various food sources, which depends on the quality of information in national databases containing the chemical composition of foods.
Another method employed in most of the studies was the use of Bland–Altman plots, which demonstrated a considerable degree of heterogeneity in the analyses. To illustrate this point, in five articles, a high-moderate degree of agreement(Reference Ji, Plourde and Bouzo18,Reference Moyen, Rappaport and Fleurent-Grégoire21,Reference Chotwanvirat, Hnoohom and Rojroongwasinkul22,Reference Nguyen, Tran and Hoang25,Reference Folson, Bannerman and Atadze30) was observed for the nutrients reported; however, in three, a low degree of agreement(Reference Kusuma, Yang and Yang19,Reference Yang, Yang and Kusuma20,Reference Tagi, Tajiri and Hamada23) was observed.
Concerning the AI techniques used by the AI-DIA, there is a wide predominance of DL, highlighting the CNN analysis rules as the main learning architecture. Chotwanvirat et al. (Reference Chotwanvirat, Hnoohom and Rojroongwasinkul22) used CNN to process images of carbohydrate-based ingredients from Thai cuisine, comparing the results with food weight recording performed by dietitians. For its part, Papathanail et al. (Reference Papathanail, Vasiloglou and Stathopoulou28) developed a CNN-based system for recognising foods and calculating Mediterranean diet adherence scores from food photos, compared to a FFQ. Tagi et al. (Reference Tagi, Tajiri and Hamada23) used a multitask CNN to classify the names of liquid foods and estimate the leftover liquid food, in addition the model considered calorie-volume estimation. The research described above demonstrates the effectiveness of DL for food image recognition, which is achieved through computational models composed of multiple processing layers and that are trained by input based on image sets. Specifically, CNN layers contain learnable filters that respond to features in the input data, and fully connected layers compose output data from other layers to obtain higher level learning from them(Reference Mezgec and Seljak33).
A relevant discussion to note is that food record methods and 24-h dietary recall were the most employed reference methods to validate and contrast with AI-DIA. In particular, the food weighing method has been shown to be a reliable method (gold standard) to compare the different AI technologies examined, provided that trained dietitians are available to perform the measurements. Despite its notable advantages, Ortega et al. (Reference Ortega, Perez-Rodrigo and Lopez-Sobaler34) argue that this method requires time, the possible induction of modifications in the diet of the subjects analysed or difficulties in describing the foods and/or portions consumed when it is self-applied.
The systematic review revealed significant variability in the quality of included studies, with 61·5 % presenting a moderate risk of bias and 38·5 % low risk, according to the Risk of Bias in Non-randomised Studies of Interventions tool. The most common biases were confounding and participant selection, suggesting the presence of uncontrolled variables that could influence results(Reference Langohr35,Reference Leahy, Kent and Sammon36) . These findings are consistent with previous studies that have identified the presence of biases in AI-based research, particularly in the context of dietary and health assessment(Reference Chen, Wang and Hong37).
Randomised controlled trials are considered the gold standard for intervention validation, offering robust evidence on the efficacy and accuracy of AI technologies compared to traditional methods(Reference Shonkoff, Cara and Pei38). For example, Moyen et al. (Reference Moyen, Rappaport and Fleurent-Grégoire21) led a randomised crossover design to evaluate the validity of an AI-assisted diet application against a web-based food recall method, demonstrating the robustness of randomised controlled trials for validating AI technologies. However, most of the reviewed studies employed non-randomised designs, limiting the assessment of causality and clinical effectiveness.
The implementation of randomised controlled trials to validate AI technologies faces significant challenges, including high costs, logistical complexity, the need for large samples and variability in dietary intake. Additionally, result generalisability, rapid technological evolution and ethical considerations present additional challenges. These challenges have been documented in studies analysing the feasibility of technology-based interventions to improve nutrition, like the Nudging for Good project(Reference Braga, Aberman and Arrieta39).
Strengths and limitations
Our review is the first with focus on the validity and accuracy of AI-based dietary assessment methods, underlining its emphasis on the wide variety of designs and technologies currently under development in research. In this regard, a high percentage of mobile applications currently under development and validation, both in preclinical and clinical contexts, was observed. During the protocol design phase, rigorous inclusion criteria were established, therefore all the selected articles present some AI-DIA that are contrasted with a traditional method for the evaluation of dietary intake. Moreover, we performed an exhaustive comparison of the designs of the included studies, as well as an analysis of the quality of the evidence using a standardised tool, reporting a critical look at the risk of bias of the studies.
Some limitations are mainly focused on three aspects. First, the heterogeneity of contrast methods for the evaluation of dietary intake prevented us from developing a meta-analysis to summarise the pooled effect of the investigations. This issue is problematic, especially if the studies in a meta-analysis differ significantly because the generalisability of the pooled estimate could be questionable, losing clinical applicability(Reference Cordero and Dans40). Second, despite the exhaustive analysis in each article, some did not report the AI technique used, whether ML or DL, they only noted the use of an AI algorithm. Wang et al. (Reference Wang, Jin and Liu41) recently led a guide for the development, validation and evaluation of AI-based algorithms where it is recommended to explain in detail the development of a new algorithm to ensure the transparency and reproducibility of the research. It is emphasised to choose appropriate approaches to algorithm development, such as inclusion of coding, model-based rules and explicit ML techniques employed(Reference Wang, Jin and Liu41). Another salient limitation of the present review is the potential for its included studies to lack representativeness with respect to diverse cultural and demographic contexts. AI-based tools could have varied performance in different cultures due to differences in diets, eating habits and food availability. Future studies should focus on cross-cultural validation of AI-DIA to ensure its applicability and accuracy in diverse populations.
Future directions
From the point of view of research designs, it is recommended that standardised protocols be developed, multicentre collaborations be encouraged, technology assessment frameworks be updated and adaptive designs be explored. Adaptive designs can make clinical trials more flexible by using the results accumulated in the trial to modify the course of the trial according to previously defined rules(Reference Pallmann, Bedding and Choodari-Oskooei42). These strategies aim to improve the validity and reliability of AI- DIA, ensuring their accuracy, safety and ethical acceptability.
In addition, we suggest encouraging the use of one or more traditional methods of dietary intake to allow for a greater variety of methods and homogenise future statistical analyses between studies. In this context, the methods of dietary record and weighed food are appropriate for comparing AI technologies, according to our analysis.
Finally, as AI technologies become more prevalent in dietary intake assessment, ethical frameworks and regulatory standards will need to be established to govern their use(Reference Kassem, Beevi and Basheer43). Ensuring that AI-DIA are developed with the inclusion of ethical frameworks in mind will avoid potential biases that could arise from biased training data(Reference Kassem, Beevi and Basheer43). Collaborations with local researchers and academic partners have the potential to enhance the cultural relevance of AI dietary solutions, fostering trust and stimulating effective utilisation in diverse communities.
Conclusions
Validity and accuracy are fundamental properties of any method that assesses nutrient or food intake in individuals. This systematic review critically analysed all the available evidence, highlighting as findings that AI-DIA are presented as reliable and valid methods to determine the amount of energy and macronutrients of individuals. Although the validity for micronutrient determination is moderate to low due to the variability of information contained in the sources and resources used as input in the ML or DL models. A relevant challenge is the design of randomised clinical trials to evaluate the efficacy of AI-DIA, as there is a moderate risk of bias in the designs included in the present study.
Acknowledgements
The authors declare no conflicts of interest. Author contribution: S.C. and C.S. carried out the concept, design, and drafting of this study. C.S. and G.Q.F. searched databases and screened articles. S.C., G.Q.F. and X.L.C. carried out the data collection, data interpretation, and reviewing of manuscript for editorial and intellectual contents. S.C. supervised the study and formal analysis. All authors approved the final version of the manuscript.
S. C. gratefully acknowledge financial support from National Doctoral Scholarship 2702/2024 ANID. G. Q. gratefully acknowledge financial support from National Doctoral Scholarship 1808/2023 ANID. X. A. L. C. gratefully acknowledge financial support from Research Project ANID FONDECYT Iniciación en Investigación No. 11220897.
Supplementary material
For supplementary material/s referred to in this article, please visit https://doi.org/10.1017/S0007114525000522