Introduction
In the United States, more than 13.5 million households and 33 million people, including more than 5 million children, experience food insecurity (FI) [1]. Notably, FI, defined as reduced quality, variety, or desirability of diet, or reduced food intake, disproportionately affects households with children under the age of 6 years, households headed by Black and Hispanic persons, and households headed by a single woman [1]. FI is associated with prematurity; chronic illnesses including asthma, depression, and obesity; missed routine medical care including immunizations; and higher emergency department utilization and hospitalizations [Reference Thomas, Miller and Morrissey2–Reference Ortiz-Marron, Ortiz-Pinto and Urtasun Lanza4]. Additionally, families with FI incur approximately 20% greater health care expenditures than families without FI [Reference Palakshappa, Garg, Peltz, Wong, Cholera and Berkowitz5].
Given its prevalence and association with poor outcomes, an increasing number of health systems and health insurers are investing in strategies to identify and address FI to improve patient care and population health. Although many healthcare systems screen for FI, screening is not yet universal, and little guidance exists regarding the optimal frequency or clinical setting, so predicting FI could be useful for healthcare systems and practices that are not yet screening. Additionally, if practices screen only at well visits, FI may be missed at acute visits given its transient nature [Reference Insolera6], especially among families that do not attend well visits. Furthermore, families may choose not to disclose FI for many reasons, including screening fatigue, shame or stigma, fear of consequences for reporting FI, or perception of need [Reference Insolera6–Reference Cullen, Attridge and Fein9], so predicting who is at risk of FI and offering resources, rather than screening, may be useful.
Although machine learning techniques using the electronic health record (EHR) are increasingly able to identify diseases earlier and prevent adverse outcomes [Reference Kogan, Didden and Lee10,Reference Hobensack, Song, Scharp, Bowles and Topaz11], it is unknown how best to utilize such models to predict which patients are at highest risk of FI in order to offer resources and interventions. Studies show it is feasible to screen for FI during routine clinical care in emergency department, primary care, and subspecialty settings [Reference Gonzalez, Hartford, Moore and Brown12,Reference Oates, Lock and Tarn13]. Given the rich data contained within the EHR, including demographics, medical histories, and historic patient-reported information such as unmet health-related social needs, these data could be a valuable resource for predicting FI. However, such prediction models are not yet validated or integrated into the EHR to guide clinical decision support and patient care.
This study aims to develop a machine learning algorithm to predict which families will report FI using demographic, geographic, medical, and unmet health-related social needs data that are typically available within most EHRs.
Materials and methods
Setting
We conducted a retrospective longitudinal cohort study at Atrium Health Wake Forest Baptist between January 2017 and August 2021. This study was approved by the Wake Forest University Health Sciences Institutional Review Board (IRB00071608).
Study population
We included all patients in one academic pediatric primary care clinic who were screened at least once for FI between January 2017 and August 2021. Encounters were included if FI was screened for and documented in the EHR (N = 25,214) and were excluded if it was not (N = 3,296). The clinic is located in an urban neighborhood and serves a predominantly racial and ethnic minority population, of which more than 90% are covered by Medicaid insurance. The clinic is part of a tertiary care medical center that has multiple board-certified pediatric subspecialties and a pediatric emergency department and serves the majority of children across western North Carolina.
Measurement of FI
The clinic screens all patients for FI during routine clinical care, including well-child and acute visits. Screening occurs on paper in English or Spanish and is administered verbally through a certified interpreter if the parent/caregiver speaks another language or is unable to read. Household FI is screened using two validated questions, “In the past year, did you worry that your food would run out before you got money or Food Stamps to buy more?” and “In the past year, did the food you bought just not last and you didn’t have money to get more?,” with dichotomous response options (yes or no) [Reference Hager, Quigg and Black14]. An affirmative response to either question is considered a positive screen. These questions are recommended by the American Academy of Pediatrics [15]. At the time of a clinic visit, the provider discusses the screening results with the family and enters the results into discrete data fields within the EHR. When a patient has a positive screen, the provider asks the family if they would like a bag of food from the clinic’s food pharmacy, referral to a food navigator, connection with federal nutrition programs, and/or a list of community resources.
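Deriving the binary FI outcome from the two screening items is straightforward; a minimal R sketch follows, assuming hypothetical column names `worry_food_run_out` and `food_did_not_last` holding the recorded yes/no responses:

```r
# Minimal sketch: an affirmative response to either screening item
# constitutes a positive household FI screen. Column names are hypothetical.
library(dplyr)

screens <- screens %>%
  mutate(fi_positive = (worry_food_run_out == "yes") |
                       (food_did_not_last == "yes"))
```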
Demographics
Each patient’s age, sex, race, and ethnicity at the time of the visit were extracted from the EHR. Race and ethnicity, which were self-reported by parents/caregivers and recorded in the EHR, were combined into one variable and categorized as non-Hispanic White, non-Hispanic Black, Hispanic, or other (includes American Indian, Alaska Native, Asian, Native Hawaiian, Other Pacific Islander, Other, Patient Refused, and Unknown).
Other unmet health-related social needs screening
As part of routine clinical care, the clinic also screens for other unmet health-related social needs over the prior 12 months including intimate partner violence, risk of homelessness, transportation difficulties, legal needs, caregiver coping concerns, and caregiver substance use. These needs are screened alongside FI. All screening results and resources provided are documented within the EHR (Supplementary Table 1).
Chronic conditions and healthcare utilization
All problem lists and encounter diagnoses from visits occurring at the primary care clinic and any department across the medical center since 2012, when our EHR was adopted, were included. Diagnoses recorded in the problem list or as encounter diagnoses were extracted based on their International Classification of Diseases, 10th Revision (ICD-10) codes, including attention-deficit hyperactivity disorder (ADHD), failure to gain weight, hypertension, asthma, eczema, prematurity, depression, and anxiety. Emergency department visits and hospitalizations were also extracted.
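For illustration, the mapping from diagnosis codes to per-patient condition indicators can be sketched as below; the ICD-10 prefixes cover only a subset of the conditions, and the table and column names (`diagnoses`, `patient_id`, `icd10_code`) are assumptions rather than the study’s actual extraction logic:

```r
# Illustrative sketch: flag selected chronic conditions from problem-list
# and encounter diagnosis codes by ICD-10 prefix (subset shown).
library(dplyr)

condition_flags <- diagnoses %>%
  group_by(patient_id) %>%
  summarise(
    adhd       = any(startsWith(icd10_code, "F90")),  # ADHD
    asthma     = any(startsWith(icd10_code, "J45")),  # asthma
    eczema     = any(startsWith(icd10_code, "L20")),  # atopic dermatitis
    depression = any(startsWith(icd10_code, "F32")),  # major depressive disorder
    .groups = "drop"
  )
```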
Anthropometrics
Children’s height and weight are recorded as standard of care at each clinic visit, and body mass index (BMI) percentiles were calculated from the encounter at which FI was measured. Underweight was defined as BMI <5th percentile. Healthy weight was defined as BMI ≥5th to <85th percentile. Overweight and obesity were defined as BMI ≥85th to <95th percentile and ≥95th percentile, respectively.
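These cutoffs translate directly into a categorization step; a minimal R sketch, assuming a hypothetical `bmi_pctl` column of BMI-for-age percentiles:

```r
# Sketch of the weight-status categories defined above. right = FALSE makes
# each interval [lower, upper), matching the <5th, >=5th to <85th,
# >=85th to <95th, and >=95th percentile definitions.
encounters$weight_status <- cut(
  encounters$bmi_pctl,
  breaks = c(-Inf, 5, 85, 95, Inf),
  labels = c("underweight", "healthy weight", "overweight", "obesity"),
  right  = FALSE
)
```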
Neighborhood information
We used the US Census Bureau’s 2010 American Community Survey (ACS) 5-year estimates to identify neighborhood information. The ACS includes neighborhood-level information at both the zip code level and the census block level, including income, poverty, employment, and home ownership estimates [16]. Zip code predictors included total population count, number of housing units, median age, percentage of persons 25 years of age and older with a high school education or less, median income, percentage of households with incomes under 100 percent of the federal poverty level in the last 12 months, percentage of owner-occupied households, percentage of workers 16 years of age and older who commute 60 minutes or more to work, percentage of persons 16 years of age and older who are unemployed, percent minority, percentage of the area considered urban, and percentage of the area considered rural. Census block predictors included the same variables measured at the smaller geographic level of the census block. The home address of all patients in the health system is automatically geocoded through the Wake Forest Clinical and Translational Science Institute’s Translational Data Warehouse. We merged geocoded data from the EHR with zip code level and census block level data from the ACS.
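Conceptually, this merge is a pair of left joins on the geocoded identifiers; a minimal sketch with assumed table and key names (`acs_zip`, `acs_block`, `zip_code`, `census_block_group`):

```r
# Illustrative merge: attach ACS 5-year estimates to geocoded encounters
# at both geographic levels. Table and key names are assumptions.
library(dplyr)

cohort <- encounters %>%
  left_join(acs_zip,   by = "zip_code") %>%
  left_join(acs_block, by = "census_block_group")
```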
Statistical analysis
The analysis cohort was created by merging the above data sources (demographics, anthropometrics, social needs, chronic conditions, and neighborhood information) by encounter number. The analysis cohort was randomly split at the patient level into an 80% training set and a 20% testing/validation set. Each patient could contribute more than one record to the analysis cohort. Various combinations of predictor variables and modeling approaches were used to train and validate prediction models. Demographic predictor variables included age in years, female sex, Black race, Hispanic ethnicity, other race and ethnicity (defined as non-Black, non-Hispanic, and/or non-White), and BMI percentile. Twelve-month “look-back” indicator variables included previous FI, intimate partner violence, risk of homelessness, transportation difficulties, legal needs, caregiver coping concerns, caregiver substance use, overweight/obesity, ADHD, failure to gain weight, hypertension, asthma, eczema, prematurity, depression, anxiety, hospitalizations, and emergency department visits. The look-back variables were initially coded as absent and were recoded as present if they appeared within the look-back window. The look-back window was then expanded to 18 and 24 months for the same set of variables. Six iterations of each of the logistic regression, random forest, and gradient-boosted models were developed, defined by ACS census block or ACS zip code variables crossed with a 12-, 18-, or 24-month clinical data look-back period. All models included demographics.
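A minimal sketch of the patient-level split and one look-back flag follows; `patient_id`, `encounter_date`, and `fi_positive` are assumed column names, and 365 days is used as an approximation of the 12-month window:

```r
# Patient-level 80/20 split: all encounters from a given patient land in
# the same partition, so no patient appears in both sets.
set.seed(2024)  # arbitrary seed, for reproducibility of the sketch
ids       <- unique(cohort$patient_id)
train_ids <- sample(ids, size = floor(0.8 * length(ids)))
train_set <- cohort[cohort$patient_id %in% train_ids, ]
test_set  <- cohort[!(cohort$patient_id %in% train_ids), ]

# One 12-month look-back indicator: previous FI is coded present if a
# positive screen occurred in the 365 days before the index encounter.
library(dplyr)
cohort <- cohort %>%
  group_by(patient_id) %>%
  mutate(prior_fi_12m = sapply(encounter_date, function(d) {
    any(fi_positive & encounter_date < d & encounter_date >= d - 365,
        na.rm = TRUE)
  })) %>%
  ungroup()
```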
Modeling approaches included logistic regression, random forest, and gradient-boosted machine. Default hyperparameter values were used for the random forest (500 trees, number of variables to split at each node equal to the rounded-down square root of the number of predictors, minimum node size of 1) and the gradient-boosted machine (learning rate of 0.3, maximum tree depth of 6) [Reference Breiman17–Reference Chen, Benesty and Khotilovich20]. No variable selection was performed, as all variables considered are readily available in the EHR. All models were built on the training set using 10-fold cross-validation to estimate out-of-sample performance for the metrics of accuracy, area under the curve (C-statistic), precision-recall area under the curve, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). In the 10-fold cross-validation, each patient’s observations were uniquely mapped to a single fold to prevent overly optimistic estimates of performance. Final models from the training set were then evaluated on the test set. Test set evaluations included measures of discrimination and calibration. Model discrimination was measured using the C-statistic and compared between models using DeLong’s test [Reference DeLong, DeLong and Clarke-Pearson21,Reference Robin, Turck and Hainard22]. Model calibration was measured using the Hosmer–Lemeshow slope statistic [Reference Hosmer and Lemesbow23], which breaks the data into deciles of predicted FI and examines, by decile, the relationship between average predicted FI and the observed rate of FI. A slope of 1 for a line connecting the deciles indicates a well-calibrated model.
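Because the analysis used the mlr3 framework (see Statistical software below), this setup can be sketched as follows. The learner parameters shown match the stated defaults; `nrounds` for xgboost is not stated in the paper and is set here purely for illustration, and `fi_positive`/`patient_id` are assumed column names:

```r
# Sketch: grouped 10-fold CV in mlr3, keeping each patient's encounters in
# a single fold, for the three model families described above.
library(mlr3)
library(mlr3learners)

# fi_positive is assumed to be a factor with levels "yes"/"no"
task <- as_task_classif(train_set, target = "fi_positive", positive = "yes")
task$set_col_roles("patient_id", roles = "group")  # patient-level folds

learners <- list(
  lrn("classif.log_reg", predict_type = "prob"),
  lrn("classif.ranger",  predict_type = "prob",
      num.trees = 500, min.node.size = 1),      # mtry defaults to floor(sqrt(p))
  lrn("classif.xgboost", predict_type = "prob",
      eta = 0.3, max_depth = 6, nrounds = 100)  # nrounds: illustrative only
)

cv10   <- rsmp("cv", folds = 10)
cv_auc <- sapply(learners, function(l) {
  resample(task, l, cv10)$aggregate(msr("classif.auc"))
})
```

The calibration slope can then be estimated on the test set by regressing decile-level observed FI rates on decile-level mean predicted probabilities, as described above.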
Missing data were addressed in two ways. First, a single imputation was employed to fill in all missing data using multiple imputation by chained equations [Reference van Buuren and Groothuis-Oudshoorn24]. Second, as a sensitivity analysis, missing data were coded as a category [Reference Josse, Scornet and Varoquaux25]: continuous variables were summarized using tertiles, with a fourth level added to reflect missing data, and categorical variables included an extra level to reflect missing (e.g., sex was coded as male, female, or missing). Additional sensitivity analyses included using only data from before the onset of the COVID-19 pandemic (visits occurring before 3/11/2020) and removing previous FI as a predictor from the model.
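Both strategies are straightforward in R; a minimal sketch using the mice package for the single imputation and base R for the missing-as-category coding, with `median_income` as a hypothetical continuous predictor:

```r
# Primary approach: a single imputed dataset (m = 1) drawn via multiple
# imputation by chained equations.
library(mice)
imp            <- mice(cohort, m = 1, seed = 2024)
cohort_imputed <- complete(imp, 1)

# Sensitivity analysis: tertiles of a continuous predictor plus a fourth
# "missing" level; categoricals get an explicit "missing" level the same way.
tert <- quantile(cohort$median_income, probs = c(1/3, 2/3), na.rm = TRUE)
cohort$income_cat <- cut(cohort$median_income,
                         breaks = c(-Inf, tert, Inf),
                         labels = c("low", "middle", "high"))
cohort$income_cat <- addNA(cohort$income_cat)  # NA becomes its own level
levels(cohort$income_cat)[is.na(levels(cohort$income_cat))] <- "missing"
```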
All analyses were conducted using R version 4.3.2 [26]. Machine learning prediction models were trained and validated using the R package “mlr3” [Reference Lang, Binder and Richter27]. All models were built and described using the principles outlined in the Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) statement [Reference Collins, Reitsma, Altman and Moons28] (Supplementary Table 2).
Results
A total of 25,214 encounters from 8,521 unique patients collected from 1/3/2017 to 8/2/2021 met inclusion criteria for this study. The number of encounters per patient ranged from 1 to 12 with a median of 2 (IQR: 1, 4). Patients were partitioned into 80% training (N = 6,816 patients with 20,171 encounters) and 20% test/validation (N = 1,705 patients with 5,043 encounters) sets. The median age was 3.17 years (IQR: 0.75, 9.17), and the sample was 52% male, 22.4% Black, and 62.0% Hispanic. The median BMI percentile was 76 (IQR: 47.15, 93.77), and 26.2% of the sample was under 100% of the federal poverty level in the last 12 months. Training and test sets showed similar demographics (Tables 1–3). Overall, 3,820 (15.2%) encounters reported FI, with similar proportions of FI in the training (N = 3,054, 15.1%) and test (N = 766, 15.4%) sets.
Legend: N (%) or mean (SD) are reported. BMI = body mass index. HS = high school. Mins = minutes. SD = standard deviation. ZIP = zip code. yo = year old.
Legend: FI = food insecurity.
Legend: ADHD = attention deficit and hyperactivity disorder.
Logistic regression with a 12-month look-back using census block group neighborhood variables showed the best performance in the training set (Table 4), with a C-statistic of 0.68 and an accuracy of 0.85. The most important variables in this model (Supplementary Table 3) included previous FI (odds ratio 4.09; p < .0001), domestic violence (odds ratio 0.24; p < .0001), failure to gain weight (odds ratio 1.43; p = 0.0007), and prematurity (odds ratio 1.24; p = 0.0264). BMI percentile, age, and income had the highest variable importance in the random forest model, whereas domestic violence, previous FI, and age had the highest variable importance values in the gradient-boosted model.
Legend: AUC = area under the curve. PR-AUC = precision-recall area under the curve. PPV = positive predictive value. NPV = negative predictive value. LR = logistic regression. RF = random forest. GBM = gradient-boosted model.
Logistic regression with a 12-month look-back using census block group neighborhood variables performed similarly in the test set (C-statistic 0.70 and accuracy 0.84) and had a superior C-statistic to both the random forest (0.65, p = 0.01) and the gradient-boosted machine (0.68, p < 0.01) (Table 5, Figure 1). Furthermore, this approach showed the best calibration (Figure 2), with a predicted-versus-observed intercept of 0 (95% CI: −0.02, 0.03, including 0) and a slope of 1.03 (95% CI: 0.88, 1.18, including 1), both reflecting good calibration. Sensitivity and specificity were maximized when predicting FI for all encounters with a probability of 0.13 or higher (Supplementary Tables 4 and 5). Notably, at the optimal cutpoint of 0.13, PPV was equal to 0.92.
Legend: AUC = area under the curve. PR-AUC = precision-recall area under the curve. PPV = positive predictive value. NPV = negative predictive value. LR = logistic regression. RF = random forest. GBM = gradient-boosted model.
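For readers reproducing this step, the DeLong comparison and the sensitivity/specificity-maximizing cutpoint can be computed with the pROC package cited in the methods; a sketch assuming hypothetical vectors of test-set predicted probabilities (`pred_lr`, `pred_rf`):

```r
# Sketch: compare C-statistics with DeLong's test and locate the cutpoint
# maximizing sensitivity + specificity (Youden's J) on the test set.
library(pROC)

roc_lr <- roc(response = test_set$fi_positive, predictor = pred_lr)
roc_rf <- roc(response = test_set$fi_positive, predictor = pred_rf)

roc.test(roc_lr, roc_rf, method = "delong")  # DeLong comparison of AUCs

coords(roc_lr, x = "best", best.method = "youden",
       ret = c("threshold", "sensitivity", "specificity", "ppv", "npv"))
```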
Model performance was similar when treating missing data as a category rather than imputing (Supplementary Tables 6 and 7). With missing data coded as a category, the logistic regression with a 24-month look-back using census block group variables had a C-statistic of 0.70 and an accuracy of 0.85, nearly identical to the performance obtained when handling missing data with a single imputation.
Two additional sensitivity analyses were performed. First, only data from before the onset of the COVID-19 pandemic (before 3/11/2020) were included in model training and testing. In this analysis, there were 14,712 encounters from 6,180 patients in the training set and 3,573 encounters from 1,538 patients in the test set. Again using 12-month look-back data with census block group neighborhood variables, logistic regression had a C-statistic of 0.69 in the test set, indicating performance similar to the model built on the full data set. The second sensitivity analysis removed previous FI as a predictor from the model. Doing so had a large impact on the predictive ability of the model, reducing the C-statistic in the test set to 0.63.
Discussion
This is one of the first studies to utilize machine learning to predict pediatric FI. Logistic regression with a 12-month look-back using census block group neighborhood variables, demographics, unmet health-related social needs, and clinical data from the EHR performed best in both the training and validation sets, predicting FI with a C-statistic of 0.70. Interestingly, when missing data were encoded as a categorical variable rather than imputed, the model exhibited similar performance, with a C-statistic of 0.70.
Notably, despite evaluating more than 80 variables, leveraging historical longitudinal data and routine health-related social screening at well and acute visits, and including a 12-month look-back period, our most robust model did not perform as well as expected, with a C-statistic of only 0.70. This is particularly noteworthy in a pediatric population, in which we first evaluate patients at birth and provide routine well-child visits every 2–3 months for the first two years. It is possible that prediction modeling of FI is less accurate in families with infants and toddlers. Similarly, given that our clinic serves a racial and ethnic minority population, it is also possible that families in our clinic under-report FI due to fear of consequences or perception of need [Reference Insolera6–Reference Cullen, Attridge and Fein9]. Comparable clinics report a wide range of positive FI screening rates, from 10% to more than 70% [Reference Smith, Malinak and Chang29–Reference Hendrickson, O’Riordan, Arpilleda and Heneghan31]. This study also highlights that, despite advanced modeling, there are unmeasured variables, and screening for FI remains important given the lack of effective predictive models. This is in accordance with recent guidance from regulatory agencies recommending screening for health-related social needs, including FI, to reduce disparities [32–34].
Given our model’s performance, it is also possible that expanding FI screening to departments outside of our clinic may yield a more robust model. Patients and families with unmet health-related social needs seek acute medical care more frequently than those without unmet needs [Reference Rigdon, Montez and Palakshappa35,Reference Li, Gogia and Tatem36], so it is possible some families were not seeing their primary care provider, which was necessary for inclusion in our study. Likewise, because our clinic has federal and community-based nutrition supports embedded within its clinical workflows, such as the Special Supplemental Nutrition Program for Women, Infants, and Children, the Supplemental Nutrition Assistance Program, food pantries, and other hunger relief programs, it is possible that some FI needs were already being partially addressed. Similarly, although there are brief, validated surveys for FI that could be used universally, universal screening would simply add to the large amount of screening that is recommended but not being done, including screening for depression, anxiety, homelessness, and domestic violence. Rather than add one more screen that may supplant another, we propose targeting the highest-risk patients to address FI using a statistical model that runs in the background and does not require doctor-patient interaction unless patients meet a threshold of likely FI.
Strengths of this study include that it leverages the EHR and community data, which are rich sources of demographic, unmet health-related social needs, and health information. This study is an example of the real-world applicability of integrating social determinants of health with the electronic health record to enable predictive models for social needs screening and intervention. We also had a large sample size in an urban pediatric health center, which is an ideal location to screen for FI and assist with addressing unmet health-related social needs.
A limitation of the study is that we did not address potential correlation between multiple encounters within the same person. Potential approaches could include longitudinal modeling [Reference Liang and Zeger37,Reference Fitzmaurice, Laird and Ware38] or creating a variable for the number of encounters and including it in the model [Reference McGee, Haneuse, Coull, Weisskopf and Rotem39]. Similarly, other models predicting FI include variables on education and school performance, patient substance use, and family relationships; we attempted to address these by including parental substance use screening but did not have access to school or family data beyond the ACS data. It is also possible the ACS has nonresponse bias; however, the US Census Bureau takes steps in sampling and weighting to account for nonresponse bias and to improve the quality of the data. Our data are retrospective, and there could be selection bias in who was screened for FI, but we standardized clinic procedures to collect these data [40]. Lastly, FI is commonly measured at the family level; however, we were unable to accurately identify sibling and family linkages, or whether children were split between households at different encounters.
Conclusion
This is one of the first pediatric studies leveraging machine learning to predict FI. Although such models can be integrated into the EHR to predict FI, guide clinical decision support, and streamline workflows to address FI in a pediatric population, further work is needed to develop a more robust prediction model. Future considerations include more widespread integration of FI screening to assist with model development.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/cts.2024.645.
Author contributions
JR conducted the statistical analysis and drafted the manuscript. KM conceptualized and designed the study. DP conceptualized and designed the study. CB conceptualized and designed the study. SMD conceptualized and designed the study. LWA conceptualized and designed the study. AT conceptualized and designed the study, drafted the manuscript, and takes responsibility for the manuscript as a whole.
Funding statement
The authors gratefully acknowledge use of the services and facilities of the Center for Biomedical Informatics, funded by the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant Award Number UL1TR001420.
Dr Brown is supported by a grant from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (grant K23HD099249). Dr Palakshappa is supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health under Award Number K23HL146902. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The funding organizations had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or the decision to submit the manuscript for publication.
Competing interests
The authors do not have any conflicts of interest to disclose.