This study seeks a low-rank representation of turbulent flow data obtained from multiple sources. To uncover such a representation, we consider finding a finite-dimensional manifold that captures underlying turbulent flow structures and characteristics. While nonlinear machine-learning techniques can be considered to seek a low-order manifold from flow field data, there exists an infinite number of transformations between data-driven low-order representations, causing difficulty in understanding turbulent flows on a manifold. Finding a manifold that captures turbulence characteristics becomes even more challenging when considering multi-source data together, owing to the presence of inherent noise or uncertainties and to differences in the spatiotemporal length scales resolved in flow snapshots, which depend on how the data are collected. With an example of numerical and experimental data sets of transitional turbulent boundary layers, this study considers an observable-augmented nonlinear autoencoder-based compression, enabling data-driven feature extraction with prior knowledge of turbulence. We show that it is possible to find a low-rank subspace that not only captures structural features of flows across the Reynolds number but also distinguishes the data source. Along with machine-learning-based super-resolution, we further argue that the present manifold can be used to validate the outcome of modern data-driven techniques when training and evaluating across data sets collected through different techniques. The current approach could serve as a foundation for a range of analyses including reduced-complexity modelling and state estimation with multi-source turbulent flow data.
Studies conducted during the COVID-19 pandemic found high occurrence of suicidal thoughts and behaviours (STBs) among healthcare workers (HCWs). The current study aimed to (1) develop a machine learning-based prediction model for future STBs using data from a large prospective cohort of Spanish HCWs and (2) identify the most important variables in terms of contribution to the model’s predictive accuracy.
Methods
This is a prospective, multicentre cohort study of Spanish HCWs active during the COVID-19 pandemic. A total of 8,996 HCWs participated in the web-based baseline survey (May–July 2020) and 4,809 in the 4-month follow-up survey. A total of 219 predictor variables were derived from the baseline survey. The outcome variable was any STB at the 4-month follow-up. Variable selection was done using an L1 regularized linear Support Vector Classifier (SVC). A random forest model with 5-fold cross-validation was developed, and two class-balancing techniques were tested: the Synthetic Minority Oversampling Technique (SMOTE) and undersampling of the majority class. The model was evaluated by the area under the receiver operating characteristic curve (AUROC) and the area under the precision–recall curve. SHapley Additive exPlanations (SHAP) values were used to evaluate the overall contribution of each variable to the prediction of future STBs. Results were obtained separately by gender.
Results
The prevalence of STBs in HCWs at the 4-month follow-up was 7.9% (women = 7.8%, men = 8.2%). Thirty-four variables were selected by the L1 regularized linear SVC. The best results were obtained without data balancing techniques: AUROC = 0.87 (0.86 for women and 0.87 for men) and area under the precision–recall curve = 0.50 (0.55 for women and 0.45 for men). Based on SHAP values, the most important baseline predictors for any STB at the 4-month follow-up were the presence of passive suicidal ideation, the number of days in the past 30 days with passive or active suicidal ideation, the number of days in the past 30 days with binge eating episodes, the number of panic attacks (women only) and the frequency of intrusive thoughts (men only).
Conclusions
Machine learning-based prediction models for STBs in HCWs during the COVID-19 pandemic trained on web-based survey data present high discrimination and classification capacity. Future clinical implementations of this model could enable the early detection of HCWs at the highest risk for developing adverse mental health outcomes.
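The two evaluation metrics named in the Methods above can be illustrated with a minimal sketch. It is not the study's code: the labels and scores are made-up toy values, the function names are hypothetical, and the precision–recall area is computed as step-wise average precision without special handling of tied scores.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (ties not handled)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at each newly recalled positive
    return total / n_pos


# Toy data (hypothetical): 1 = outcome present at follow-up, 0 = absent.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
```

For a rare outcome, the precision–recall area of an uninformative ranking is roughly the outcome prevalence, which is why it is usually read alongside the AUROC rather than on the same scale.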
Kelvin wakes are fluid motions generated by a moving disturbance at a free surface. We present a machine learning-based framework for inferring the properties of such moving disturbances from the Kelvin-wake patterns. We perform phase-resolved simulations to establish a dataset of nearly half a million Kelvin wakes generated by disturbances of varying propagating speed, length scale and geometry. Trained with the augmented data, the neural network achieves accuracies of 99.7% and 92.4% in predicting the velocity and the length scale of the disturbance, respectively, even when random noise has been added to the training data. The explainability of the neural network is demonstrated by quantifying the contribution of the input data to the prediction, which shows a strong connection with the diverging and transverse waves. The accuracy of the neural network in predicting the disturbance length scale is sensitive to wave nonlinearity.
Aerosol-cloud interactions contribute significant uncertainty to modern climate model predictions. Analysis of complex observed aerosol-cloud parameter relationships is crucial to reducing this uncertainty. Here, we apply two machine learning methods to explore variability in in-situ observations from the NASA ACTIVATE mission. These observations consist of flights over the Western North Atlantic Ocean, providing a large repository of data including aerosol, meteorological, and microphysical conditions in and out of clouds. We investigate this dataset using principal component analysis (PCA), a linear dimensionality reduction technique, and an autoencoder, a deep learning non-linear dimensionality reduction technique. We find that we can reduce the dimensionality of the parameter space by more than a factor of 2 and verify that the deep learning method outperforms a PCA baseline by two orders of magnitude. Analysis in the low dimensional space of both these techniques reveals two consistent physically interpretable regimes: a low pollution regime and an in-cloud regime. Through this work, we show that unsupervised machine learning techniques can learn useful information from in-situ atmospheric observations and provide interpretable results of low-dimensional variability.
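As a rough illustration of the linear baseline mentioned above, the sketch below finds the first principal component of a small data table by power iteration on the sample covariance matrix. The data values and the function name are hypothetical, and this is a stand-in for PCA in general, not the analysis applied to the ACTIVATE observations.

```python
import math


def first_principal_component(rows, iters=200):
    """Return (unit direction, explained-variance ratio) of the top component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration; the uneven start vector avoids the most obvious
    # case of starting orthogonal to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("data have no variance along the start vector")
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
    total_variance = sum(cov[a][a] for a in range(d))
    return v, eigenvalue / total_variance


# Hypothetical observations: the second variable is twice the first,
# and the third is small noise.
noise = [0.1, -0.1, 0.0, 0.1, -0.1]
rows = [(t, 2.0 * t, e) for t, e in zip([-2.0, -1.0, 0.0, 1.0, 2.0], noise)]
direction, explained = first_principal_component(rows)
```

Here a single component carries almost all of the variance because two of the three variables are exactly proportional. An autoencoder plays the same role when the dependence between variables is nonlinear.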
Mental health problems are the major cause of disability among adolescents. Personalized prevention may help to mitigate the development of mental health problems, but no tools are available to identify individuals at risk before they require mental health care.
Methods
We identified children without mental health problems at baseline but with six different clinically relevant problems at 1- or 2-year follow-up in the Adolescent Brain Cognitive Development (ABCD) study. We used machine learning analysis to predict the development of these mental health problems from demographic, symptom and neuroimaging data in a discovery (N = 3236) and a validation (N = 3851) sample. The discovery sample (N = 168–513 per group) consisted of participants with MRI data, who were matched with healthy controls on age, sex, IQ, and parental education level. The validation sample (N = 84–231) consisted of participants without MRI data.
Results
Subclinical symptoms at 9–10 years of age could accurately predict the development of six different mental health problems before the age of 12 in the discovery and validation samples (AUCs = 0.71–0.90). The additive value of neuroimaging in the discovery sample was limited. Multiclass prediction of the six groups showed considerable misclassification, but subclinical symptoms could accurately differentiate between the development of externalizing and internalizing problems (AUC = 0.79).
Conclusions
These results suggest that machine learning models can predict conversion to mental health problems during a critical period in childhood using subclinical symptoms. These models enable the personalization of preventative interventions for children at increased risk, which may reduce the incidence of mental health problems.
Constrained econometric techniques hamper investigations of disease prevalence and income risks in the shrimp industry. We employ an econometric model and machine learning (ML) to reduce model restrictions and improve understanding of the influence of diseases and climate on income and disease risks. Applying the models to interview data from 534 farmers enables the discernment of factors influencing shrimp income and disease risks. ML complemented the Just-Pope production model, and the partial dependency plots show nonlinear relationships between income, disease prevalence, and risk factors. Econometric and ML models generated complementary information to understand income and disease prevalence risk factors.
Neural network models have been employed to predict the instantaneous flow close to the wall in a viscoelastic turbulent channel flow. Numerical simulation data at the wall are used to predict the instantaneous velocity fluctuations and polymeric-stress fluctuations at three different wall-normal positions in the buffer region. Such an ability of non-intrusive predictions has not been previously investigated in non-Newtonian turbulence. Our comparative analysis with reference simulation data shows that velocity fluctuations are predicted reasonably well from wall measurements in viscoelastic turbulence. The network models exhibit relatively improved accuracy in predicting quantities of interest during the hibernation intervals, facilitating a deeper understanding of the underlying physics during low-drag events. This method could be used in flow control or when only wall information is available from experiments (for example, in opaque fluids). More importantly, only velocity and pressure information can be measured experimentally, while polymeric elongation and orientation cannot be directly measured despite their importance for turbulent dynamics. We therefore study the possibility of reconstructing the polymeric-stress fields from velocity or pressure measurements in viscoelastic turbulent flows. The neural network models demonstrate reasonably good accuracy in predicting polymeric shear stress and the trace of the polymeric stress at a given wall-normal location. The results are promising, but also underline that a lack of small scales in the input velocity fields can alter the rate of energy transfer from flow to polymers, affecting the prediction of the polymeric-stress fluctuations.
This chapter covers quantum algorithmic primitives for loading classical data into a quantum algorithm. These primitives are important in many quantum algorithms, and they are especially essential for algorithms for big-data problems in the area of machine learning. We cover quantum random access memory (QRAM), an operation that allows a quantum algorithm to query a classical database in superposition. We carefully detail caveats and nuances that appear for realizing fast large-scale QRAM and what this means for algorithms that rely upon QRAM. We also cover primitives for preparing arbitrary quantum states given a list of the amplitudes stored in a classical database, and for performing a block-encoding of a matrix, given a list of its entries stored in a classical database.
This chapter covers quantum linear system solvers, which are quantum algorithmic primitives for solving a linear system of equations. The linear system problem is encountered in many real-world situations, and quantum linear system solvers are a prominent ingredient in quantum algorithms in the areas of machine learning and continuous optimization. Quantum linear system solvers do not themselves solve end-to-end problems because their output is a quantum state, which is one of their major caveats.
This chapter covers variational quantum algorithms, which act as a primitive ingredient for larger quantum algorithms in several application areas, including quantum chemistry, combinatorial optimization, and machine learning. Variational quantum algorithms are parameterized quantum circuits where the parameters are trained to optimize a certain cost function. They are often shallow circuits, which potentially makes them suitable for near-term devices that are not error corrected.
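The training loop described above can be sketched for the smallest possible case. This is a toy under stated assumptions, not an example from the chapter: a single qubit, one RY rotation as the parameterized circuit, the expectation value of Z as the cost function, exact classical statevector simulation in place of a quantum device, and gradients from the parameter-shift rule. All function names are hypothetical.

```python
import math


def expectation_z(theta):
    """Cost function: <Z> for the state RY(theta)|0>, simulated exactly."""
    amp0, amp1 = math.cos(theta / 2.0), math.sin(theta / 2.0)
    return amp0 ** 2 - amp1 ** 2  # equals cos(theta)


def parameter_shift_gradient(cost, theta):
    """Exact gradient for a rotation gate from two shifted cost evaluations."""
    return 0.5 * (cost(theta + math.pi / 2.0) - cost(theta - math.pi / 2.0))


def train(theta=0.3, learning_rate=0.4, steps=100):
    """Plain gradient descent on the circuit parameter."""
    for _ in range(steps):
        theta -= learning_rate * parameter_shift_gradient(expectation_z, theta)
    return theta


theta_opt = train()
final_cost = expectation_z(theta_opt)
```

On hardware, each cost evaluation would be estimated from repeated measurements, so the gradient would be noisy. The parameter-shift rule is used here because it needs only cost evaluations of the same circuit at shifted parameters.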
This chapter covers a number of disparate applications of quantum computing in the area of machine learning. We only consider situations where the dataset is classical (rather than quantum). We cover quantum algorithms for big-data problems relying upon high-dimensional linear algebra, such as Gaussian process regression and support vector machines. We discuss the prospect of achieving a quantum speedup with these algorithms, which face certain input/output caveats and must compete against quantum-inspired classical algorithms. We also cover heuristic quantum algorithms for energy-based models, which are generative machine learning models that learn to produce outputs similar to those in a training dataset. Next, we cover a quantum algorithm for the tensor principal component analysis problem, where a quartic speedup may be available, as well as quantum algorithms for topological data analysis, which aim to compute topologically invariant properties of a dataset. We conclude by covering quantum neural networks and quantum kernel methods, where the machine learning model itself is quantum in nature.
When using machine learning to model environmental systems, it is often a model’s ability to predict extreme behaviors that yields the highest practical value to policy makers. However, most existing error metrics used to evaluate the performance of environmental machine learning models weigh error equally across test data. Thus, routine performance is prioritized over a model’s ability to robustly quantify extreme behaviors. In this work, we present a new error metric, termed Reflective Error, which quantifies the degree to which our model error is distributed around our extremes, in contrast to existing model evaluation methods that aggregate error over all events. The suitability of our proposed metric is demonstrated on a real-world hydrological modeling problem, where extreme values are of particular concern.
The oriental armyworm, Mythimna separata (Walker), is a highly migratory pest known for its sudden larval outbreaks, which result in severe crop losses. These unpredictable surges pose significant challenges for timely and accurate monitoring, as conventional methods are labour-intensive and prone to errors. To address these limitations, this study investigates the use of machine learning for automated and precise identification of M. separata larval instars. A total of 1577 larval images representing different instars were analysed for geometric, colour, and texture features. Additionally, larval weight was predicted using 13 regression models. Instar identification was conducted using Support Vector Classifier (SVC), Random Forest, and Multi-Layer Perceptron. Key features contributing to classification accuracy were subsequently identified through permutation feature importance analysis. The results demonstrated the potential of machine learning for automating instar identification with high efficiency and accuracy. Predicted larval weight emerged as a key feature, significantly enhancing the performance of all identification models. Among the tested approaches, BaggingRegressor exhibited the best performance for larval weight prediction (R² = 98.20%, RMSE = 0.2313), while SVC achieved the highest instar identification accuracy (94%). Overall, the integration of larval weight with other image-derived features proved to be a highly effective strategy. This study demonstrates the efficacy of machine learning in enhancing pest monitoring systems by providing a scalable and reliable framework for precise pest management. The proposed methodology significantly improves larval instar identification accuracy and efficiency, offering actionable insights for implementing targeted biological and chemical control strategies.
Understanding country-level nutrition intake is crucial to global nutritional policies that aim to reduce disparities and relevant disease burdens. Still, few studies have used clustering techniques to analyse the recent Global Dietary Database. This study aims to extend an existing multivariate time-series clustering algorithm to allow for greater customisability and to provide the first cluster analysis of the Global Dietary Database to explore temporal trends in country-level nutrition profiles (1990–2018).
Design:
Trends in sugar-sweetened beverage intake and nutritional deficiency were explored using the newly developed program ‘MTSclust’. Time-series clustering algorithms are different from simple clustering approaches in their ability to appreciate temporal elements.
Setting:
Nutritional and demographical data from 176 countries were analysed from the Global Dietary Database.
Participants:
Population-representative samples of the 176 countries in the Global Dietary Database.
Results:
In a 3-class test specific to the domain, the MTSclust program achieved a mean accuracy of 71.5% (Adjusted Rand Index [ARI] = 0.381) while the mean accuracy of a popular algorithm, DTWclust, was 58% (ARI = 0.224). The clustering of nutritional deficiency and sugar-sweetened beverage intake identified several common trends among countries and found that these did not change by demographics. Multivariate time-series clustering demonstrated a global convergence towards a Western diet.
Conclusion:
While global nutrition trends are associated with geography, demographic variables such as sex and age are less influential on the trends in certain nutritional intakes. The literature could be further supplemented by applying outcome-guided methods to explore how these trends link to disease burdens.
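The Adjusted Rand Index quoted in the Results above compares two clusterings of the same items and corrects for agreement expected by chance. A minimal sketch of the standard pair-counting formula follows. The label lists are invented toy values, not data from the Global Dietary Database, and the function name is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items")
    # Pairs of items placed together in both clusterings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each clustering separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = 0.5 * (together_a + together_b)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both clusterings put all items together
    return (together_both - expected) / (maximum - expected)


# Toy example: reference classes versus cluster assignments for four items.
reference = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]   # same partition, different label names
split = [0, 0, 1, 2]        # second class split into two clusters
ari_relabelled = adjusted_rand_index(reference, relabelled)
ari_split = adjusted_rand_index(reference, split)
```

The index is 1 for identical partitions regardless of how clusters are named, and close to 0 when agreement is no better than chance. It is therefore not directly comparable with an accuracy percentage.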
In this chapter, we review approaches to model climate-related migration including the multiple goals of modeling efforts and why modeling climate-related migration is of interest to researchers, commonly used sources of climate and migration data and data-related challenges, and various modeling methods used. The chapter is not meant to be an exhaustive inventory of approaches to modeling climate-related migration, but rather is intended to present the reader with an overview of the most common approaches and possible pitfalls associated with those approaches. We end the chapter with a discussion of some of the future directions and opportunities for data and modeling of climate-related migration.
In this study, we tackle the challenge of inferring the initial conditions of a Rayleigh–Taylor mixing zone for modelling purposes by analysing zero-dimensional (0-D) turbulent quantities measured at an unspecified time. This approach assesses the extent to which 0-D observations retain the memory of the flow, evaluating their effectiveness in determining initial conditions and, consequently, in predicting the flow’s evolution. To this end, we generated a comprehensive dataset of direct numerical simulations, focusing on miscible fluids with low density contrasts. The initial interface deformations in these simulations are characterised by an annular spectrum parametrised by four non-dimensional numbers. To study the sensitivity of 0-D turbulent quantities to initial perturbation distributions, we developed a surrogate model using a physics-informed neural network (PINN). This model enables computation of the Sobol indices for the turbulent quantities, disentangling the effects of the initial parameters on the growth of the mixing layer. Within a Bayesian framework, we employ a Markov chain Monte Carlo (MCMC) method to determine the posterior distributions of initial conditions and time, given various state variables. This analysis sheds light on inertial and diffusive trajectories, as well as the progressive loss of initial conditions memory during the transition to turbulence. Furthermore, it identifies which turbulent quantities serve as better predictors of Rayleigh–Taylor mixing zone dynamics by more effectively retaining the memory of the flow. By inferring initial conditions and forward propagating the maximum a posteriori (MAP) estimate, we propose a strategy for modelling the Rayleigh–Taylor transition to turbulence.
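The Bayesian step described above (sampling a posterior with MCMC, then taking the MAP estimate) can be sketched on a one-parameter toy problem. Everything specific here is an assumption for illustration: the linear `toy_surrogate` stands in for the trained surrogate model, the observations and noise level are invented, the prior is flat on a bounded interval, and the sampler is a plain random-walk Metropolis algorithm rather than whatever MCMC variant the study used.

```python
import math
import random


def toy_surrogate(a):
    """Hypothetical forward model mapping an initial parameter to an observable."""
    return 2.0 * a


def log_posterior(a, observations, noise_sd, lower=0.0, upper=5.0):
    """Gaussian log-likelihood plus a flat prior on [lower, upper]."""
    if not lower <= a <= upper:
        return -math.inf
    return -0.5 * sum(((y - toy_surrogate(a)) / noise_sd) ** 2 for y in observations)


def metropolis(log_post, start, step_sd, n_samples, rng):
    """Random-walk Metropolis; returns a list of (sample, log posterior)."""
    current, current_lp = start, log_post(start)
    chain = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        chain.append((current, current_lp))
    return chain


observations = [3.6, 4.2, 4.0, 4.6, 3.8]  # invented measurements of the observable
rng = random.Random(0)
chain = metropolis(lambda a: log_posterior(a, observations, noise_sd=0.4),
                   start=1.0, step_sd=0.1, n_samples=5000, rng=rng)
kept = chain[1000:]  # discard burn-in
posterior_mean = sum(a for a, _ in kept) / len(kept)
map_estimate = max(kept, key=lambda pair: pair[1])[0]
```

In this toy case the posterior is a narrow Gaussian, so the posterior mean and the MAP sample nearly coincide. With several parameters and a nonlinear surrogate they can differ, which is one reason to inspect the full posterior rather than a single point estimate.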
Prediction models that can detect the onset of psychotic experiences are a key component of developing Just-In-Time Adaptive Interventions (JITAI). Building these models on passively collectable data could substantially reduce user burden. In this study, we developed prediction models to detect experiences of auditory verbal hallucinations (AVH) and paranoia using ambulatory sensor data and assessed their stability over 12 weeks.
Methods
Fourteen individuals diagnosed with a schizophrenia-spectrum disorder participated in a 12-day Ecological Momentary Assessment (EMA) study. They wore ambulatory sensors measuring autonomic arousal (i.e., electrodermal activity, heart rate variability) and completed questionnaires assessing the intensity/distress of AVHs and paranoia once every hour. After 12 weeks, participants repeated the EMA for four days for a follow-up assessment. We calculated prediction models to detect AVHs, paranoia, and AVH-/paranoia-related distress using random forests within nested cross-validation. Calculated prediction models were applied to the follow-up data to assess the stability of prediction models.
Results
Prediction models calculated with physiological data achieved high accuracy both for AVH (81%) and paranoia (69%–75%). Accuracy increased by providing models with baseline information about psychotic symptom levels (AVH: 86%; paranoia: 80%–85%). During the follow-up EMA, accuracy dropped slightly across all models but remained high (73%–84%).
Conclusions
Relying solely on physiological data to detect psychotic symptoms achieved substantial accuracy that remained sufficiently stable over 12 weeks. Experiences of AVHs can be predicted with higher accuracy and long-term stability than paranoia. The findings tentatively suggest that psychophysiology-based prediction models could be used to develop and enhance JITAIs for psychosis.
One of the most significant challenges in research related to nutritional epidemiology is the achievement of high accuracy and validity of dietary data to establish an adequate link between dietary exposure and health outcomes. Recently, the emergence of artificial intelligence (AI) in various fields has filled this gap with advanced statistical models and techniques for nutrient and food analysis. We aimed to systematically review available evidence regarding the validity and accuracy of AI-based dietary intake assessment methods (AI-DIA). In accordance with PRISMA guidelines, an exhaustive search of the EMBASE, PubMed, Scopus and Web of Science databases was conducted to identify relevant publications from their inception to 1 December 2024. Thirteen studies that met the inclusion criteria were included in this analysis. Of the studies identified, 61·5 % were conducted in preclinical settings. Likewise, 46·2 % used AI techniques based on deep learning and 15·3 % on machine learning. Correlation coefficients of over 0·7 were reported in six articles concerning the estimation of calories between the AI and traditional assessment methods. Similarly, six studies obtained a correlation above 0·7 for macronutrients. In the case of micronutrients, four studies achieved the correlation mentioned above. A moderate risk of bias was observed in 61·5 % (n 8) of the articles analysed, with confounding bias being the most frequently observed. AI-DIA methods are promising, reliable and valid alternatives for nutrient and food estimations. However, more research comparing different populations is needed, as well as larger sample sizes, to ensure the validity of the experimental designs.
Data-based methods have gained increasing importance in engineering. Success stories are prevalent in areas such as data-driven modeling, control, and automation, as well as surrogate modeling for accelerated simulation. Beyond engineering, generative and large-language models are increasingly helping with tasks that, previously, were solely associated with creative human processes. Thus, it seems timely to seek artificial-intelligence support for engineering design tasks to automate, help with, or accelerate purpose-built designs of engineering systems, for instance in mechanics and dynamics, where design so far requires a lot of specialized knowledge. Compared with established, predominantly first-principles-based methods, the datasets used for training, validation, and test become an almost inherent part of the overall methodology. Thus, data publishing becomes just as important in (data-driven) engineering science as appropriate descriptions of conventional methodology have been in past publications. However, in mechanics and dynamics, traditional publishing practices are still widely prevalent and largely do not yet take the rising role of data into account to the extent that may already be the case in pure data-science research. This article analyzes the value and challenges of data publishing in mechanics and dynamics, in particular regarding engineering design tasks, showing that the latter also raise challenges and considerations not typical of fields where data-driven methods originally boomed. Researchers currently find barely any guidance to overcome these challenges. Thus, ways to deal with these challenges are discussed and a set of examples from across different design problems shows how data publishing can be put into practice.