Impact Statement
The ability to predict autoconversion rates directly from satellite data holds significant potential for enhancing our understanding of how clouds respond to anthropogenic aerosols. We develop a new machine learning-based pipeline that culminates in models able to predict autoconversion rates directly from satellite data. We argue that our approach is far less computationally expensive than atmospheric simulation-based methods, that it offers reasonably good outcomes, and that it can therefore readily leverage the wide availability of massive, long-term satellite data records. This exemplifies how valuable machine learning is for understanding how aerosols affect clouds and, ultimately, climate change.
1. Introduction
Climate change represents one of the most pressing challenges afflicting our societies, with governments worldwide, non-governmental organizations, and other stakeholders placing increasing emphasis on measures to successfully tackle and mitigate its adverse effects. A pressing scientific challenge relates to the need to understand aerosol-cloud interactions because these currently represent one of the major sources of uncertainty in determining the radiative forcing responsible for climate change projections (Intergovernmental Panel on Climate Change 2021—IPCC [Reference Masson-Delmotte, Zhai, Pirani, Connors, Péan, Berger, Caud and Chen2021]).
Aerosols can directly affect climate by scattering and absorbing solar radiation, but they also affect climate indirectly by altering the properties of clouds, which impacts Earth’s energy budget. In particular, an increase in aerosol concentrations may result in a larger number of droplets with smaller sizes, which, in turn, results in a higher cloud albedo (radiative forcing due to aerosol-cloud interactions, RFaci—Twomey [Reference Twomey.1974]); simultaneously, a more uniform distribution of smaller droplets reduces precipitation efficiency and increases cloud lifetime (one important adjustment to RFaci—Albrecht [Reference Albrecht1989], Gryspeerdt et al. [Reference Gryspeerdt, Goren, Sourdeval, Quaas, Mülmenstädt, Dipu, Unglaub, Gettelman and Christensen2019], Bellouin et al. [Reference Bellouin, Quaas, Gryspeerdt, Kinne, Stier, Watson-Parris, Boucher, Carslaw, Christensen and Daniau2020]). Determining the magnitude of the forcing caused by aerosol-cloud interactions has therefore been particularly challenging, owing to the intricate and interdependent nature of these processes, which involve complex feedback mechanisms. This complexity contributes to considerable uncertainty in our understanding of the radiative forcing of climate change.
One common way to study such complex interactions within the Earth system—i.e., the interaction between aerosols, clouds, and precipitation—is through the utilization of climate models. These models numerically solve the differential equations describing the fluid dynamics of the atmosphere and ocean on a discrete grid, with multiple layers across altitudes. This discretization is necessary because of the complexity of the climate system and the limited amount of computing power available, which make it impossible to represent every process occurring on Earth explicitly. Processes that are smaller than the grid scale of the model cannot be captured or represented directly by the model (IPCC [Reference Masson-Delmotte, Zhai, Pirani, Connors, Péan, Berger, Caud and Chen2021]). Such processes therefore require parameterization: a simplified description, involving empirical parameters, of each sub-grid process that allows it to be incorporated into the model.
Alternatively, recent improvements in computer power have made it possible to rely on high-resolution models with smaller grid cells to enhance the representation of small-scale processes in the atmosphere in a realistic, large-domain setting (e.g., Stevens et al. [Reference Stevens, Acquistapace, Hansen, Heinze, Klinger, Klocke, Rybka, Schubotz, Windmiller and Adamidis2020]). While these models excel at capturing small-scale phenomena, their applicability is typically limited to specific spatial domains and short periods due to the considerable computational complexity involved. For instance, one of the high-resolution simulation models that can be used to simulate small-scale processes in the atmosphere is the ICOsahedral Non-hydrostatic Large-Eddy Model (ICON-LEM—Zängl et al. [Reference Zängl, Reinert, Rípodas and Baldauf2015], Dipankar et al. [Reference Dipankar, Stevens, Heinze, Moseley, Zängl, Giorgetta and Brdar2015], Heinze et al. [Reference Heinze, Dipankar, Henken, Moseley, Sourdeval, Trömel, Xie, Adamidis, Ament and Baars2017]). This simulation model requires approximately 13 hours across 300 computer nodes, incurring a cost of approximately EUR 4000 just to generate a single hour of climate data over Germany (Costa-Surós et al. [Reference Costa-Surós, Sourdeval, Acquistapace, Baars, Henken, Genz, Hesemann, Jimenez, König and Kretzschmar2020]). Therefore, given such a high cost, it is imperative to devise alternative approaches to capture complex interactions within the Earth’s climate system.
Our overarching goal—in line with other works showing that data-driven approaches can have a revolutionary impact in a variety of domains (Rolnick et al. [Reference Rolnick, Donti, Kaack, Kochanski, Lacoste, Sankaran, Ross, Milojevic-Dupont, Jaques and Waldman-Brown2019], Reuther et al. [Reference Reuther, Michaleas, Jones, Gadepally, Samsi and Kepner2019])—is to understand whether a machine learning approach can be used to address this challenge: in particular, we focus on the prediction of autoconversion rates—the rate at which small droplets (cloud droplets) transform into larger ones (raindrops) through the collision and coalescence of cloud droplets—directly from satellite observations. This is one of the key processes of precipitation formation for liquid clouds, which in turn is key to better understanding cloud responses to anthropogenic aerosols (specifically the cloud lifetime effect—Albrecht [Reference Albrecht1989]).
In view of the fundamental need to understand autoconversion rates, there have also been various recent attempts to predict the autoconversion rates by using machine learning techniques, such as Seifert and Rasp [Reference Seifert and Rasp2020], Chiu et al. [Reference Chiu, Yang, van Leeuwen, Feingold, Wood, Blanchard, Mei and Wang2021], and Alfonso and Zamora [Reference Alfonso and Zamora2021]. The study by Seifert and Rasp [Reference Seifert and Rasp2020] utilizes neural networks to explore cloud microphysical parameterizations, focusing on warm-rain formation processes, such as autoconversion, accretion, and self-collection. Enhanced parameterizations for autoconversion and accretion rates in warm rain clouds were introduced by Chiu et al. [Reference Chiu, Yang, van Leeuwen, Feingold, Wood, Blanchard, Mei and Wang2021], informed by machine learning and optimization techniques. These parameterizations are based on in situ cloud probe measurements obtained during the Atmospheric Radiation Measurement Program field campaign in the Azores. Furthermore, Alfonso and Zamora [Reference Alfonso and Zamora2021] developed a machine-learned parameterization using a deep neural network. This neural network was trained using a dataset of autoconversion rates derived from solving the kinetic collection equation over a wide range of droplet concentrations and liquid water contents.
Our research distinguishes itself from prior studies by shifting its focus. While prior research has predominantly focused on estimating autoconversion rates within simulation data and in-situ measurements, our investigation revolves around the more complex objective of directly estimating autoconversion rates from satellite observations.
Our motivation derives from the fact that satellite data provide extensive geographic coverage and frequent observations, which therefore have the potential to enhance our understanding of autoconversion rates in clouds on a scale that simulation-based models, due to computational and cost constraints alluded to earlier, cannot achieve. However, directly developing a machine learning model capable of predicting autoconversion rates from satellite observations is not a trivial task. Therefore, we adopt an innovative approach involving two phases leading up to the development of such a prediction model: (1) first, we leverage high-resolution climate model data to develop machine learning models capable of predicting autoconversion rates, and (2) second, we repurpose our best machine learning model to predict autoconversion rates directly from satellite observations.
Our paper is structured as follows: In Section 2, we delve into the methodology employed in our study, specifically discussing the datasets, machine learning models, procedures adopted in our study, and the evaluation of the machine learning models. In Section 3, we present a number of experimental results showcasing that our approach can successfully predict autoconversion rates directly from satellite data. Finally, in Section 4, we conclude our paper, also offering insights into potential future directions for further investigation.
2. Methodology
We now describe the approach we adopt to develop machine learning models that can be used to estimate autoconversion rates directly from satellite observations. In summary, our methodology encompasses two stages as follows:
• The first stage involves building a set of machine learning models that can be used to deliver the autoconversion rates given the cloud effective radius. The training and testing of these models are performed by leveraging high-resolution climate model data. Please refer to Sub-subsection 2.3.1 for more details.
• The second stage involves repurposing the (best) machine learning model to predict autoconversion rates directly from satellite observations. Please refer to Sub-subsection 2.3.2 for more details.
Our methodology is also briefly portrayed in Figure 1, where the left-hand side emphasizes how we use climate science-based data derived from a high-resolution climate science model to generate a dataset to train, validate, and test our machine learning models (under stage 1), whereas the right-hand side illustrates the use of the best machine learning model to directly predict autoconversion rates from satellite data (under stage 2).
We next describe in detail the various datasets (Subsection 2.1), machine learning models (Subsection 2.2), learning procedures (Subsection 2.3), and the evaluation of the machine learning models (Subsection 2.4), underlying these two stages.
2.1. Data
2.1.1. Stage 1: climate model data
We first describe the data underlying stage 1 of our approach, where we develop some models capable of predicting autoconversion rates given the cloud effective radius, at each grid point in three-dimensional space, including latitude, longitude, and altitude or level. We rely on climate science-based procedures, leveraging data from high-resolution climate models, to generate datasets containing various input-output pairs consisting of cloud effective radius paired with the corresponding autoconversion rate.
In particular, we utilize data deriving from the ICON-LEM simulations over Germany on 2 May 2013 to train, validate, and test our machine learning models. The ICOsahedral non-hydrostatic model (ICON) is a unified model for global numerical weather prediction and climate simulations which is the result of a joint project between the German Meteorological Service (Deutscher Wetterdienst, DWD) and the Max Planck Institute for Meteorology (MPI-M) (Zängl et al. [Reference Zängl, Reinert, Rípodas and Baldauf2015]). The ICON Large Eddy Model (ICON-LEM) is an ICON version which can be run in a high-resolution large eddy mode. Detailed descriptions and comprehensive evaluations of ICON-LEM are provided in references Dipankar et al. [Reference Dipankar, Stevens, Heinze, Moseley, Zängl, Giorgetta and Brdar2015] and Heinze et al. [Reference Heinze, Dipankar, Henken, Moseley, Sourdeval, Trömel, Xie, Adamidis, Ament and Baars2017], respectively.
Specifically, we utilize data deriving from the ICON-LEM simulations over a time window approximately corresponding to the overpass times of the polar-orbiting satellites, namely 09:55 UTC to 13:20 UTC. This period was chosen because distinct cloud regimes occurred in the model domain on the chosen day (2 May 2013), providing an opportunity to investigate quite different elements of cloud formation and evolution (Heinze et al. [Reference Heinze, Dipankar, Henken, Moseley, Sourdeval, Trömel, Xie, Adamidis, Ament and Baars2017]).
The data cover 150 altitude levels in the vertical grid, with the grid reaching the top of the model at 21 km. The ICON-LEM simulations also cover three different horizontal resolutions, in two-way nests, namely 625, 312, and 156 m. Our study focuses on the highest-resolution ICON-LEM output (156 m) on the native ICON grid, which we then regrid to a regular 1 km resolution to match the resolution of the MODIS data. This allows us to directly compare the simulated data with the observed data from MODIS.
Further, we also utilize data deriving from ICON numerical weather prediction (ICON-NWP) simulations (Kolzenburg et al. [Reference Stephan Kolzenburg, Thordarson, Höskuldsson and Dingwell2017], Haghighatnasab et al. [Reference Haghighatnasab, Kretzschmar, Block and Quaas2022]) that were performed over the North Atlantic ocean for a week from 1 September 2014 to 7 September 2014 since these simulations capture completely different weather conditions, thereby offering an opportunity to further test the performance of our machine learning model. The ICON-NWP simulations for the Holuhraun volcano have 75 altitude levels in the vertical grid, with the grid reaching the top of the model at 30 km, and a horizontal resolution of approximately 2.5 km.
The simulation data contain a number of variables associated with each grid point, such as cloud effective radius, cloud optical thickness, and so on. However, owing to the vast storage requirements, the large-domain high-resolution simulation of Heinze et al. [Reference Heinze, Dipankar, Henken, Moseley, Sourdeval, Trömel, Xie, Adamidis, Ament and Baars2017] did not store all the quantities necessary for our study. Instead, state quantities and process rates had to be diagnosed (recomputed) from the available three-dimensional instantaneous output. In particular, the two-moment microphysical parameterization of Seifert and Beheng [Reference Seifert and Beheng2006] was used to derive the autoconversion rates in our training, validation, and testing data.
The autoconversion parameterization given by Seifert and Beheng [Reference Seifert and Beheng2006]—which applies a double-moment bulk cloud and precipitation microphysical scheme with prognostic mass and number for different hydrometeor classes—is given by:

$$ {\left.\frac{\partial {L}_r}{\partial t}\right|}_{\mathrm{au}}=\frac{k_{cc}}{20\,{x}^{\ast }}\frac{\left({\nu}_c+2\right)\left({\nu}_c+4\right)}{{\left({\nu}_c+1\right)}^2}\;{L}_c^2\;{\overline{x}}_c^2\left[1+\frac{\Phi_{au}\left(\tau \right)}{{\left(1-\tau \right)}^2}\right]\frac{\rho_0}{\rho }. \hskip2em (2.1) $$
This equation—which describes the rate of change of rainwater content (specific mass) $ \left({L}_r\right) $ , due to the process of autoconversion—makes the assumption that the size distribution of cloud droplets follows a hyper-gamma distribution and the size distribution of raindrops follows an exponential distribution. The hyper-gamma distribution is a probability distribution that is characterized by a width parameter ( $ {\nu}_c $ ), whereas the exponential distribution is a probability distribution that is commonly used to model the size distribution of raindrops (Marshall and Palmer [Reference Marshall and Palmer1948]).
The variable $ \tau $ captures the ratio of the cloud water content, $ {L}_c $ , to the sum of the cloud and rain water content, i.e., $ \tau =1-\left({L}_c/\left({L}_c+{L}_r\right)\right) $ . The constant $ {k}_{cc} $ , which represents the cloud-cloud collision rate, has a value of $ 4.44\times {10}^9\;{\mathrm{m}}^3\;{\mathrm{kg}}^{-2}\;{\mathrm{s}}^{-1} $ , as determined by the study conducted by Pinsky et al. [Reference Pinsky, Khain and Shapiro2001]. The density of air at sea level, $ {\rho}_0 $ , is assumed to be equal to $ 1.225\;\mathrm{kg}\;{\mathrm{m}}^{-3} $ . The varying air density is denoted as $ \rho $ . The function $ {\Phi}_{au}\left(\tau \right) $ is defined as $ 400\;{\tau}^{0.7}{\left(1-{\tau}^{0.7}\right)}^3 $ . A drop mass of $ {x}^{\ast }=2.6\times {10}^{-10} $ kg is used as a threshold to distinguish between cloud droplets and raindrops. This value corresponds to an individual drop with a radius of 40 μm and a liquid water density of $ 1000\;\mathrm{kg}\;{\mathrm{m}}^{-3} $ . Additionally, $ {\overline{x}}_c $ represents the mean mass of cloud droplets in kg.
The computation of the autoconversion rates utilizing the method outlined in the study by Seifert and Beheng [Reference Seifert and Beheng2006], requires several additional pieces of information. Specifically, we need the mass density of cloud droplets or the cloud water content ( $ {L}_c $ ), the mass density of raindrops $ \left({L}_r\right) $ , and the air density ( $ \rho $ ). Appendix A contains the information required to compute these values.
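To make the computation concrete, the following minimal Python sketch evaluates Equation (2.1) at a single grid point using the constants listed above. The value of the width parameter $ {\nu}_c $ and the example inputs are illustrative placeholders rather than values taken from the ICON output.

```python
import numpy as np

# Constants of the Seifert and Beheng (2006) scheme, as listed above
K_CC = 4.44e9      # cloud-cloud collision rate [m^3 kg^-2 s^-1]
X_STAR = 2.6e-10   # drop mass separating cloud droplets and raindrops [kg]
RHO_0 = 1.225      # air density at sea level [kg m^-3]


def phi_au(tau):
    """Similarity function Phi_au(tau) = 400 tau^0.7 (1 - tau^0.7)^3."""
    return 400.0 * tau**0.7 * (1.0 - tau**0.7) ** 3


def autoconversion_rate(l_c, l_r, rho, x_c_mean, nu_c=2.0):
    """Autoconversion rate dL_r/dt|_au [kg m^-3 s^-1], following Equation (2.1).

    l_c, l_r : cloud and rain water content [kg m^-3]
    rho      : local air density [kg m^-3]
    x_c_mean : mean cloud droplet mass [kg]
    nu_c     : width parameter of the droplet size distribution (illustrative value)
    """
    tau = 1.0 - l_c / (l_c + l_r)
    tau = np.clip(tau, 1e-12, 1.0 - 1e-12)     # keep tau strictly inside (0, 1)
    prefactor = K_CC / (20.0 * X_STAR)
    shape = (nu_c + 2.0) * (nu_c + 4.0) / (nu_c + 1.0) ** 2
    return (prefactor * shape * l_c**2 * x_c_mean**2
            * (1.0 + phi_au(tau) / (1.0 - tau) ** 2) * RHO_0 / rho)


# Example at a single grid point (illustrative input values only)
print(autoconversion_rate(l_c=5e-4, l_r=1e-5, rho=1.0, x_c_mean=1e-11))
```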
Equation 2.1 highlights that this autoconversion rate formulation is a highly non-linear expression of the specific cloud mass. In this context, the specific cloud mass, the cloud droplet number concentration, and the cloud effective radius are interconnected. The droplet number concentration within the simulations is a prognostic variable, originating from the activation of cloud condensation nuclei (CCN), with a prescribed vertical variation but constant in the horizontal and in time. Sink processes for droplet number include autoconversion itself, accretion, and droplet evaporation following turbulent mixing.
Predicting the autoconversion rates is nonetheless meaningful. The “real” process is the coagulation of droplets and drops across the size spectrum. While categorizing the particle size spectrum into two distinct classes is a simplification, it remains a justified one for liquid clouds, and it proves particularly useful in understanding how clouds lose their water through precipitation.
Finally, we specifically extracted the autoconversion rates pertaining to cloud tops because this reflects satellite-like data. The cloud-top autoconversion rates are determined by selecting the autoconversion rates at the level where the cloud optical thickness, integrated from the cloud top downwards, exceeds 1. This optical thickness threshold represents the penetration depth for the retrieval of cloud microphysical quantities by optical satellite sensors. While this approach is practical, it might not directly translate to scattering scenarios, particularly at the wavelengths relevant to MODIS optical property retrievals (Platnick [Reference Platnick2000]). While acknowledging that this simplification might not perfectly align with scattering cases and could have limitations, we believe it still fulfills its testing purpose. It is important to note that we exclusively utilize this quantity for testing purposes; our trained models remain unaffected, as each input-output pair is taken at its corresponding vertical level.
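For illustration, the sketch below shows one way to implement this cloud-top selection with NumPy, assuming that the fields are stored as three-dimensional arrays with the vertical index starting at the model top; the array names and the synthetic demo values are ours and not part of the ICON output.

```python
import numpy as np


def cloud_top_values(field, cot_per_layer, threshold=1.0):
    """Select, for each column, the value of `field` at the first level
    (counting downward from the model top, assumed to be index 0) where the
    cumulative cloud optical thickness exceeds `threshold`.

    field, cot_per_layer : arrays of shape (n_levels, n_lat, n_lon)
    Returns an (n_lat, n_lon) array; columns never reaching the threshold are NaN.
    """
    cot_cum = np.cumsum(cot_per_layer, axis=0)            # integrate downward from the top
    reached = cot_cum > threshold
    level = np.argmax(reached, axis=0)                    # first level exceeding the threshold
    out = np.take_along_axis(field, level[None, ...], axis=0)[0]
    return np.where(reached.any(axis=0), out, np.nan)


# Demo with random data: 50 levels over a 20 x 20 grid (illustrative values only)
rng = np.random.default_rng(0)
cot = rng.uniform(0.0, 0.2, size=(50, 20, 20))
aut = rng.lognormal(-23.0, 1.0, size=(50, 20, 20))
cloud_top_aut = cloud_top_values(aut, cot, threshold=1.0)
```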
In summary, by adopting this procedure, we are able to create a dataset containing a series of data points consisting of cloud effective radius—autoconversion rate pairs corresponding to different points on the grid.
2.1.2. Stage 2: satellite data
We now turn to the data underlying stage 2 of our approach where we repurpose the learned machine learning (best) model to predict the autoconversion rates directly from satellite observations.
Concretely, we utilize Collection 6 of level-2 (L2) cloud product of Terra and Aqua of the Moderate Resolution Imaging Spectroradiometer (MODIS) (Platnick et al. [Reference Platnick, Meyer, King, Wind, Amarasinghe, Marchant, Arnold, Zhang, Hubanks, Holz, Yang, Ridgway and Riedi2017], Platnick et al. [Reference Platnick, Meyer, King, Wind, Amarasinghe, Marchant, Arnold, Zhang, Hubanks, Ridgway and Riedi2018])—MOD06 and MYD06, respectively. The MODIS L2 cloud product contains information on both cloud-top properties and cloud optical properties.
Onboard the Terra and Aqua satellites, MODIS is one of the most commonly utilized remote sensing instruments for Earth science research. Terra’s orbit crosses the equator from north to south in the morning, whereas Aqua’s orbit crosses the equator from south to north in the afternoon. The approximate overpass time is around 10:30 local solar time for Terra and around 13:30 local solar time for Aqua.
Additionally, MODIS employs 36 spectral bands spanning from visible to infrared wavelengths. MODIS visible and near-infrared channel radiances are employed to determine cloud particle phases, effective cloud particle radii, and cloud optical thickness. We utilize the cloud effective radius at each grid point in a two-dimensional space, which corresponds to a specific range of latitude and longitude with a resolution of 1 km. The machine learning model ingests this cloud effective radius to deliver an estimate of the cloud-top autoconversion rates.
Regarding the MODIS data, we have selected the cloud effective radius (CER) derived from the 2.1 μm channel. This decision was based on the following considerations: the 3.7 μm channel results in an excessively small CER distribution, while the 1.6 μm channel yields an overly large one (Zhang and Platnick [Reference Zhang and Platnick2011]). The intermediate choice, 2.1 μm, therefore seemed most suitable.
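As an illustration of how such a field can be read, the sketch below uses the pyhdf package to extract the 2.1 μm CER from a MOD06/MYD06 granule. The scientific dataset name and the scale/offset convention shown here are assumptions based on the Collection 6 documentation and should be verified against the actual files.

```python
import numpy as np
from pyhdf.SD import SD, SDC


def read_modis_cer(granule_path):
    """Read the 2.1 um cloud effective radius [um] from a MOD06/MYD06 granule.

    The dataset name 'Cloud_Effective_Radius' and the MODIS scale/offset
    convention (physical = scale_factor * (stored - add_offset)) are assumed
    here and should be checked against the product documentation.
    """
    hdf = SD(granule_path, SDC.READ)
    sds = hdf.select("Cloud_Effective_Radius")
    attrs = sds.attributes()
    raw = sds.get().astype(np.float64)
    raw[raw == attrs["_FillValue"]] = np.nan      # mask fill values
    return attrs["scale_factor"] * (raw - attrs["add_offset"])


# cer = read_modis_cer("MOD06_L2.A2013122.1020.061.hdf")  # file name is illustrative
```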
2.1.3. Data preprocessing
In line with standard practices, we further preprocess the data (for both simulation output and satellite data) in order to train, validate, and test our various machine learning models. In particular, we adopt the following procedure:
Preprocessing step 1: First, we exclusively utilize data relating to cloudy grid columns, further filtered by selecting cloud effective radius values smaller than or equal to 40 μm, since this 40 μm threshold is the same one used to separate cloud droplets from raindrops in Seifert and Beheng [Reference Seifert and Beheng2006]. We adopt this filtering because we intend to focus exclusively on cloud droplets, as our primary interest lies in the autoconversion process rather than in the accretion or self-collection processes.
Preprocessing step 2: Second, we transform and normalize both input data (cloud effective radius) and output data (autoconversion rate) according to the transformation given by:

$$ {z}^{\prime }=\frac{\log_{10}(z)-\mu }{\sigma }, $$
where $ {z}^{\prime } $ denotes the normalized data point (input or output) and $ z $ denotes the original data point, $ \mu $ corresponds to the mean value of the logarithmically transformed input or output data points, and $ \sigma $ corresponds to the standard deviation of the logarithmically transformed data points. It is worth noting that this normalization operation is carried out independently for points from the training and testing sets to avoid contamination. The rationale for adopting this transformation/normalization procedure is to effectively deal with extremely small values of autoconversion rates and cloud effective radius (this is particularly important for the autoconversion rates since they exhibit minuscule values). Furthermore, by applying this transformation, it becomes more manageable to disentangle the relationship between these two variables and to guarantee that data features are not too far away from zero in order to stabilize the learning process.
Before applying the two preprocessing steps described above, we first split the relevant datasets into subsets for training, validating, and testing the machine learning models. Notably, in stage 1 of our approach, we split the data into a training set (containing 80% of the original data) and a testing set (containing 20% of the original data). For the neural network models, we use 20% of the training set as a validation set, serving as a means to fine-tune the model’s performance. Meanwhile, for the random forest model, we use 3-fold cross-validation. In stage 2 of our approach, we only use a testing set, since we are only assessing the performance of the best stage 1 model on real satellite data.
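The following sketch summarizes the two preprocessing steps and the 80/20 split on synthetic stand-in arrays; the variable names and the random seed are illustrative, and, as described above, the transformation statistics are computed separately for the training and testing subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the cloudy-column data (r_e in metres, a_u positive)
rng = np.random.default_rng(0)
r_e = rng.uniform(1e-6, 60e-6, size=10_000)          # cloud effective radius [m]
a_u = 1e-10 * (r_e / 10e-6) ** 3                     # monotone stand-in for autoconversion

# Step 1: keep only cloud droplets, i.e. CER <= 40 micrometres
mask = r_e <= 40e-6
r_e, a_u = r_e[mask], a_u[mask]

# 80/20 train/test split (random seed is illustrative)
r_tr, r_te, a_tr, a_te = train_test_split(r_e, a_u, test_size=0.2, random_state=0)


def log_standardize(z):
    """Step 2: base-10 log transform followed by standardization."""
    z_log = np.log10(z)
    return (z_log - z_log.mean()) / z_log.std(), z_log.mean(), z_log.std()


# The transformation statistics are computed separately for each subset
x_tr, mu_x, sd_x = log_standardize(r_tr)
y_tr, mu_y, sd_y = log_standardize(a_tr)
x_te, _, _ = log_standardize(r_te)
y_te, _, _ = log_standardize(a_te)
```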
2.2. Machine learning models
We have selected, trained, and tested various machine learning models to predict desired output from input, including linear regression, polynomial regression, random forest (RF—Breiman [Reference Breiman2001]), shallow neural network (NN), and deep neural network (DNN—Schmidhuber [Reference Schmidhuber2015]). We briefly describe some of these models in the sequel.
2.2.1. Random forest (RF)
A random forest is an ensemble method that is constructed from decision tree algorithms (Breiman [Reference Breiman2001]). This technique uses bootstrap aggregation (bagging) to select a random sample from a training set where individual data points can be selected multiple times. The term “forest” refers to a group of trees, in this case, decision trees.
The bagging technique is applied to collect several sets of random samples. Each sample set is then dealt with by each individual decision tree. The final prediction delivered by a random forest is determined by averaging the predictions of each individual decision tree.
A random forest uses a range of hyperparameters, including $ {\mathrm{N}}_t, $ $ {\mathrm{min}}_s $ , $ {\mathrm{min}}_l $ , and $ {\mathrm{max}}_d $ . The hyperparameter $ {\mathrm{N}}_t $ represents the number of trees in the forest, which influences the complexity of the model. The hyperparameter $ {\mathrm{min}}_s $ governs the minimum number of samples required to split a tree node: a node can only be subdivided if it contains at least $ {\mathrm{min}}_s $ samples. A small value of $ {\mathrm{min}}_s $ may result in a model that overfits the data; conversely, a large value may result in a model that underfits. The hyperparameter $ {\mathrm{min}}_l $ specifies the minimum number of samples that must remain in a leaf node, meaning that any split must leave at least $ {\mathrm{min}}_l $ training samples in each of the left and right branches. The hyperparameter $ {\mathrm{max}}_d $ defines the maximum depth of a tree, that is, the longest path between the root node and a leaf node.
These hyperparameters can be optimized by using techniques such as grid search and random search to balance the trade-off between model performance and complexity. Since the performance of an RF is strongly influenced by its hyperparameters, we conduct a random search to find reasonable hyperparameter values. Our final hyperparameters are shown in Table 1.
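A minimal example of such a random search, using scikit-learn's RandomizedSearchCV with 3-fold cross-validation, is sketched below; the candidate ranges are illustrative and not the ones reported in Table 1.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    "n_estimators": [50, 100, 200, 400],      # N_t
    "min_samples_split": [2, 5, 10, 20],      # min_s
    "min_samples_leaf": [1, 2, 5, 10],        # min_l
    "max_depth": [5, 10, 20, None],           # max_d
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=20,                                # number of sampled configurations (illustrative)
    cv=3,                                     # 3-fold cross-validation, as described above
    scoring="neg_mean_squared_error",
    random_state=0,
)
# search.fit(x_tr.reshape(-1, 1), y_tr)       # x_tr, y_tr from the preprocessing sketch
# best_rf = search.best_estimator_
```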
2.2.2. Neural networks
A neural network is a type of machine learning model inspired by the workings of the human brain, simulating how neurons interact with each other; the idea was first proposed by McCulloch and Pitts [Reference McCulloch and Pitts1943]. A neural network is made up of several layers, including an input layer (ingesting the input data), one or more hidden layers, and an output layer (delivering the output data). Each layer is composed of neurons that connect to neurons in the previous and subsequent layers, with each connection carrying a corresponding weight.
The Deep Neural Network (DNN—Schmidhuber [Reference Schmidhuber2015]) is a neural network with multiple hidden layers. Let $ {x}_j^{(l)} $ be the value of the $ j $ -th neuron in layer $ l $ , $ {x}_i^{\left(l-1\right)} $ be the value of the $ i $ -th neuron in layer $ l-1 $ , $ {w}_{ij}^{(l)} $ be the weight of the connection between neuron $ i $ in layer $ l-1 $ and neuron $ j $ in layer $ l $ , and $ {b}_j^{(l)} $ be the bias term of neuron $ j $ in layer $ l $ . Then, we can express the value of $ {z}_j^{(l)} $ (before applying the activation function) in terms of the values of $ {x}_i^{\left(l-1\right)} $ , $ i=1,\dots, {n}^{\left(l-1\right)} $ , where $ {n}^{\left(l-1\right)} $ is the number of neurons in layer $ l-1 $ , as follows:

$$ {z}_j^{(l)}=\sum_{i=1}^{n^{\left(l-1\right)}}{w}_{ij}^{(l)}\hskip0.1em {x}_i^{\left(l-1\right)}+{b}_j^{(l)}. $$
The output of this neuron is then passed through the activation function $ g $ :

$$ {x}_j^{(l)}=g\left({z}_j^{(l)}\right), $$
where $ g\left(\cdot \right) $ is a nonlinear activation function such as a rectified linear unit (ReLU). The input to the neural network (at the very first layer) corresponds to the various data features and the output of the neural network (at the very final layer) corresponds to the desired prediction.
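The two equations above amount to a single fully connected layer, as in the following small NumPy sketch; the Leaky ReLU slope used here is illustrative.

```python
import numpy as np


def dense_layer(x_prev, weights, bias, slope=0.01):
    """One fully connected layer: z_j = sum_i w_ij x_i + b_j, then x_j = g(z_j).

    x_prev  : (n_prev,) activations of layer l-1
    weights : (n_prev, n_curr) matrix with weights[i, j] = w_ij
    bias    : (n_curr,) bias vector
    g is a Leaky ReLU; the slope value here is illustrative.
    """
    z = x_prev @ weights + bias
    return np.where(z > 0.0, z, slope * z)


# Example: one input feature propagated through a 64-neuron hidden layer
rng = np.random.default_rng(0)
hidden = dense_layer(rng.normal(size=1), rng.normal(size=(1, 64)), np.zeros(64))
```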
A DNN is parameterized by various parameters, i.e., the weights and the biases corresponding to the different neural network layers. The parameters are adjusted during training using the backpropagation algorithm to minimize a desired loss function measuring the discrepancy between the predicted output and the reference simulation data, leveraging the availability of a training set.
We developed two neural network models—one very simple model (shallow NN) and another slightly more complex model (pyramid-shaped DNN) to evaluate the impact of alterations to the architecture of neural networks. The first model, referred to as the shallow NN or NN, is a basic neural network that has only one hidden layer consisting of 64 neurons. In contrast, the second model, referred to as the pyramid-shaped DNN or DNN, employs a more complex architecture that encourages hierarchical learning in the network. However, it is worth noting that this architecture was chosen arbitrarily and alternative architectures can also be employed. The DNN architecture of our machine learning model, with cloud effective radius ( $ {r}_e $ ) as input and autoconversion rates ( $ {a}_u $ ) as output, is depicted in Figure 2. The model contains five fully connected hidden layers, with node counts increasing by a factor of two for each subsequent layer—64 nodes in the first layer, 128 nodes in the second layer, and so on until the fifth layer.
Both the DNN and NN models use Leaky ReLU as the activation function in every hidden layer. As the loss function for both models, we employed the mean squared error (MSE). We chose MSE for several reasons. Its inherent squaring of errors gives greater weight to larger deviations, which proves advantageous in situations where penalizing larger deviations is crucial and which enhances the model’s sensitivity to outliers. Additionally, its smoothness and differentiability across its entire domain render it well-suited for optimization algorithms, simplifying gradient computations and thus streamlining the training process. Moreover, as an average of squared differences between predicted and actual values, MSE remains easy to interpret, directly measuring the average magnitude of the squared errors. Finally, MSE is the most common loss in regression problems, aligning with our objective of minimizing disparities between predicted and actual continuous numerical values, in this case autoconversion rates. Additional experiments involving different loss functions are included in Appendix B.1.
To optimize the models, we employed the Adam optimizer and utilized the KerasTuner package (O’Malley et al. [Reference O’Malley, Bursztein, Long, Chollet, Jin and Invernizzi2019]) to determine the optimal number of training samples per batch. KerasTuner is a Python package that addresses the challenge of hyperparameter search through a user-friendly and scalable hyperparameter optimization framework.
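The sketch below illustrates how the shallow NN and the pyramid-shaped DNN described above can be assembled in Keras with Leaky ReLU activations, the MSE loss, and the Adam optimizer; the activation slope and other unspecified settings are left at library defaults, and the KerasTuner batch-size search is omitted for brevity.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_shallow_nn():
    """Shallow NN: a single hidden layer with 64 neurons."""
    model = keras.Sequential([
        keras.Input(shape=(1,)),          # input: cloud effective radius
        layers.Dense(64),
        layers.LeakyReLU(),
        layers.Dense(1),                  # output: autoconversion rate
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


def build_pyramid_dnn():
    """Pyramid-shaped DNN: five hidden layers with 64, 128, 256, 512, 1024 neurons."""
    model = keras.Sequential([keras.Input(shape=(1,))])
    for width in (64, 128, 256, 512, 1024):
        model.add(layers.Dense(width))
        model.add(layers.LeakyReLU())
    model.add(layers.Dense(1))
    model.compile(optimizer="adam", loss="mse")
    return model


# Example training call on the preprocessed arrays from the earlier sketch
# build_shallow_nn().fit(x_tr, y_tr, validation_split=0.2, epochs=50, batch_size=256)
```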
2.3. Training/testing procedure
We next describe further the settings underlying our training and/or testing procedures relating to stages 1 and 2.
2.3.1. Stage 1—predicting the autoconversion rates from atmospheric simulation model output
As discussed earlier, this phase involves training machine learning models which are able to predict autoconversion rates given the cloud effective radius, at each grid point in three-dimensional space, including latitude, longitude, and altitude or level, by leveraging a dataset containing input-output pairs derived from the atmospheric simulations (as described earlier).
In particular, as described in the previous subsections, we train various machine learning models of different complexities, including linear regression, polynomial regression, random forest, and neural networks, to determine their effectiveness. These models are trained using the dataset derived from the ICON-LEM simulations over Germany, as explained earlier.
Subsequently, we evaluate our best model using different testing datasets and scenarios associated with the ICON-LEM simulations over Germany and the ICON-NWP simulations over Holuhraun, as follows:
1. ICON-LEM Germany: In this testing scenario, we evaluate the performance of our machine learning models using data from the same simulation that was used for training. Once again, this data, which consists of a set of cloud effective radius and autoconversion rate pairs corresponding to different points in three-dimensional space, was generated by the ICON-LEM simulations over Germany. The testing data, however, differs from the training data in that it covers a different time, specifically 2 May 2013 at 1:20 pm. This enables us to assess the model’s ability to generalize to new data within the same region and day, under weather conditions that evolved considerably (Heinze et al. [Reference Heinze, Dipankar, Henken, Moseley, Sourdeval, Trömel, Xie, Adamidis, Ament and Baars2017]).
2. Cloud-top ICON-LEM Germany: In this testing scenario, we evaluate the performance of our machine learning model using the same data as in the previous scenario, except that we consider only the cloud-top information. The data involved in this scenario is a collection of cloud-top autoconversion rate and cloud effective radius pairs associated with 2D spatial points, each representing a specific latitude and longitude at the corresponding cloud-top height. We extract this cloud-top 2D data from the 3D atmospheric simulation output by selecting, at each latitude and longitude, the value at the level where the cloud optical thickness, integrated vertically from the cloud top, exceeds 1.
3. Coarser Horizontal Resolution of Cloud-top ICON-LEM Germany: This testing scenario is designed to evaluate the performance of our machine learning model using the same data, but at different resolutions. Specifically, we coarsen the cloud-top ICON-LEM Germany data from 1 km to 2.5 km and 10 km (a simple coarsening sketch is given at the end of this sub-subsection). Evaluating a model’s performance on data with different spatial resolutions is important because it helps us understand the model’s ability to make accurate predictions in real-world applications, where data with varying spatial resolutions may be encountered.
4. Cloud-top ICON-NWP Holuhraun: This final testing scenario uses data distinct from that of the previous scenarios. In particular, we use cloud-top ICON-NWP Holuhraun data, which was produced for a different location, time, and resolution than the data used in the previous scenarios. The ability of the model to perform well on new data is important in many practical applications, allowing the model to make accurate predictions on unseen data and to adapt to varying geographical locations and meteorological conditions.
Overall, these testing scenarios are crucial in evaluating the performance and robustness of our machine learning models and can provide valuable insights that can be used to improve the model’s accuracy and usefulness in a wide range of applications.
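One plausible way to produce the coarser-resolution cloud-top fields used in scenario 3 is simple block averaging, as sketched below with xarray on a synthetic 1 km field; this is an assumption, since the aggregation method is not detailed here.

```python
import numpy as np
import xarray as xr

# Hypothetical 1 km cloud-top field on a regular lat/lon grid (200 x 200 cells)
rng = np.random.default_rng(0)
autoconv_1km = xr.DataArray(
    rng.lognormal(mean=-23.0, sigma=1.0, size=(200, 200)),   # synthetic rates [kg m^-3 s^-1]
    dims=("lat", "lon"),
    coords={"lat": 47.0 + 0.009 * np.arange(200), "lon": 5.0 + 0.013 * np.arange(200)},
)

# Block-average to a ~10 km grid (10 x 10 cells per block); "trim" drops edge cells
autoconv_10km = autoconv_1km.coarsen(lat=10, lon=10, boundary="trim").mean()

# A 2.5 km grid is not an integer multiple of the 1 km cells, so producing it
# generally requires interpolation/regridding rather than simple block averaging.
```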
2.3.2. Stage 2—predicting the autoconversion rates from satellite data
Finally, as discussed earlier, this stage focuses on predicting the autoconversion rates directly from satellite data. Notably, we do not train new machine learning models but instead simply select the most promising model from the previous stage to predict the autoconversion rates from satellite-derived inputs, specifically the cloud effective radius. The best machine learning model is then tested on several specific scenarios, which can be found in Table 2.
The reasons behind our selection of those specific datasets at those particular times and locations are twofold. Firstly, the chosen datasets contained substantial cloud data, contributing significantly to our study. Secondly, the data available at those specific times and locations were present in both the MODIS and ICON datasets, offering valuable consistency and comparability for our analysis. The use of these specific datasets allows us to evaluate the performance of the selected model under various geographical conditions as well as different time periods.
2.4. Model evaluation
The performance of each machine learning model is evaluated by calculating various metrics—such as R2, MAPE (mean absolute percentage error), RMSPE (root mean squared percentage error), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM)—on the testing data. However, given that the autoconversion rates take very small values, we transform both the reference simulation autoconversion rates and the predicted ones prior to calculating these error metrics, applying a base-10 logarithm followed by normalization to the range 0 to 1. These transformations also enable better visualizations of the relevant quantities on a two-dimensional grid, allowing a meaningful comparison between the predictions and the reference simulation data.
The R2 metric quantifies how well the regression model describes the relationship between the predicted and actual values. In addition, the MAPE and RMSPE metrics are expressed as percentages, making it easy to convey the performance of a model to readers who may not be familiar with the particular data. MAPE measures the average percentage difference between the predicted and actual values of a variable. RMSPE is similar to MAPE but is based on squared rather than absolute percentage differences, which can be advantageous when identifying larger errors is crucial. Both metrics are commonly used in forecasting as they provide a clear indication of the percentage error in the model’s predictions.
Finally, since we report some results as images, we also use two measures that are widely adopted to evaluate image quality. In particular, we use PSNR and SSIM to measure the similarity of a predicted image to the reference simulation image. PSNR reports the ratio between the maximum signal value and the noise present in the image in decibels (dB); a higher PSNR value indicates superior image quality. SSIM, on the other hand, evaluates the structural similarity between the predicted and reference simulation images, taking into account attributes such as luminance, contrast, and structure. The resulting index ranges from -1 to 1, with a value of 1 indicating a perfect match between the images.
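The evaluation pipeline can be summarized by the following sketch, which applies the base-10 logarithm and a min-max normalization to the range 0 to 1 before computing the metrics with scikit-learn and scikit-image; the joint normalization of the two fields and the explicit RMSPE formula are our assumptions about details not spelled out above.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_percentage_error
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def log_minmax(y_true, y_pred, eps=1e-30):
    """Base-10 log, then joint min-max scaling of both fields to [0, 1]."""
    t, p = np.log10(y_true + eps), np.log10(y_pred + eps)
    lo, hi = min(t.min(), p.min()), max(t.max(), p.max())
    return (t - lo) / (hi - lo), (p - lo) / (hi - lo)


def evaluate(y_true_map, y_pred_map):
    """Metrics on 2-D (lat x lon) autoconversion-rate maps."""
    t, p = log_minmax(y_true_map, y_pred_map)
    rel_err = (t - p) / np.maximum(t, 1e-6)          # guard against division by ~0
    return {
        "R2": r2_score(t.ravel(), p.ravel()),
        "MAPE": mean_absolute_percentage_error(t.ravel(), p.ravel()),
        "RMSPE": float(np.sqrt(np.mean(rel_err ** 2))),
        "PSNR": peak_signal_noise_ratio(t, p, data_range=1.0),
        "SSIM": structural_similarity(t, p, data_range=1.0),
    }
```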
3. Experimental results
We now report our experimental results relating to the prediction of autoconversion rates from the atmospheric simulation model output and satellite data, with a variety of machine learning models and scenarios that have been developed using the procedure described earlier.
3.1. Autoconversion on simulation model (ICON)
We recall that in this scenario we have trained a variety of machine learning models, including simple and more complex ones, that are able to predict autoconversion rates given cloud effective radius; we have then tested the best machine learning model under a variety of scenarios that have been outlined in Subsection 2.3. The main observations relating to the variety of machine learning models and various testing scenarios are as follows:
• ICON-LEM Germany: Table 3 showcases a variety of evaluation metrics pertaining to the prediction of autoconversion rates by the different machine learning models in this testing scenario. The results indicate that simple linear regression produces reasonably good outcomes, with R2 and SSIM values exceeding 88%. Furthermore, employing second-order polynomial regression leads to a slight but noticeable improvement.
Among all the models, the best results tend to be delivered by the DNN, with R2 and SSIM surpassing 90%; however, the NN and random forest also perform well. The shallow NN and DNN yield comparable results, though the RMSPE values for the shallow NN are slightly higher than those for the DNN.
Figure 3 displays the predicted autoconversion rates against the reference simulation autoconversion rates for the various machine learning models in this testing scenario, in order to further illustrate performance differences. We can see that second-order polynomial regression outperforms linear regression, whereas random forest produces results that are essentially identical to those of the NN and DNN models. We can also see that any of these strategies yields fairly decent results on the atmospheric simulation model output. Our findings also suggest that there is a strong association between the cloud effective radius, or liquid cloud particle size, and the autoconversion rates.
It is worth noting that the differences in performance among some models, especially RF, NN, and DNN, are not substantial. This leads us to hypothesize that altering the architecture of the ML model, while keeping the same dataset, may not significantly enhance performance. This hypothesis could be explored further, for instance by incorporating uncertainty estimation, as a potential avenue for future research.
Ultimately, although the DNN model produced slightly better results than competing models—albeit not by a significant margin—we deem the NN model preferable because it produces comparable outcomes with significantly fewer trainable weights. Therefore, we further showcase the NN performance on ICON-LEM Germany and ICON-NWP Holuhraun, encompassing different types of settings.
Finally, given our ability to predict autoconversion rates from cloud effective radius with reasonable accuracy using linear regression (LR) and polynomial regression (PR) models, as shown in this scenario, we can also immediately write the respective input–output relationship in order to shed some light on the interplay between the autoconversion rates and cloud effective radius. This not only allows for simple calculations but also provides interpretability.
To make the calculations more accurate, we use the base-10 logarithm of the cloud effective radius. Furthermore, to standardize the data, we also use the logarithmic mean (the mean of the logarithm), $ {m}_{\mathrm{e}}^{\mathrm{log}} $ , and the logarithmic standard deviation (the standard deviation of the logarithm), $ {s}_{\mathrm{e}}^{\mathrm{log}} $ , of the cloud effective radius. We normalize $ {r}_{\mathrm{e}} $ (the cloud effective radius) using the following equation:
$$ {\hat{r}}_{\mathrm{e}}^{\mathrm{log}}=\frac{\log_{10}\left({r}_e\right)-{m}_{\mathrm{e}}^{\mathrm{log}}}{s_{\mathrm{e}}^{\mathrm{log}}}. $$

Following the simple linear regression formulation $ y=a+ bx $ , we obtain the coefficients $ a $ and $ b $ after training the linear regression model, with $ x $ and $ y $ serving as the respective input and output. In our particular case, the value of $ a $ is exceptionally small, approaching zero, so we exclude the coefficient $ a $ . Consequently, with the normalized logarithmic cloud effective radius $ {\hat{r}}_{\mathrm{e}}^{\mathrm{log}} $ , we can express the normalized logarithmic autoconversion rates in the form of linear regression as:

$$ {a}_u^{\mathrm{log}}=0.944\hskip0.1em {\hat{r}}_{\mathrm{e}}^{\mathrm{log}}. $$

Similarly, following the second-order polynomial regression formulation $ y={a}_0+{a}_1x+{a}_2{x}^2 $ , we obtain the coefficients $ {a}_0 $ , $ {a}_1 $ , and $ {a}_2 $ after training the model. Consequently, we can also express the normalized logarithmic autoconversion rates using second-order polynomial regression as:

$$ {a}_u^{\mathrm{log}}=0.085+1.009\hskip0.1em {\hat{r}}_{\mathrm{e}}^{\mathrm{log}}-0.085\hskip0.1em {\left({\hat{r}}_{\mathrm{e}}^{\mathrm{log}}\right)}^2. $$

Finally, we can convert the normalized logarithmic autoconversion rates back to their original form by applying:

$$ {a}_u={10}^{1.396\hskip0.1em {a}_u^{\mathrm{log}}-9.814}. $$

In the above equations, the autoconversion rates are expressed in kg m−3 s−1 and the cloud effective radius in m. In summary, we have outlined simple linear and polynomial regression models that predict the autoconversion rates from the cloud effective radius; by utilizing the logarithmic transformation and normalization, we obtain more accurate and more interpretable results.
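As an illustration, the regression relations above translate directly into the short sketch below; the normalization statistics $ {m}_{\mathrm{e}}^{\mathrm{log}} $ and $ {s}_{\mathrm{e}}^{\mathrm{log}} $ are placeholders, since their trained values are not reproduced here.

```python
import numpy as np

# Placeholder normalization statistics of log10(r_e); the trained values are
# not reproduced here and must be substituted before use.
M_E_LOG, S_E_LOG = -4.8, 0.25


def autoconversion_lr(r_e):
    """Linear-regression estimate of a_u [kg m^-3 s^-1] from r_e [m]."""
    r_hat = (np.log10(r_e) - M_E_LOG) / S_E_LOG
    a_log = 0.944 * r_hat
    return 10.0 ** (1.396 * a_log - 9.814)


def autoconversion_pr(r_e):
    """Second-order polynomial-regression estimate of a_u from r_e."""
    r_hat = (np.log10(r_e) - M_E_LOG) / S_E_LOG
    a_log = 0.085 + 1.009 * r_hat - 0.085 * r_hat**2
    return 10.0 ** (1.396 * a_log - 9.814)


print(autoconversion_lr(10e-6), autoconversion_pr(10e-6))  # r_e = 10 micrometres
```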
• Cloud-top ICON-LEM Germany: The autoconversion rate prediction metrics obtained with the NN model for the cloud-top ICON-LEM Germany testing scenario are reported in Table 4 under testing set 2. Both R2 and SSIM exhibit values close to 90%, suggesting that one can accurately estimate the autoconversion rates from model-simulated satellite data by directly reusing the NN model without further adjustments, such as fine-tuning. This is significant because it alleviates the need to collect additional data and/or run additional time-consuming training processes.
• Coarser Horizontal Resolution of Cloud-top ICON-LEM Germany: The autoconversion rate prediction metrics obtained with the NN model for the coarser-resolution cloud-top Germany testing scenarios (2.5 km and 10 km) are also reported in Table 4. The values of the prediction metrics for the coarser resolutions closely resemble those of the higher resolution (1.0 km). Specifically, at 2.5 km resolution, MAPE, RMSPE, and PSNR are slightly worse, but SSIM is slightly higher. At 10 km resolution, MAPE and RMSPE show lower performance compared to the other resolutions; however, the overall results remain quite good, with SSIM performing better. These findings suggest that our machine learning model is capable of working with data of different resolutions. This is also significant because atmospheric simulation and satellite data can have very different spatial resolutions, depending on the simulation models used to generate the data or the satellite instruments used to acquire it; it is therefore important that the model delivers accurate predictions for data with different spatial resolutions.
• Cloud-top ICON-NWP Holuhraun: We also infer from Table 4 that our NN model’s predictive performance on the cloud-top ICON-NWP Holuhraun scenario is slightly inferior to that on the cloud-top ICON-LEM Germany scenarios, but still fairly decent. This performance decrease is understandable given that we trained our model exclusively on the ICON-LEM Germany data and not on the ICON-NWP Holuhraun data. Nevertheless, we conclude that the model trained on the original ICON-LEM Germany data can still be applied to the ICON-NWP Holuhraun data with reasonably good outcomes, without additional retraining, demonstrating that the model is reasonably robust in this out-of-distribution generalization scenario.
Finally, we visually display the predicted and reference simulation autoconversion rates on a 2D map (for each latitude and longitude) for the ICON-LEM Germany and ICON-NWP Holuhraun testing scenarios in Figures 4 and 5, respectively.
Figures 4a, 4b, and 4c demonstrate that the predicted autoconversion 2D maps closely resemble the reference simulation maps in the various ICON-LEM Germany testing scenarios (1 km, cloud-top 1 km, and cloud-top 2.5 km). Furthermore, Figures 5a, 5b, and 5c demonstrate that the predicted autoconversion maps exhibit similar characteristics to the reference simulation maps in the ICON-NWP Holuhraun testing scenarios.
Upon examining the difference maps for each scenario, depicted on the right-hand side of Figures 4 and 5, a discernible pattern emerges: our model exhibits a slight tendency to over-predict, as indicated by the predominance of blue over red. Nonetheless, these visualizations make clear that the discrepancy between the reference simulation and the model’s predictions remains relatively moderate, with the majority of points showing deviations of less than 10%. Although certain regions exhibit disparities exceeding 10%, the predictions overall remain reasonably close to the reference values. The figures also reveal that the ICON-NWP Holuhraun results exhibit a slightly higher level of over-prediction than the ICON-LEM Germany ones. However, as mentioned before, this is understandable since we exclusively trained our model on the ICON-LEM Germany data and not on the ICON-NWP Holuhraun data.
Overall, these results suggest that the NN model can deliver predictions that are close to the reference simulation data, retaining key spatial features as shown in Figures 4 and 5, in a variety of scenarios, including atmospheric simulation models and satellite-like data.
3.1.1. Comparison with another parameterization
We compare the autoconversion rates obtained with our approach against the reference autoconversion rates from Seifert and Beheng [Reference Seifert and Beheng2006], as well as against the autoconversion rate parameterization of Freud and Rosenfeld [Reference Freud and Rosenfeld2012]. We specifically compare with Freud and Rosenfeld [Reference Freud and Rosenfeld2012] because they also utilize the cloud effective radius as an input to derive the parameterization of the mean collection kernel, which is then used to obtain the autoconversion rates. The comparison is conducted for both cloud-top Germany (2 May 2013 at 1:20 pm) and cloud-top Holuhraun (7 September 2014 at 8 am) and is illustrated in Figure 7. From the figure, it is evident that our approach closely aligns with Seifert and Beheng [Reference Seifert and Beheng2006], since we utilize their parameterization as our reference for autoconversion rates. Conversely, the parameterization of Freud and Rosenfeld [Reference Freud and Rosenfeld2012] results in higher autoconversion rates.
3.2. Autoconversion on satellite observations (MODIS)
We also recall that in this scenario we have used the best machine learning model—identified in the previous section—to predict the autoconversion rates from real satellite MODIS data. In particular, we have chosen to use the NN model as argued earlier.
We compare the autoconversion rate predictions from MODIS satellite data to the cloud-top ICON simulation results relating to the Germany and Holuhraun scenarios.
Figure 6 demonstrates that the MODIS autoconversion rate predictions show statistical agreement with the cloud-top ICON-LEM autoconversion rates over Germany. Specifically, the mean, standard deviation, median, 25th and 75th percentiles of the autoconversion rates of the cloud-top ICON-LEM are relatively close to those of the MODIS over Germany. Minor discrepancies in the percentiles arise due to the cloud effective radius of the ICON-LEM model, which tends to yield slightly higher percentiles compared to MODIS. However, the overall agreement in the prediction of autoconversion rates between ICON-LEM and MODIS over Germany remains notably close.
In the case of Holuhraun, as illustrated in Figure 8, most of the autoconversion rates exhibit good agreement between ICON-NWP Holuhraun and MODIS. However, some discrepancies are observed, varying in degree across the rates. Notably, the discrepancies in autoconversion rates (on the right side of the images) between ICON-NWP Holuhraun and MODIS can be attributed to discrepancies in the cloud effective radius (on the left side of the images) between ICON-NWP Holuhraun and MODIS. The extent of this variation depends on the magnitude of the difference between the simulations (ICON) and the satellite data (MODIS): a higher discrepancy in cloud effective radius between ICON and MODIS corresponds to a higher discrepancy in autoconversion rates. This outcome aligns with our expectations: since the autoconversion rate is derived from input values of cloud effective radius, the output will inherently mirror the characteristics of the input. This is also observable in the probability density functions of the cloud effective radius and the corresponding autoconversion rates at the cloud-top levels of ICON and MODIS over Germany and Holuhraun, as depicted in Figures 10 and 11. These plots show the output reflecting the input characteristics, although not identically, demonstrating the inherent influence of the input on the output.
As previously depicted in Figures 6 and 8, there are larger discrepancies in the mean and standard deviation between ICON and MODIS in Holuhraun compared to those observed in Germany, where the mean and standard deviation difference is relatively lower. A closer examination of specific dates in Holuhraun, particularly on 3 September 2014 and 6 September 2014, reveals substantially larger discrepancies compared to other days. A discernible trend emerges where these discrepancies predominantly occur in conditions where multi-layer clouds are detected, as depicted in Figure 9. From this figure, it can be seen that on days such as 2 May 2013, 1 September 2014, and 7 September 2014, where multi-layer clouds are less prevalent compared to 3 and 6 September 2014, there is better agreement in terms of statistical properties between ICON and MODIS.
In terms of bias, we acknowledge that every simulation and satellite retrieval inherently possesses its own biases compared to actual occurrences. Similarly, our findings have limitations: the bias in predicting autoconversion rates directly from satellite data heavily depends on the accuracy of the cloud effective radius retrieval itself. In particular, larger discrepancies between the cloud effective radius in simulations and in satellite retrievals directly impact the resulting autoconversion rates. This limitation, as demonstrated earlier, is notably prevalent in conditions where multi-layer clouds are predominantly detected. Despite this constraint, our study suggests that a reasonable estimation of autoconversion rates can be obtained from the satellite-derived cloud effective radius or liquid cloud particle size. These findings serve as a starting point for broader discussions and potential avenues for future research.
4. Concluding remarks
In this study, we provide a computationally effective solution to unravel the key process of precipitation formation for liquid clouds, the autoconversion process, using a machine learning approach. This process is critical to advance our understanding of the response of clouds to anthropogenic aerosols (Mülmenstädt et al. [Reference Mülmenstädt, Nam, Salzmann, Kretzschmar, L’Ecuyer, Lohmann, Ma, Myhre, Neubauer and Stier2020]).
Specifically, we demonstrate that one can reasonably predict the autoconversion rates from the cloud effective radius, from both atmospheric simulation data and satellite data, using various machine learning techniques, from the most basic ones such as linear regression to more complex ones such as neural networks. Therefore, our findings confirm that the cloud effective radius plays a significant role in determining autoconversion rates in warm-phase clouds, corroborating existing results derived from purely theoretical considerations and in-situ observations (e.g., Freud and Rosenfeld [Reference Freud and Rosenfeld2012]).
Our work also points to a few directions for future research. First, one could potentially aim to develop other machine learning models that predict autoconversion rates based on additional features beyond the cloud effective radius, such as the cloud optical thickness per layer. Our suggestion is not limited solely to cloud optical thickness per layer; it can also potentially apply to features such as cloud droplet number concentrations per layer and liquid water content per layer. However, it is important to note that currently, these variables per layer are not available in satellite data. Therefore, in order to pursue this approach, it would be necessary to first predict, for instance, the cloud optical thickness per layer and subsequently incorporate it as an additional feature in the prediction of autoconversion rates. Second, one could also potentially augment the machine learning models with uncertainty quantification capability, in order to deliver not only a point prediction of the variable of interest but also a confidence interval.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/eds.2024.24.
Acknowledgments
We sincerely thank and appreciate the editors and reviewers for their valuable feedback and constructive comments, which have significantly contributed to improving the quality of our manuscript. This work used resources of the Deutsches Klimarechenzentrum (DKRZ) granted by its Scientific Steering Committee (WLA) under project ID bb1143.
Author contribution
Conceptualization: all authors; Supervision: J.Q., M.R.; Methodology: all authors; Visualization: M.N.; Writing—original draft: M.N.; Writing—review and editing: all authors. All authors approved the final submitted draft.
Data availability statement
The preprocessed data used for training, validation, and testing in this study are available at https://doi.org/10.5281/zenodo.10523401. Additionally, the raw model output data used in the development of this research are available on request from tape archives at the DKRZ, which will remain accessible for 10 years. The satellite data can be downloaded from the NASA website (https://ladsweb.modaps.eosdis.nasa.gov/search/).
Funding statement
This research receives funding from the European Union’s Horizon 2020 research and innovation programme under Marie Skłodowska-Curie grant agreement No 860100 (iMIRACLI).
Competing interest
The authors declare no competing interests exist.
Ethical statement
The research meets all ethical guidelines, including adherence to the legal requirements of the study country.