Impact Statement
Nonintrusive model order reduction methods draw much attention in the industry for their easy implementation. However, insufficient accuracy and stability hinder broader applications. Large amounts of training data are needed to ensure their accuracy and stability. In this article, we propose an approach to actively collect the training data. The reduced order models built in such a way have not only much better performance than those built in conventional ways but also users can know their expected accuracy and confidence before deploying them. With little required user interaction, the proposed method offers a huge potential for industrial usage to create the so-called digital twin.
1. Introduction
Model order reduction (MOR) (Willcox and Peraire, Reference Willcox and Peraire2002; Antoulas, Reference Antoulas2005; Volkwein, Reference Volkwein2013), as a key component of the concept of digital twin (DT) (Hartmann et al., Reference Hartmann, Herz and Wever2018; Rasheed et al., Reference Rasheed, San and Kvamsdal2019), is increasingly attracting attention from different research fields. The traditional ways of reduced modeling, such as the Krylov subspace method (Heres, Reference Heres2005), reduced basis method (Haasdonk, Reference Haasdonk2017), proper orthogonal decomposition (POD) (Lu et al., Reference Lu, Jin, Chen, Yang, Hou, Zhang, Li and Fu2019), and the discrete empirical interpolation method (Chaturantabut and Sorensen, Reference Chaturantabut and Sorensen2010) require detailed knowledge of full order models (FOMs) and are known as intrusive reduction methods. The applicability of intrusive methods is limited in industrial engineering simulation and commercial finite element method (FEM) software. Recently, the development of so-called nonintrusive reduction has attracted much attention.
Nonintrusive MOR methods use data generated by FOMs to build surrogate models that can accurately reproduce the physical behavior encoded in the data. Currently, most data-driven MOR methods are based on machine learning (ML). For example, Feed-forward Neural Network (Regazzoni et al., Reference Regazzoni, Dedè and Quarteroni2019), operator inference (OpInf) (Peherstorfer and Willcox, Reference Peherstorfer and Willcox2016), Long-short-term-memory (LSTM) neural network (Mohan and Gaitonde, Reference Mohan and Gaitonde2018), recurrent neural network (RNN) (Kani and Elsheikh, Reference Kani and Elsheikh2017; Wang et al., Reference Wang, Ripamonti and Hesthaven2020; Wu and Noels, Reference Wu and Noels2022), deep learning (Fresca and Manzoni, Reference Fresca and Manzoni2022), sparse identification of nonlinear dynamics (SINDy) (Champion et al., Reference Champion, Lusch, Kutz and Brunton2019), and Runge–Kutta neural network (Zhuang et al., Reference Zhuang, Lorenzi, Bungartz and Hartmann2021) have been tested for reduced modeling.
One of the concerning points of data-driven MOR techniques is collecting enough training data, as computing the FOM to generate the training data is very time-consuming, especially when looking at large 3D simulation models used productively in the industry. Therefore, collecting the training data smartly becomes one of the focuses of the data-driven MOR techniques.
We orient our discussion by focusing on two aspects relevant to snapshot collection: data quality and data quantity. The data quality could describe many aspects, for example, the distribution of the data and the noise in the data. Unlike other applications, the data in the MOR field has fewer corruptions since it is usually obtained directly from the FOM solver, and not, for example, from potentially noisy sensors. Therefore, the main focus will be improving the training data distribution. There are some investigations (Batista et al., Reference Batista, Prati and Monard2004; Gyori et al., Reference Gyori, Palombo, Clark, Zhang and Alexander2022) discussing how the distribution of the data could influence the ML process itself. However, this issue is less prevalent within MOR research. To minimize the amount of training data, we employ the concept of active learning (AL). The concept of AL is widely used in the world of intrusive MOR (Bui-Thanh et al., Reference Bui-Thanh, Willcox and Ghattas2008; Haasdonk et al., Reference Haasdonk, Dihlmann and Ohlberger2011; Lappano et al., Reference Lappano, Naets, Desmet, Mundo and Nijman2016). However, the data-driven nonintrusive MOR technique faces extra challenges without intrusiveness. One challenge is finding a good stopping criteria. The stopping criteria are usually needed to validate the reduced order model (ROM) and compute the accuracy. However, currently, most validation (Przekop et al., Reference Przekop, Guo and Rizzi2012; Perez et al., Reference Perez, Wang and Mignolet2014) is test-case-dependent, which implies that the general performance of the ROM cannot be evaluated without bias. To address this problem, a load-independent metric is proposed for a particular mechanical problem in Kuether and Allen (Reference Kuether and Allen2016). The second difficulty is the absence of an error estimator, which can decide the data increment strategy. Such an error estimator is available in an analytical form for the intrusive MOR methods but lacking in the nonintrusive case.
In this article, we propose an alternative approach to tackle the problems mentioned above. To improve the snapshot distribution, we propose a new method for sampling the training data, and we call it joint space sampling (JSS). With this method, we collect the training data from a joint space consisting of an estimated reduced solution space and a parameter space. Besides, we propose to use a validator based on probably approximately correct (PAC) learning theory (Haussler, Reference Haussler1990) to decide when to stop the AL iteration. The validated ROMs are useful for deploying in real industrial use cases where complex time-dependent inputs can be expected. Furthermore, the error estimator to speed up the convergence is constructed in a data-driven way. Among different existing approaches (Paul-Dubois-Taine and Amsallem, Reference Paul-Dubois-Taine and Amsallem2015; Guo and Hesthaven, Reference Guo and Hesthaven2018; Xiao, Reference Xiao2019), we employ the Kriging interpolation (Krige, Reference Krige1951) introduced in Guo and Hesthaven (Reference Guo and Hesthaven2018) as the error estimator to decide the data increment strategy.
The whole article is structured as follows: in Section 2, MOR based on ML is introduced. In Section 3, the key components and the whole workflow of the new AL-based MOR algorithm are described in detail. Numerical tests used to validate this approach are presented in Section 4. Finally, the conclusions are presented in Section 5.
2. ML-Based MOR
The offline phase of ML-based MOR generally consists of two steps: constructing a low-dimensional (reduced) space and identifying the ROM in the reduced space by ML-based approach. In the online phase, the identified ML model will be used as a surrogate model to predict the evolution of the simulated dynamic system at a real-time speed.
2.1. Reduced space construction
Without loss of generality, we can assume the FOM is governed by the equation:
where $ \boldsymbol{y}\in {\mathrm{\mathbb{R}}}^N $ is the state of the FOM and $ N $ is normally very large. $ \boldsymbol{f} $ is the right-hand side (RHS) function configured by the parameter vector $ \boldsymbol{\mu} =\left[{\mu}_1\hskip0.5em {\mu}_2\hskip0.5em \dots \hskip0.5em {\mu}_{N_m}\right]\in {\mathrm{\mathbb{R}}}^{N_m} $ .
The first step of generating a ROM is to construct the reduced space by finding a low-dimensional space defined by a projection matrix $ \boldsymbol{V}\in {\mathrm{\mathbb{R}}}^{N\times n} $ , such that the reduced state $ {\boldsymbol{y}}_r\in {\mathrm{\mathbb{R}}}^n $ is given by projecting FOM state $ \boldsymbol{y} $ onto the reduced space
2.1.1. Proper orthogonal decomposition
POD is a widely used method to construct the reduced space. Plenty of introductory literature for POD method is available (e.g., Lu et al., Reference Lu, Jin, Chen, Yang, Hou, Zhang, Li and Fu2019). Here we will just briefly introduce it. The main ingredient of POD is a series of observations of the FOM states, which are usually called snapshots and take the form of
where $ {N}_s $ is the number of the available snapshots and $ {\boldsymbol{y}}^{(i)} $ is the $ i $ th column in the snapshot matrix $ \boldsymbol{Y} $ .
Applying singular value decomposition (SVD) to $ \boldsymbol{Y} $ yields
The left singular vectors $ \boldsymbol{V}\in {\mathrm{\mathbb{R}}}^{N\times k} $ can be used as the reduced space basis. This space will minimize the $ {L}^2 $ approximation error for the given snapshots. We can truncate $ \boldsymbol{V} $ at its $ {N}_r $ th column. By doing this, the size of the reduced space ( $ {N}_r $ ) can be chosen freely. A large space can retain more information in the original full space, but it will make the ROM run slower during the online phase.
2.1.2. Conventional snapshot sampling
To increase the diversity of the observed FOM states in the snapshots, usually a hypercubic parameter space is predefined as
where $ {\mu}_{i,\min } $ and $ {\mu}_{i,\max } $ is the upper and the lower limit of the $ i $ th component of the system parameter vector $ \boldsymbol{\mu} $ . The parameter vectors $ \boldsymbol{\mu} $ will be randomly selected from $ \mathcal{M} $ and different initial value problems (IVPs) can be constructed based on them. In this case, FOM’s states under different input parameters can be observed.
Next, we introduce two strategies for the sampling parameter values, static parameter sampling (SPS) and dynamic parameter sampling (DPS), respectively. For SPS, to construct $ m $ IVPs, $ m $ parameter vectors will be sampled from $ \mathcal{M} $ . For the $ i $ th IVP, we have
where $ {\boldsymbol{\mu}}^{(i)} $ is the $ i $ th parameter samples.
With DPS, to construct $ m $ IVPs where each IVP has $ K $ time steps, $ mK $ parameter vectors will be sampled from $ \mathcal{M} $ . Here, in this article, we use Hammersley Low Discrepancy Set (Hammersley and Handscomb, Reference Hammersley and Handscomb1964) to take samples. For the $ i $ th IVP, we have
where
In equation (8), $ \mathrm{Interp}\left(\cdot \right) $ represent a linear interpolation of the points $ {t}_j,{\mu}^{(ij)} $ . Let $ \delta t $ be the time step size, $ {t}_i={t}_0+ i\delta t $ . As we can see that DPS is $ K $ times more efficient in exploring the parameter space $ \mathcal{M} $ , which will make the constructed ROM perform better facing complex input parameters in its online phase.
2.1.2.1. Example
Here, we use an example to show the sample distributions of the SPS and DPS methods, respectively. We consider an IVP,
where $ t\in \left(0,1\right) $ . $ k(t) $ and $ a(t) $ are two input parameters of the system, and they are time-dependent. Their ranges are $ \left[\mathrm{0.1,1}\right] $ and $ \left[1,2\right] $ , respectively. We first use Hammersley Low Discrepancy Set to take samples in the parameter space $ \mathcal{M} $ . Then, we use the SPS and DPS methods to create the IVPs. By solving the IVPs, we obtain the solution snapshots. For each IVP, we collect 50 snapshots evenly on the time grid.
In Figures 1 and 2, we show the sample distributions in the parameter space and the solution space for the SPS and the DPS method, respectively. As observed, we can have a good sample distribution in the solution space with the SPS method. However, we have much fewer samples in the parameter space. In contrast, much more samples are taken in the parameter space with the DPS method. Nevertheless, the diversity of the sample system state is not optimal.
Through this toy example, we can see the advantage and disadvantage of the conventional sampling approaches for MOR. Later in this contribution, we propose a new sampling approach to ensure good sample distribution/diversity in both the parameter space and the solution space.
2.2. ROM identification
As introduced, the second step of ML-based MOR is model identification, which means finding the governing equation in the reduced space. The reduced governing equations are expected to have the form
where $ {\boldsymbol{f}}_{\boldsymbol{r}} $ is the low-dimensional representation of $ \boldsymbol{f} $ .
Different methods can be used to achieve this. In this article, artificial neural network (ANN) and OpInf will be briefly introduced.
2.2.1. Artificial neural network
The ROM we employ in this article is a dynamic model evolving on a discrete-time grid from $ {t}_0 $ to $ {t}_{\mathrm{end}} $ with time step $ \delta t $ :
where $ {\boldsymbol{y}}_{r,i} $ means $ {\boldsymbol{y}}_{\boldsymbol{r}}\left(t={t}_i\right) $ , and $ \boldsymbol{g}\left(\cdot \right) $ is the approximation to the unknown reduced RHS $ {\boldsymbol{f}}_{\boldsymbol{r}}\left(\cdot \right) $ .
To learn the mapping between states of two consecutive time steps, a multilayer perceptron (MLP) is used as $ \boldsymbol{g}\left(\cdot \right) $ in equation (11). In this case, the inputs to the neural network are the current state of the system $ {\boldsymbol{y}}_{r,i} $ and $ {\boldsymbol{\mu}}_i $ , and the predicted reduced state $ {\hat{\boldsymbol{y}}}_{r,i+1} $ is calculated as
This is similar to an Explicit Euler integrator, so we denote the MLP’s function as $ {\boldsymbol{g}}_{\mathrm{ee}}\left({\boldsymbol{y}}_{r,i},{\mu}_i\right) $ . We call such a network explicit Euler neural network (EENN) (Pan and Duraisamy, Reference Pan and Duraisamy2018; Regazzoni et al., Reference Regazzoni, Dedè and Quarteroni2019; Zhuang et al., Reference Zhuang, Lorenzi, Bungartz and Hartmann2021). From equation (12), we know that an EENN is essentially a variant of residual network (He et al., Reference He, Zhang, Ren and Sun2016) which learns the increment between two system states instead of learning to map the new state directly from the old state. This leads to a potential advantage of EENNs that we can use the deeper network for approximating more complex nonlinearity without suffering too much from network degeneration.
The algorithm for training such an EENN is provided in Algorithm 1. In this contribution, all MLPs consist of one input layer, two hidden layers, and one output layer. The activation function between each two consecutive layers is rectified linear unit (ReLU) function (Glorot et al., Reference Glorot, Bordes and Bengio2011). We use Pytorch (Paszke et al., Reference Paszke, Gross, Chintala, Chanan, Yang, DeVito, Lin, Desmaison, Antiga and Lerer2017) as the platform to build and train the networks. The learning rate decay and Early Stopping (Prechelt, Reference Prechelt1998) are applied to facilitate the training.
Algorithm 1. Training of EENN
Require: $ {\mathbf{y}}_{r,i},{\boldsymbol{\mu}}_i,{\mathbf{y}}_{r,i+1} $
1: while $ \mathrm{loss}>{\mathrm{loss}}_{\mathrm{tol}} $ do
2: $ {\hat{\boldsymbol{y}}}_{r,i+1}={\boldsymbol{y}}_{r,i}+\delta t{\boldsymbol{g}}_{\mathrm{ee}}\left({\boldsymbol{y}}_{r,i},{\boldsymbol{\mu}}_i\right) $
3: $ \mathrm{loss}=\mathrm{MSE}\left({\hat{\boldsymbol{y}}}_{r,i+1},{\boldsymbol{y}}_{r,i+1}\right) $
4: Use backpropagation to update the weights and biases of the network $ {\mathrm{g}}_{\mathrm{ee}} $
5: end while
2.2.2. Operator inference
Besides the purely data-driven ROM identification method introduced in Section 2.2.1, we briefly introduce a physics-informed ROM identification method, OpInf (Peherstorfer and Willcox, Reference Peherstorfer and Willcox2016).
The OpInf method uses a hypothetical form for the ROM’s governing equation:
where $ {\boldsymbol{F}}_{\boldsymbol{1}},{\boldsymbol{F}}_{\boldsymbol{2}},{\boldsymbol{F}}_{\boldsymbol{B}},{\boldsymbol{F}}_{\boldsymbol{C}} $ and $ {\boldsymbol{F}}_{\boldsymbol{N}} $ are operators to be inferred. The operation $ \otimes $ is the Kronecker product:
As we see, OpInf assumes a quadratic dependency of RHS on the system state. OpInf can also be applied to systems that are not quadratic, as long as they can be mapped to a quadratic form through variable transformations (see Qian et al., Reference Qian, Kramer, Peherstorfer and Willcox2020).
To infer the unknown operators, we first construct the matrices
where $ K $ is the number of time steps. Using the matrices and the assumed RHS, we can construct a least square problem,
where
By solving the least square problem (equation (16)), we get the matrix $ \boldsymbol{O} $ . We can unpack the matrix $ \boldsymbol{O} $ as
In practice, we solve the least square problem with regularization to produce a stable ROM, and for more details see McQuarrie et al. (Reference McQuarrie, Huang and Willcox2021).
3. AL for ML-Based MOR
As introduced in Section 2.1.2, for the conventional ML-based MOR approaches, snapshots are collected up-front. This could cause the problem that not all important information is captured in the prepared snapshots that are used to construct the ROM. We propose a solution based on the concept of the pool-based AL (Settles, Reference Settles2009). The solution mainly contains the following steps: (a) we first use the JSS method to prepare a large data pool containing diverse joint samples, (b) the current ROM is validated, and the validation result is used to select the samples that are considered most valuable for the current ROM, (c) new training data are generated by the FOM and added to the current training data, and (d) the ROM is updated. The corresponding steps in the whole workflow are sketched in Figure 3.
3.1. Joint space sampling
In this section, we will introduce the new sampling method JSS. The proposed method will be used to create the data pool and the validation samples, which are necessary preparation for the AL algorithm.
3.1.1. Estimating the reduced solution space
In order to take the joint samples, we need first to estimate a reduced solution space. With the predefined parameter space $ \mathcal{M} $ , the boundaries of the reduced solution space will be estimated as follows:
-
1. Use SPS to create $ m $ IVPs for the FOM. In this article, Hammersley Set (Hammersley and Handscomb, Reference Hammersley and Handscomb1964) is used to take samples in the parameter space. Each IVP is integrated from time $ {t}_0 $ to time $ {t}_{\mathrm{end}} $ with $ K $ time steps.
-
2. Solving all the FOM problems will produce a snapshot matrix $ {\boldsymbol{Y}}_{\mathrm{estimate}}=\left[\begin{array}{cccc}{\boldsymbol{y}}^{(1)}& {\boldsymbol{y}}^{(2)}& \dots & {\boldsymbol{y}}^{\left(m\times K\right)}\end{array}\right] $ containing $ m\times K $ FOM state vectors.
-
3. Apply POD on $ {\boldsymbol{Y}}_{\mathrm{estimate}} $ to construct the reduced space.
-
4. Project $ {\boldsymbol{Y}}_{\mathrm{estimate}} $ onto the reduced space to get its low-dimensional representation:
-
5. Find the minimum and the maximum entries for each POD component in $ {\boldsymbol{Y}}_{\mathrm{estimate}} $ and denote them as:
where $ {N}_r $ is the size of the ROM.
-
6. The estimated space $ \mathcal{Y} $ is:
In the workflow, $ m $ can be selected as any positive integer. Since the reduced basis $ \boldsymbol{V} $ computed in the step will be used as the initial reduced basis later, the greater $ m $ is, the better the initial reduced basis will be. As explained in Section 2.1.2, we can have a relatively good observation in the solution space with the SPS method. Therefore, we use the SPS-sampled state to estimate the reduced solution space. Besides, one can also use minimal intrusiveness to the full-order problem to improve the estimate. Specifically, the parameter configurations that most likely drive the system to its extreme status (boundaries of the solution space) can be included in the samples.
A joint sample $ {\boldsymbol{J}}_{\mathrm{sample}} $ taken from the joint space $ \mathcal{J}=\mathcal{Y}\times \mathcal{M} $ has the form of $ {\boldsymbol{J}}_{\mathrm{sample}}=\left[\begin{array}{cc}{\boldsymbol{y}}_{r,\mathrm{sample}}& {\boldsymbol{\mu}}_{\mathrm{sample}}\end{array}\right] $ . We can use $ \boldsymbol{J} $ , $ \boldsymbol{V} $ and the FOM solver to perform a one-step simulation whose initial condition is $ {\boldsymbol{y}}_{r,\mathrm{sample}} $ and step size is $ \delta t $ . The simulation result will form a one-step trajectory in $ \mathcal{Y} $ . We can use such joint samples to prepare the data pool and the validation samples needed for the AL algorithm.
3.1.2. Loosen and trim the estimated solution space
The joint space $ \mathcal{J} $ as created in Section 3.1 is a hyper-cubic space (see example in Figure 4a). We will loosen and trim $ \mathcal{Y} $ to make it closer to the true reduced solution space. The first step is loosening. For this purpose, we introduce an expansion ratio $ \beta $ :
We use $ {\overset{\smile }{y}}_{r,\min}^{(i)} $ and $ {\overset{\smile }{y}}_{r,\max}^{(i)} $ as the lower and upper limits for the loosened space $ \overset{\smile }{\mathcal{Y}} $ . In our practice, $ \beta \in \left(0,1\right) $ can be selected to loosen the initial estimation. To trim the space, we apply two conditions to a reduced state $ {\boldsymbol{y}}_r $ sampled from $ \overset{\smile }{\mathcal{Y}} $ :
where $ \operatorname{MAX}\left(\cdot \right) $ and $ \operatorname{MIN}\left(\cdot \right) $ means the maximum and minimum entry of a vector/matrix. The value $ {y}_{\mathrm{min}} $ and $ {y}_{\mathrm{max}} $ can be either defined by empirical values of the physics quantity $ y $ or by $ \operatorname{MAX}\left({\boldsymbol{Y}}_{\mathrm{estimate}}\right) $ and $ \operatorname{MIN}\left({\boldsymbol{Y}}_{\mathrm{estimate}}\right) $ . In this article, $ \operatorname{MAX}\left({\boldsymbol{Y}}_{\mathrm{estimate}}\right) $ and $ \operatorname{MIN}\left({\boldsymbol{Y}}_{\mathrm{estimate}}\right) $ is used. We here denote the final feasible sampling space for reduced states as $ {\mathcal{Y}}^{\ast } $ , and the final joint space as $ {\mathcal{J}}^{\ast }={\mathcal{Y}}^{\ast}\times \mathcal{M} $ . A graphical example for joint space $ {\mathcal{J}}^{\ast } $ is given in Figure 4b.
Finally, we summarize how to collect $ {N}_s $ samples with the proposed JSS method in Algorithm 2.
Algorithm 2. Generation of a set of random joint samples
Require: $ \mathcal{M},{N}_s,m $ , FOM
1: Create a sample set $ M=\left\{{\boldsymbol{\mu}}_1,{\boldsymbol{\mu}}_2,\dots, {\boldsymbol{\mu}}_m\right\} $ in $ \mathcal{M} $
2: Perform SPS with $ m $ parameter samples and FOM to get $ {\boldsymbol{Y}}_{\mathrm{estimate}} $
3: Apply POD to $ {\boldsymbol{Y}}_{\mathrm{estimate}} $ to get $ \boldsymbol{V} $ and $ {\boldsymbol{X}}_{\mathrm{estimate}} $
4: Find the upper and lower limits for each POD component
5: Create hyper-cubic solution space $ \mathcal{J} $
6: Loosen the space $ \mathcal{J} $ to get $ \overset{\smile }{\mathcal{J}} $
7: Determine $ {y}_{\mathrm{max}} $ and $ {y}_{\mathrm{min}} $
8: $ i=0,\boldsymbol{I}=\left[\right] $
9: while $ i<{N}_s $ do
10: Generate a random sample $ {\boldsymbol{y}}_{r,\mathrm{sample}} $ in $ \overset{\smile }{\mathcal{Y}} $
11: if $ \operatorname{MAX}\left({\boldsymbol{Vy}}_{r,\mathrm{sample}}\right)<{y}_{\mathrm{max}} $ and $ \operatorname{MIN}\left({\boldsymbol{Vy}}_{r,\mathrm{sample}}\right)>{y}_{\mathrm{min}} $ then
12: Generate a random sample $ {\mu}_{\mathrm{sample}} $ in $ \mathcal{M} $
13: end if
14: end while
15: return $ I $
3.2. ROM validation
Finding adequate metrics to validate a ROM is challenging. When the training and test data sets are collected up front, the ROM could be biased toward the data set and lead to a lack of generalization for the model. To make our data-driven ROM more reliable, we must design a more generalized validating procedure.
3.2.1. Accuracy
Here, we define the relative error of the ROM in a one-step prediction as $ e $ :
We furthermore define ROM error as $ e $ . However, the ROM is very likely to be used for a long-term prediction and such a one-step accuracy still cannot reflect the performance of the ROM. Therefore, we will introduce the confidence of the ROM.
3.2.2. Confidence
If the ROM error is $ \tau $ at confidence level $ \eta $ , we can construct such an inequality:
that is, the probability that the error $ e $ smaller than $ \tau $ is greater than $ \eta $ , where $ \eta $ and $ \tau $ is a pair of confidence and accuracy indicator that we are seeking.
We assume we have $ s $ independent tests in the validator. Then we can calculate the observed confidence $ p\left({\tau}^{\ast}\right) $ of a given ROM error $ {\tau}^{\ast } $ using:
However, we can only include a limited number of independent tests into the validator. This means, with a given $ {\tau}^{\ast } $ , the $ p\left({\tau}^{\ast}\right) $ computed from the validator should also have deviation $ \epsilon $ and its corresponding confidence $ 1-\sigma $ from the true confidence $ \hat{p}\left({\tau}^{\ast}\right) $ . Thus, we obtain another inequality:
where $ \hat{p}\left({\tau}^{\ast}\right) $ is the true confidence. If we predefine the deviation $ \epsilon $ and the confidence $ 1-\sigma $ of the validator, the smallest number of the independent tests needed by the validator can be computed using Chernoff’s inequality (Alippi, Reference Alippi2016):
Let $ {\overline{\tau}}^{\ast } $ be the smallest $ {\tau}^{\ast } $ satisfying:
Since Inequality 27 holds for all $ {\tau}^{\ast } $ , it also holds for $ {\overline{\tau}}^{\ast } $ :
That is the inequality:
holds with the probability $ 1-\sigma $ . We further get:
holds with probability $ 1-\sigma $ .
Finally, the ROM error $ \tau $ and its corresponding confidence $ \eta $ are therefore $ {\overline{\tau}}^{\ast } $ and $ 1-\epsilon $ , respectively. Since Inequality 32 holds with probability $ 1-\sigma $ and $ {S}_{\mathrm{PAC}} $ is negatively correlative to $ \sigma $ , we know that the more test samples we have, the more reliable the computed true confidence will be.
We denote such a validator as:
We can predefine a variable $ {\overline{\tau}}_{\mathrm{design}}^{\ast } $ and compute $ p\left({\overline{\tau}}_{\mathrm{design}}^{\ast}\right) $ in iterations for convergence check.
3.3. Error estimator
The ROM will go through all tests in the validator constructed as described in Section 3.2. We denote the ROM error snapshots as:
where $ {e}^{(i)} $ is $ e $ computed in the $ i $ th test. We assume there is a function:
Such a function can be used as the error estimator to estimate the ROM error with a new input $ {\boldsymbol{J}}^{\ast } $ which is excluded from the test inputs $ \boldsymbol{I}=\left\{{\boldsymbol{J}}^{(1)},{\boldsymbol{J}}^{(2)},..,{\boldsymbol{J}}^{(s)}\right\} $ . Here, we use GPR (Williams and Rasmussen, Reference Williams and Rasmussen1996) for interpolation, and the input of the interpolator is the joint input $ \boldsymbol{J} $ of the ROM and the output is the estimated error. In our problem, our training dataset is the test data in the PAC validator:
According to the theory of GPR, the predicted ROM error $ {\hat{e}}^{\ast } $ for the new joint input $ {\boldsymbol{J}}^{\ast } $ is:
where $ \boldsymbol{K} $ is a covariance matrix whose entries can be computed by a preselected covariance function $ \kappa \left({\boldsymbol{J}}^{\ast },{\boldsymbol{J}}_i\right) $ . Here we use radial-basis function (RBF), which is also called squared-exponential function or Gaussian kernel and is one of the most popular choices for the covariance function:
where the amplitude $ \sigma $ and the lengthscale $ l $ are the hyperparameters which can be tuned. RBF is appropriate when the simulated function is expected to be smooth.
Besides GPR, other interpolators or approximators, such as Spline Interpolation (Habermann and Kindermann, Reference Habermann and Kindermann2007), Radial-basis-function Interpolation (Broomhead and Lowe, Reference Broomhead and Lowe1988), and ANN can be used to construct the estimator.
3.4. AL algorithm
Until now, the necessary theory and components of the AL algorithm have been fully introduced. In this section, we summarize the workflow algorithmically.
First, we need to prepare the initial data pool $ {\boldsymbol{I}}_{\mathrm{all}} $ , the PAC validator containing $ {\boldsymbol{I}}_{\mathrm{PAC}} $ and $ {\boldsymbol{O}}_{\mathrm{PAC}} $ using Algorithm 2 and the FOM. With the data pool and the PAC validator on hands, the AL algorithm can be designed as Algorithm 3. Besides the algorithm, the flow chart of the proposed workflow is given in Figure 5.
Algorithm 3. AL for the ROM
Require: $ {\boldsymbol{Y}}_{\mathrm{estimate}} $ , $ \boldsymbol{V} $ , $ {\overline{\tau}}_{\mathrm{design}}^{\ast } $ , $ \Delta {\overline{\tau}}_{\mathrm{tol}}^{\ast } $ , $ {\boldsymbol{I}}_{\mathrm{all}} $ , $ \mathrm{PAC}\left({\tau}^{\ast },\mathrm{ROM}\right) $ , $ {\boldsymbol{I}}_{\mathrm{PAC}} $ , $ {\boldsymbol{O}}_{\mathrm{PAC}} $
1: Define the snapshot increment $ \Delta s $
2: Randomly select $ \Delta s $ samples from $ {\boldsymbol{I}}_{\mathrm{all}} $ as the initial training input samples $ {\boldsymbol{I}}_{\mathrm{train}} $ and generate the training output samples $ {\boldsymbol{O}}_{\mathrm{train}} $ using the FOM solver
3: Train the ROM using $ {\boldsymbol{I}}_{\mathrm{train}} $ , $ {\boldsymbol{O}}_{\mathrm{train}} $ and $ \boldsymbol{V} $
4: $ p\left({\overline{\tau}}_{\mathrm{design}}^{\ast}\right)=\mathrm{PAC}\left({\overline{\tau}}_{\mathrm{design}}^{\ast };\mathrm{ROM}\right) $ and $ {\overline{\tau}}_{\mathrm{old}}^{\ast }={\overline{\tau}}^{\ast } $
5: Compute the error snapshots $ \boldsymbol{e} $ using $ {\boldsymbol{O}}_{\mathrm{PAC}} $
6: while $ p\left({\overline{\tau}}_{\mathrm{design}}^{\ast}\right)<1 $ do
7: Fit the GPR estimator $ \mathrm{\mathcal{E}} $ using $ \boldsymbol{e} $ and $ {\boldsymbol{I}}_{\mathrm{PAC}} $
8: Evaluate $ \mathrm{\mathcal{E}} $ on $ {\boldsymbol{I}}_{\mathrm{all}}\backslash {\boldsymbol{I}}_{\mathrm{train}} $ , denote the estimation as $ {\boldsymbol{e}}_{\mathrm{pool}} $
9: Find $ \Delta s $ largest $ {e}_i $ and their corresponding input samples $ \Delta \boldsymbol{I}=\left\{{\boldsymbol{J}}_{\mathrm{max}}^{(1)},{\boldsymbol{J}}_{\mathrm{max}}^{(2)},\dots, {\boldsymbol{J}}_{\mathrm{max}}^{\left(\Delta s\right)}\right\} $
10: $ {\boldsymbol{I}}_{\mathrm{train}}={\boldsymbol{I}}_{\mathrm{train}}\cup \Delta \boldsymbol{I} $
11: Update $ {\boldsymbol{O}}_{\mathrm{train}} $ using the enriched $ {\boldsymbol{I}}_{\mathrm{train}} $ and the FOM
12: Update $ \boldsymbol{V} $ using $ {\boldsymbol{Y}}_{\mathrm{estimate}}\cup {\boldsymbol{O}}_{\mathrm{train}} $
13: Train the ROM using $ {\boldsymbol{I}}_{\mathrm{train}} $ , $ {\boldsymbol{O}}_{\mathrm{train}} $ and $ \boldsymbol{V} $
14: $ p\left({\overline{\tau}}_{\mathrm{design}}^{\ast}\right)=\mathrm{PAC}\left({\overline{\tau}}_{\mathrm{design}}^{\ast };\mathrm{ROM}\right) $ and compute $ {\overline{\tau}}^{\ast } $
15: Update the error snapshots $ \boldsymbol{e} $
16: $ {\overline{\tau}}_{\mathrm{old}}^{\ast }={\overline{\tau}}^{\ast } $
17: if $ \mid {\overline{\tau}}^{\ast }-{\overline{\tau}}_{\mathrm{old}}^{\ast}\mid <\Delta {\overline{\tau}}_{\mathrm{tol}}^{\ast } $ then
18: Break
19: end if
20: end while
Regarding Algorithm 3, we would like to clarify that the snapshots in $ {\boldsymbol{Y}}_{\mathrm{estimate}} $ are not used for the model identification step, only for the construction of the initial projection matrix $ \boldsymbol{V} $ . The neural network is therefore trained exclusively with one-step trajectories. We notice that the inclusion of $ {\boldsymbol{Y}}_{\mathrm{estimate}} $ does not improve the ROM’s quality but considerably increases the training time for the neural network.
As seen in the algorithm, the reduced basis $ \boldsymbol{V} $ is recomputed during the iterations. In principle, this should influence the estimate of the reduced solution space and require rebuilding the PAC validator using the FOM solver. However, in practice, the initial basis $ \boldsymbol{V} $ produced by $ {\boldsymbol{Y}}_{\mathrm{estimate}} $ should already have a relatively low projection error. The refined reduced basis only slightly changes the boundary of the reduced solution space. We can still use $ \mathcal{Y} $ as the space for sampling the reduced state.
In Algorithm 3, we see another convergence check $ \mid {\overline{\tau}}^{\ast }-{\overline{\tau}}_{\mathrm{old}}^{\ast}\mid <\Delta {\overline{\tau}}_{\mathrm{tol}}^{\ast } $ besides $ p\left({\overline{\tau}}_{\mathrm{design}}^{\ast}\right) $ . Additionally, we check the improvement for the best ROM accuracy between two iterations. If the difference between them is already smaller than the predefined value, the algorithm will be considered to be in a low-efficiency status and should be stopped. Since there are many factors influencing the ROM accuracy other than the amount of training data, for example, ROM structure, ROM dimension, and the way of collecting training data, other possible modification should be considered and adopted to improve the ROM accuracy.
4. Numerical Examples
In this section, three numerical models will be used to evaluate the performance of the proposed algorithm against the conventional ML-based MOR. We integrate ANN-MOR and OpInf-MOR into the proposed method. The gained improvement is shown respectively. In the rest of the article, we use the following notation:
-
• ML-ROM: the ROM is constructed in the conventional workflow of ML-based MOR. The training data are collected by the DPS method, and the ROM is constructed by either POD + ANN or POD + OpInf.
-
• AL-ROM: the ROM is constructed by the proposed AL algorithm in Algorithm 3, where a large data pool is firstly prepared by the JSS method, then the algorithm selectively generates the training data. The ROM is constructed by either POD + ANN or POD + OpInf.
4.1. 2-D linear thermal block
The governing equation of the test system is:
where $ x\in \left(0,10\right),y\in \left(-10,0\right),t\in \left(0,2\right] $ and the static parameters are given by $ {k}_x={k}_y=1 $ . The controllable parameters $ {T}_{\mathrm{left}},{T}_{\mathrm{right}},{T}_{\mathrm{down}} $ and $ {T}_{\mathrm{up}} $ ( $ {}^{\mathrm{o}}\mathrm{C} $ ) have all the same feasible range $ \left[20,1,000\right] $ . Thus, for this problem $ \mathcal{M}=\left[20,1,000\right]\times \left[20,1,000\right]\times \left[20,1,000\right]\times \left[20,1,000\right] $ . The full-order model is discretized by the finite difference method. The square region is divided into $ 50\times 50=2,500 $ cells. We consider 100-time steps uniformly distributed.
For the AL algorithm, we first need to estimate the reduced solution space. Thus, $ m=20 $ samples are taken from $ \mathcal{M} $ . The 20 constant-input IVPs are constructed with the sampled parameters. $ {\boldsymbol{Y}}_{\mathrm{estimate}} $ has totally 2,020 state vectors. The initial reduced space is constructed by 20 POD basis using $ {\boldsymbol{Y}}_{\mathrm{estimate}} $ . The low-dimensional projection of those state vectors is then used to perform the space estimate. After estimating, loosening with $ \beta =10\% $ and trimming, we create the joint space $ {\mathcal{J}}^{\ast }={\mathcal{Y}}^{\ast}\times \mathcal{M} $ and use random sampling to get 40,000 joint samples for $ {\boldsymbol{I}}_{\mathrm{all}} $ .
We define $ \varepsilon =3\% $ and $ \sigma =1\% $ for the validation step. Based on equation (28), we know 2,944 samples are needed by the PAC validator. The confidence level that will be investigated is $ {\eta}_{\mathrm{design}}=97\% $ . We pick $ {\overline{\tau}}_{\mathrm{design}}^{\ast }=1\% $ and $ \Delta {\overline{\tau}}_{\mathrm{tol}}^{\ast }=0.01\% $ . In conclusion, we are looking for a ROM whose $ 97\% $ -confident ROM error is $ 1\% $ .
4.1.1. AL for ANN-ROM
In this section, we present the investigation for enhancing the ANN-MOR with our methods. The ROM has an architecture of EENN, whose MLP has [24, 80, 80, 20] neurons.
As seen in Table 1, the AL algorithm does not converge to the prescribed accuracy after 10 iterations. However, it meets the other stopping criteria $ \Delta {\overline{\tau}}_{\mathrm{tol}}^{\ast}\le 0.01\% $ . As we described before, when $ \Delta {\overline{\tau}}_{\mathrm{tol}}^{\ast}\le 0.01\% $ , we consider that continuing the iteration will not refine the ROM efficiently.
Note. The ROM is constructed with POD + ANN.
For the conventional workflow, we use DPS to create 50 IVPs, and they produce 50 solution trajectories. A total of 101 outputs from each IVP can build 100 input–output samples, therefore, we totally have 5,000 training samples. To construct the POD space, besides the sampled 5,000 snapshots, we also include $ {\boldsymbol{Y}}_{\mathrm{estimate}} $ .
Once the ROMs are built, we use another PAC validator to compute the $ 97\% $ -confident ROM error of the AL-ROM and ML-ROM. The results are given in Table 2.
Note. The ROMs are constructed with POD + ANN.
It is observed that with the same amount of training samples, the two performance indicators $ {\overline{\tau}}^{\ast } $ and $ p\left({\overline{\tau}}_{\mathrm{design}}^{\ast}\right) $ of the conventional ML ROM are both much worse than the ROM trained in the AL algorithm. Based on the PAC theory, the ROM constructed with the conventional approach is not expected to have a better performance than the AL-ROM.
To further validate the ROMs, we designed two additional test cases. In Test 1, four boundary temperatures are described by four different time-dependent functions, respectively. The function curves are given in Figure 6. We pick four different elements as the observation positions. In Figure 7a,b, we present the predicted trajectories by the compared ROMs. In the trajectory figures, it is observed that both ROMs’ predictions are very close to the reference. In Figure 8, we show the error fields at the final time step. To compute the error field, we first lift the reduced solution to the full space using $ \boldsymbol{V} $ and calculate the absolute error between the lifted solution and the FOM solution. We can see that the AL-ROM’s error is lower overall than ML-ROM’s.
In Test 2, the four boundary temperatures are assigned with the same constant value $ {T}_{\mathrm{left}}={T}_{\mathrm{right}}={T}_{\mathrm{down}}={T}_{\mathrm{up}}=1,{000}^{\mathrm{o}}\mathrm{C} $ . The trajectories are presented in Figure 9a,b. The error fields are shown in Figure 10. In Test 2, it is observed that the trajectories predicted by the conventional ML ROM have a large difference from the reference, while the AL-ROM’s prediction still matches the reference. If we check the error fields at the last time step, the AL-ROM’s advantage is even more remarkable. Here a possible reason can be explained in Figure 11.
In Figure 11, we present the time evolution of the first POD coefficient of $ {\boldsymbol{y}}_r $ . We create an IVP with the maximum inputs $ {T}_{\mathrm{left}}={T}_{\mathrm{right}}={T}_{\mathrm{down}}={T}_{\mathrm{up}}=1,000 $ . Similarly, another trajectory is produced with minimal input parameters $ {T}_{\mathrm{left}}={T}_{\mathrm{right}}={T}_{\mathrm{down}}={T}_{\mathrm{up}}=20 $ . These trajectories are marked with green solid lines in Figure 11. These two trajectories are called “extreme trajectories.” Since our full-order problem is continuous, it is reasonable to assume that any trajectory produced by any parameter combination in $ \mathcal{M} $ should be between these two green boundaries. Then we pick a parameter combination $ \left[\mathrm{510,510,510,510}\right] $ , which can be considered as the barycenter of $ \mathcal{M} $ , and the corresponding red-dashed trajectory is called “barycentric trajectory.” Additionally, we plot the trajectories of the DPS-samples in light blue color. It is observed that these trajectories form a thin band whose center is approximately the barycentric trajectory. This means that assigning randomized parameter combinations to the IVP does not produce a wide distribution of the solution trajectory. A result of this narrow distribution is that the ROM will perform badly beyond the barycentric trajectory. This is reflected in Test 2. In the AL workflow, different $ {\boldsymbol{y}}_r $ are randomly sampled in the estimated space $ {\mathcal{Y}}^{\ast } $ and used as the initial states for the one-step snapshots. This distributes the training samples in a wider region in the reduced solution space, and the ROM trained by such a dataset should also be more reliable.
4.1.2. AL for OpInf
In this section, we perform the same tests as in Section 4.1.1, except that we use OpInf instead of ANN. The convergence of the AL algorithm is shown in Table 3. The algorithm uses 150 training samples for convergence, which is remarkably fewer than those needed for constructing the ANN-ROM. Besides, according to Table 4, the OpInf-ROM constructed using DPS-sampled data also has a confident ROM error that is very close to the OpInf-ROM built by the AL algorithm.
Note. The ROM is constructed with POD + OpInf.
Note. The ROMs are constructed with POD + OpInf.
We show the same trajectory comparison for the OpInf-ROM. In Figures 12a,b and 14a,b, we can see that both ROMs have similar performance, even for Test 2 where the maximal input parameters are used in the online prediction. In the error fields presented in Figures 13 and 15, it is observed that the overall accuracy of the OpInf-ROM constructed by the AL algorithm is still better. But the performance of the OpInf-ROM constructed by the conventional approach is remarkably improved compared to the ANN-ROM, especially while predicting with the maximal parameters.
Through this experiment, we draw two conclusions. The first conclusion is that no matter with what kind of training data, building an OpInf-ROM with the same confident ROM error requires less training data compared to building an ANN-ROM. The second conclusion is that the OpInf-ROMs constructed in both ways (the proposed AL algorithm and the conventional approach) have similar qualities. This can be confirmed by both case-independent and case-dependent validation.
These conclusions match OpInf’s feature of being physics-informed. While designing the architecture of the OpInf model, we have a hypothetical form for the governing differential equation of the FOM. By employing the hypothetical form, we reduce the flexibility of the ROM, which enables us to use fewer data to fit the ROM. Besides, the physics information restricts the extrapolated prediction, which is why even using the DPS-sampled data, the OpInf-ROM can still have good accuracy in Test 2.
4.2. Nonlinear RC ladder
The second test model is a nonlinear RC circuit model. This benchmark model is provided by The MORwiki Community (2018). It’s different variants are widely used to test MOR methods, such as in Chen (Reference Chen1999), Rewienski (Reference Rewienski2003), Gu (Reference Gu2011), and Kawano and Scherpen (Reference Kawano and Scherpen2019). To obtain the MATLAB code, readers should refer to Model 1 in The MORwiki Community (2018).
The governing equation is the same as equation (32) in Regazzoni et al. (Reference Regazzoni, Dedè and Quarteroni2019):
We assume there are $ N=128 $ resistors in the circuit. Therefore, the size of the FOM is 128. The range of the input signal $ u(t) $ is $ \mathcal{M}=\left[0,1\right] $ . To prepare the training data, the system is integrate in the time span $ t\in \left(0,1\right] $ with forward Euler method, where the time step $ \delta t=2E-3 $ .
We create 5 IVPs with the SPS method to generate $ {\boldsymbol{Y}}_{\mathrm{estimate}} $ and use $ {\boldsymbol{Y}}_{\mathrm{estimate}} $ to create the initial POD space with $ {N}_r=15 $ . While using the JSS method to prepare the data pool $ {I}_{\mathrm{all}} $ , we use $ \beta =0\% $ . There are 20,000 joint samples in $ {I}_{\mathrm{all}} $ .
We define $ \epsilon =3\% $ and $ \sigma =1\% $ and 2,944 additional samples are collected for the PAC validator. The investigated confidence level is $ {\eta}_{\mathrm{design}}=97\% $ .
4.2.1. AL for ANN
For both ANN-based ROMs, we use an MLP with [16, 64, 64, 15] neurons. The AL-ROM’s convergence is presented in Table 5. The AL algorithm converges while 700 training samples are used. According to the PAC scores in Table 6, the $ 97\% $ -confident ROM error of the AL-ROM is $ 1.00\% $ . The $ 97\% $ -confident ROM error of the ML-ROM is $ 3.07\% $ , which is lower than the AL-ROM. This result tells us that the AL-ROM is assessed to be generally more accurate.
Note. The ROM is constructed with POD + ANN. $ {\overline{\tau}}_{\mathrm{design}}^{\ast }=1\%. $
Note. The ROMs are constructed with POD + ANN.
Then we test the ROMs with a specific test case. In the first test case, the input signal of the system is $ \mu (t)=0.5\left(\cos \left(2\pi t\right)+1\right) $ , which is the same as the test input used in Regazzoni et al. (Reference Regazzoni, Dedè and Quarteroni2019). We monitor the variable $ {v}_0(t) $ of the system and present the comparison between the FOM and ROM solution in Figure 16.
4.2.2. AL for OpInf
The AL-ROM’s convergence is presented in Table 7. The AL algorithm converges while 50 training samples are used, which is much fewer than required by the ROM constructed with POD + ANN. In Table 8, the $ 97\% $ -confident ROM error of the AL-ROM and ML-ROM is $ 0.50 $ and $ 2.07\% $ , respectively. Based on the results, we expect the AL-ROM to perform generally better than the ML-ROM.
Note. The ROM is constructed with POD + OpInf. $ {\overline{\tau}}_{\mathrm{design}}^{\ast }=0.5\% $ .
Note. The ROMs are constructed with POD + OpInf.
The same test case is used to further validate the ROMs. The corresponding results are presented in Figure 17. A similar comparison between the AL-ROM and ML-ROM is observed. In the test case, the OpInf-ROM constructed by the AL algorithm has almost the same transient response as the FOM.
Through this numerical model, we see similar results to the linear model. To be specific, both types of the ROMs are benefited from the AL algorithm. Between them, the ANN-MOR method obtains more significant improvement as a purely data-driven method. The performance of the OpInf-ROM is also improved, but the difference is not as remarkable as the ANN-ROM.
4.3. 3-D vacuum furnace model
In Figure 18a, we present the 3-D simulation model of a vacuum radiation furnace (Hao et al., Reference Hao, Gu, Chen, Zhang and Zuo2008). The model was created and simulated using NX Simcenter 3D. The original FEM model has 20,209 elements. The dark gray part represents the outer case of the furnace. The red parts are heaters, for which there are six in total. The blue component between the heaters and the case is the protecting shell. The cubic area at the center is the workzone where material can be placed. Since the internal space of the furnace is a vacuum, during the industrial process, the material placed in the furnace will only be heated up by the thermal radiation from the heaters. The discrete governing equation of the system can be written as:
where $ \boldsymbol{E} $ is the thermal capacity matrix, $ \boldsymbol{K} $ is the thermal conductivity matrix, $ {\boldsymbol{A}}_{\mathrm{cv}}\left(\alpha \right) $ is the heat convection matrix and $ \alpha $ is the convection coefficient, $ \boldsymbol{R} $ is the radiation matrix, and $ \boldsymbol{B} $ is the parameter-distribution matrix. The state (temperature) vector of this system is $ \boldsymbol{T}\in {\mathrm{\mathbb{R}}}^N $ where $ N=\mathrm{20,209} $ . We consider there are seven controllable parameters in the system: six heating power of the heaters ( $ \mathrm{kW}/{\mathrm{m}}^3 $ ) and the convection coefficient ( $ \mathrm{W}\;{\mathrm{m}}^{-2}{\mathrm{K}}^{-1} $ ) of the outer surface of the case, which are stored in the parameter vector $ \boldsymbol{u} $ . Therefore, we have the vector $ \boldsymbol{u}\in {\mathrm{\mathbb{R}}}^{N_u} $ with $ {N}_u=7 $ . The parameter space is predefined to be:
Based on physical consideration, the time grid is from 0 to 45,000 s with 450 time steps.
For the AL algorithm, $ m=20 $ samples are taken from $ \mathcal{M} $ to estimate the boundaries of $ \mathcal{Y} $ . We also include two special parameter configurations:
The initial reduced space is constructed by $ {N}_r=22 $ POD basis. Then we create the joint space $ {\mathcal{J}}^{\ast }={\mathcal{Y}}^{\ast}\times \mathcal{M} $ , where $ \beta =10\% $ is used for loosening. We use random sampling to get 40,000 joint samples for $ {\boldsymbol{I}}_{\mathrm{all}} $ . We define $ \epsilon =3\% $ and $ \sigma =1\% $ and 2,944 additional samples are collected for the PAC validator. The investigated confidence level is $ {\eta}_{\mathrm{design}}=97\% $ . We pick $ {\overline{\tau}}_{\mathrm{design}}^{\ast }=1\% $ and $ \Delta {\overline{\tau}}_{\mathrm{tol}}^{\ast }=0.01\% $ . In conclusion, we are looking for a ROM whose $ 97\% $ -confident ROM error is $ 1\% $ .
4.3.1. AL for ANN
The AL-ROM’s convergence is presented in Table 9. We construct ROMs with our method (AL) and the conventional method (ML) with the same number of training samples (20,000). The ROM errors are $ 1.00\% $ (AL) and $ 13.07\% $ (ML) at the investigated confidence level ( $ 97\% $ ). Both ROMs have an MLP with [29, 80, 80, 22] neurons (Table 10).
Note. The ROM is constructed with POD + ANN. $ {\overline{\tau}}_{\mathrm{design}}^{\ast }=1\%. $
Note. The ROMs are constructed with POD + ANN.
Two test scenarios are employed to show the ROMs’ performance in relatively realistic conditions. In the first test, all heaters are working with a 3-stage heating profile (Figure 19) and the convection coefficient is set to be $ 50\;\mathrm{W}\;{\mathrm{m}}^{-2}{\mathrm{K}}^{-1}\Big) $ .
In this test case, we monitor the temperature trajectories on a heater and in the workzone.
To show the error for the whole temperature field, we compute the absolute errors for all 20,209 elemental temperature individually. Four statistical measures of the absolute errors, that is, mean, standard deviation (std.), minimum and maximum, are used to reflect the error distribution at the final time point. The results are shown in Table 11.
Note. The ROMs are constructed with POD + ANN.
In another test case, we use the same heating profile for four heaters and turn off two heaters (Figure 18b). This simulates a scenario where two heaters have failed during the industrial process. The existence of failed heaters will cause a nonuniform temperature field in the workzone. We call them cold side and hot side, respectively. In this test, we monitor the temperature trajectories of the cold side and the hot side using the ROMs. The corresponding comparison between ROM-predicted trajectories and FOM-predicted trajectories in two test cases is given in Figure 21 and Table 12.
Note. The ROMs are constructed with POD + ANN.
As observed in both trajectory comparison in Figures 20 and 21, the ROM constructed by the AL algorithm has a better agreement with the reference result. Moreover, in Tables 11 and 12, all statistical measurement of the AL-ROM’s error fields are better than the ML-ROM’s. These facts tell us the proposed method significantly improves the performance of the ANN-ROM in this nonlinear thermal system.
4.3.2. AL for OpInf
In this section, the results of constructing the OpInf-ROM with the AL algorithm and the conventional approach are given. The confident ROM error of the OpInf-ROM successfully converges to $ 1.00\% $ after five iterations. The OpInf-ROM produced by the conventional approach has a confident ROM error of $ 7.54\% $ using the same amount of training data. This confident ROM error is lower than the confident ROM error of the ANN-ROM produced in the conventional way (Figures 22 and 23 and Tables 13–16).
Note. The ROM is constructed with POD + OpInf. $ {\overline{\tau}}_{\mathrm{design}}^{\ast }=1\% $ .
Note. The ROMs are constructed with POD + OpInf.
Note. The ROMs are constructed with POD + OpInf.
Note. The ROMs are constructed with POD + OpInf.
Similar to the results in Section 4.1.2, in the case-dependent validation, we can only see limited improvement by employing the AL algorithm for the OpInf method. Besides the reasons mentioned in Section 4.1.2, we also do not expect to have an extremely high temperature in the use cases of the case-dependent validation for the vacuum furnace model. This means that the temperatures of the system under the validation inputs are still considered to be located in the sampled region of the DPS method. In this case, a ROM trained with the DPS-sampled data can still predict with high accuracy.
5. Summary and Conclusions
In this work, we propose an Active-Learning-based approach to support machine-learning-based (ML-based) MOR methods. It can be applied to the reduction of any problem of the form,
where $ \boldsymbol{y}\in {\mathrm{\mathbb{R}}}^N $ is the state vector and the parameter vector $ \boldsymbol{\mu} \in {\mathrm{\mathbb{R}}}^{N_m} $ is time-dependent. It can also be applied to a wide variety of ML-based MOR methods including purely data-driven methods such as Deep Neural Network in Fresca and Manzoni (Reference Fresca and Manzoni2022), Gaussian process regression in Guo and Hesthaven (Reference Guo and Hesthaven2018)), Dynamic Mode Decomposition (Erichson et al., Reference Erichson, Mathelin, Kutz and Brunton2019), Residual Network in San et al. (Reference San, Maulik and Ahmed2019), as well as physics-informed methods such as OpInf (Peherstorfer and Willcox, Reference Peherstorfer and Willcox2016), physics-informed neural networks (PINNs) (Kim et al., Reference Kim, Choi, Widemann and Zohdi2022) or SINDy (Champion et al., Reference Champion, Lusch, Kutz and Brunton2019). The approach cannot however be used with recurrent ML models such as those in Kani and Elsheikh (Reference Kani and Elsheikh2017), Mohan and Gaitonde (Reference Mohan and Gaitonde2018), Maulik et al. (Reference Maulik, Mohan, Lusch, Madireddy, Balaprakash and Livescu2020), Wang et al. (Reference Wang, Ripamonti and Hesthaven2020), and Wu and Noels (Reference Wu and Noels2022).
One key ingredient is the generation of solution snapshots through the (active) sampling of a joint space consisting of a predefined parameter space and an estimated reduced solution space. This leads to a better coverage of these two spaces, which is otherwise difficult to achieve with simpler strategies. In addition, we present a case-independent validator constructed based on the PAC learning theory. Through numerical experiments, we find that our approach significantly improves the performance of purely data-driven MOR methods and that the PAC validator presents a good indicator of the ROM quality. However, the benefits of using these strategies are less clear when applied to the physics-informed method OpInf. We expect this to be also the case for other methods in this category, such as PINNs (Kim et al., Reference Kim, Choi, Widemann and Zohdi2022) or SINDy (Champion et al., Reference Champion, Lusch, Kutz and Brunton2019). By exploiting knowledge of the physical system, such methods are better at constructing ROMs that can generalize beyond the training data. Therefore the benefit of our improved sampling strategy is less marked.
Acknowledgments
We are grateful to Mr. Hicham Elmrini from Maya HTT for his guidance with the simulation software.
Author Contributions
Conceptualization: Q.Z., D.H., H.J.B., J.M.L.; Data curation: Q.Z.; Formal analysis: Q.Z., D.H., H.J.B., J.M.L.; Investigation: Q.Z.; Methodology: Q.Z., D.H., H.J.B., J.M.L.; Software: Q.Z.; Supervision: D.H., H.J.B., J.M.L.; Validation: Q.Z.; Visualization: Q.Z; Writing—original draft: Q.Z.; Writing—review and editing: D.H., H.J.B., J.M.L.
Competing Interests
The authors declare no competing interests exist.
Data Availability Statement
The data and the source code necessary for reproducing the results in Sections 4.1 and 4.2 will be made available under https://github.com/qinyu-zhuang/AL-MOR-paper 6 months after publication, to comply with internal policies for the publication of this information.
Ethics Statement
The research meets all ethical guidelines, including adherence to the legal requirements of the study country.
Funding Statement
This work received no specific grant from any funding agency, commercial or not-for-profit sectors.
Comments
No Comments have been published for this article.