Impact Statement
We demonstrate the applicability of the deep operator network (DeepONet) to identify the parameters of a stochastically forced van der Pol oscillator. The DeepONet is used as a surrogate for the first and second finite-time Kramers–Moyal coefficients, offering computational efficiency two orders of magnitude higher than the traditional finite-difference method. The efficiency of DeepONet allows for a Bayesian inference routine at low computational cost.
1. Introduction
Dynamical systems that exhibit limit cycles occur frequently in nature and in technological applications. Limit cycles are observed in biology (Nomura et al., Reference Nomura, Sato, Doi, Segundo and Stiber1994; Rompala et al., Reference Rompala, Rand and Howland2007), quantum mechanical systems (Yulin et al., Reference Yulin, Sedov, Kavokin and Shelykh2024), flow physics (Boujo and Cadot, Reference Boujo and Cadot2019; Bourquard et al., Reference Bourquard, Faure-Beaulieu and Noiray2021), aerodynamic flutter (Hoskoti et al., Reference Hoskoti, Misra and Sucheendran2020), thermoacoustic oscillations in sequential combustors (Dharmaputra et al., Reference Dharmaputra, Shcherbanev, Schuermans and Noiray2023), and so forth. Often, one can model the dynamics with a nonlinear harmonic oscillator. A natural choice for modeling such behavior is the van der Pol (VdP) oscillator, which incorporates nonlinearity in the damping term.
When the parameters of the oscillator are known, simulating the system is straightforward. Identifying these parameters from measurement data, however, is a rather challenging task. Identifying the parameters is beneficial for understanding the system itself, as well as for control applications. In many situations, classical system identification methods that involve applying an external input forcing to the system are not feasible. However, when an inherent stochastic forcing is present in the system, for instance the effect of flow turbulence in flutter phenomena, it can be exploited for parameter identification. In this work, we focus on identifying the parameters of the VdP oscillator subject to intense additive stochastic forcing. Such forcing enables the observer to analyze the system dynamics as it is continuously and randomly perturbed from its deterministic attractor, providing valuable insights into its nonlinear characteristics.
Various strategies to identify the unknown parameters of a stochastically forced harmonic oscillator from experimental data are presented in Noiray and Schuermans (Reference Noiray and Schuermans2013). In the case where the system is linearly stable on the fixed-point branch, the linear damping rate $ \nu $ can be directly obtained by fitting a Lorentzian curve to the resonance peak of the measured power spectrum. In this work, we focus on the more interesting case where the system operates on a limit cycle and nonlinear effects come into play. The system is modeled as a stochastically forced VdP oscillator with a small saturation constant (see Figure 1 for a sample time series). When the noise intensity is low, one can infer the parameters by fitting the power spectral density of the amplitude fluctuations. A more general method uses the transitional statistics of the stationary acoustic pressure envelope to identify the linear growth rate $ \nu $ , the noise intensity $ \Gamma /2{\omega}^2 $ , and the saturation constant $ \kappa $ . It employs a careful extrapolation to estimate the drift and diffusion coefficients of the Fokker–Planck (FP) equation from the transition moments of the signal envelope. The method, along with the postulated VdP oscillator as the underlying model, is validated against experimental measurements in Noiray and Denisov (Reference Noiray and Denisov2017) for a thermoacoustic instability in a combustor.
A subsequent improvement that removes the need for extrapolation is proposed in Boujo and Noiray (Reference Boujo and Noiray2017), and the method has been applied in Boujo et al. (Reference Boujo, Bourquard, Xiong and Noiray2020) to the whistling of a beer bottle and in Lee, Kim et al. (Reference Lee, Kim, Lee, Kim and Yi2023) to Hall thruster discharge instability. The proposed method is more robust to the finite-time effects that arise from the finite sampling rate, non-delta-correlated noise, and the band-pass filtering necessary to isolate the mode of interest. The method relies on the fact that finite-time Kramers–Moyal (KM) coefficients can be obtained directly and exactly (up to numerical accuracy) by solving the adjoint FP (AFP) equation with a well-defined initial condition (Honisch and Friedrich, Reference Honisch and Friedrich2011; Lade, Reference Lade2009). The equation is discretized by means of the finite-difference (FD) method. An initial guess of the parameters is given, and the equation is then solved iteratively until the combination of parameters yields an acceptable error between the finite-time KM coefficients obtained from the FD solution and those obtained from the post-processed time signal data. The authors (Boujo and Noiray, Reference Boujo and Noiray2017) use the Nelder–Mead simplex algorithm (a direct search method) to solve the optimization problem. A recent work in Lee, Kim, and Park (Reference Lee, Kim and Park2023) proposed the use of a gradient-based optimizer, using FD to approximate the required gradient.
We aim to extend this AFP-based parameter identification framework by using the deep operator network (DeepONet) proposed in Lu, Jin et al. (Reference Lu, Jin, Pang, Zhang and Karniadakis2021). It is a scientific machine learning framework capable of mapping a function to another function. Contrary to the widely known physics-informed neural networks (Raissi et al., Reference Raissi, Perdikaris and Karniadakis2019) that act as an approximator of the solution of a PDE, DeepONet does not require retraining every time a parameter of the PDE is changed. This is because, in addition to taking the coordinates of the output function as input, DeepONet also takes an input function that may characterize the boundary condition or the coefficients that parameterize the PDE. As a result, DeepONet has delivered efficiency improvements in various scientific and engineering applications (Lin et al., Reference Lin, Maxey, Li and Karniadakis2021; Lu et al., Reference Lu, Pestourie, Johnson and Romano2022; Mao et al., Reference Mao, Lu, Marxen, Zaki and Karniadakis2021; Yin et al., Reference Yin, Zhang, Yu and Karniadakis2022). In particular, Lu et al. (Reference Lu, Pestourie, Johnson and Romano2022) combined DeepONet with either a genetic algorithm or a gradient-based optimizer to solve an inverse problem in nanoscale heat transport very efficiently.
In this work, we employ DeepONet as a surrogate for the operator that maps the parameters of the AFP equation to the first and second finite-time KM coefficients. The use of DeepONet as the forward simulator replaces multiple calls to the FD solver and avoids intermediate calculation of the whole solution field of the AFP equation. Once trained, DeepONet can run a forward simulation in a fraction of a second. Furthermore, owing to the automatic differentiation framework, exact gradient information is available, such that gradient-based optimizers can be used to further accelerate the parameter identification process. Moreover, the extremely fast forward pass of DeepONet makes it feasible to run a Markov Chain Monte Carlo (MCMC) algorithm in which thousands of samples are generated for Bayesian inference. In this way, the uncertainties of the inferred parameters can be obtained.
This article is organized as follows. Section 2 reviews the stochastically forced harmonic oscillator and the associated FP formulations. Section 3 describes the working principle of DeepONet followed by the corresponding parameter identification framework in Section 4. Section 5 describes the Bayesian inference procedure using the gradient-based adaptive MCMC algorithm. Several results from the numerical experiments are then presented in Section 6 where the proposed methodology is evaluated to identify the parameters of synthetic and experimental time series. Finally, we conclude the findings of this article in Section 7.
2. Background theory
2.1. Harmonic oscillator
Let us start with the general stochastically forced harmonic oscillator with $ \eta $ as the state variable,
where $ \xi (t) $ is a white noise with intensity $ \Gamma $ . By applying the Hilbert transform to $ \eta (t) $ and taking the real part, that is, $ \eta (t)=\Re \left(A(t){e}^{i\left(\omega t+\phi (t)\right)}\right) $ , the following relations hold:
$ A(t) $ and $ \phi (t) $ are assumed to vary more slowly than the $ \cos \left(\omega t\right) $ term (see Figure 1). Consequently, $ \frac{d^2\eta }{dt^2} $ on the left-hand side of Eq. (2.1) can be expressed as
Let $ \Phi =\omega t+\phi $ . By employing the trigonometric identity $ {\sin}^2\left(\Phi \right)+{\cos}^2\left(\Phi \right)=1 $ , Eq. (2.1) can then be rewritten as
Using the orthogonality between $ \sin \left(\Phi \right) $ and $ \cos \left(\Phi \right) $ , and grouping terms that depend on $ \sin \left(\Phi \right) $ and $ \cos \left(\Phi \right) $ separately, the first-order equations for $ A(t) $ and $ \phi (t) $ can be retrieved and expressed as
We now employ the nonlinearities associated with the VdP oscillator,
It is worth mentioning that the method, which is validated in this article with this type of nonlinearity, can be applied to more complex saturable nonlinear damping.
Focusing on the dynamics of $ A(t) $ , we apply deterministic averaging to the $ Ff $ term in Eq. (2.6), that is,
Because $ F $ and $ G $ both depend on the dynamical variables $ A $ and $ \Phi $ , an additional drift term, $ \mu $ , appears in the averaged dynamics. This term can be obtained by performing stochastic averaging, as described in Roberts and Spanos (Reference Roberts and Spanos1986) and Stratonovich (Reference Stratonovich1963). The expression for $ \mu $ is as follows:
The noise term $ \xi (t) $ is also averaged, which then induces a new noise variance $ {\sigma}^2 $ ,
Finally, the Langevin equation describing the dynamics of the amplitude can be cast to
where $ \zeta $ is a delta-correlated white noise satisfying $ \left\langle \zeta (t)\zeta \left(t+\tau \right)\right\rangle =\frac{\Gamma}{2{\omega}^2}\hskip0.1em \delta \left(\tau \right) $ such that the Markov property holds for the amplitude process $ A(t) $ .
2.2. The FP equation
The dynamics of the amplitude (Eq. 2.13) can be equivalently described by the evolution of the corresponding probability density function (PDF) governed by the FP equation. The connection is provided through the KM coefficients defined as
Inserting the diffusion process (Eq. 2.13) into the above equation results in
and $ {D}^{(4)}=0 $ . Consequently, according to the Pawula theorem (Pawula, Reference Pawula1967), all other coefficients $ {D}^{(n)} $ with $ n\ge 3 $ vanish. The evolution of the PDF of the acoustic pressure envelope $ P\left(a,t\right) $ is then exactly described by the following FP equation (Risken, Reference Risken1996),
where the FP operator is
A stationary PDF can be obtained as the long-time solution of the above equation, that is,
where $ \mathcal{Z} $ is a normalization coefficient.
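For constant diffusion $ {D}^{(2)} $ , the standard one-dimensional FP operator and its stationary solution take the following form; this is a generic sketch consistent with the structure of Eqs. (2.17)–(2.19), up to the specific drift parameterization:

$$ \frac{\partial P\left(a,t\right)}{\partial t}=\hat{L}\,P\left(a,t\right),\qquad \hat{L}\left(\cdot \right)=-\frac{\partial }{\partial a}\left[{D}^{(1)}(a)\left(\cdot \right)\right]+{D}^{(2)}\frac{\partial^2}{\partial {a}^2}\left(\cdot \right),\qquad {P}_{\infty }(a)=\frac{1}{\mathcal{Z}}\exp \left({\int}^a\frac{D^{(1)}(s)}{D^{(2)}}\, ds\right). $$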
2.3. The AFP equation
Parametric system identification can be performed by matching the statistics of the time signal with the transition PDF $ P\left({a}^{\prime },t+\tau\;|\hskip0.1em a,t\right) $ calculated by solving the FP equation (Eq. 2.17). However, this procedure may introduce numerical issues since it involves a Dirac delta initial condition (Honisch and Friedrich, Reference Honisch and Friedrich2011).
The work of Lade (Reference Lade2009) provides an elegant alternative procedure based on the AFP formalism. Starting with the finite-time KM coefficient,
the idea is to interpret the transition PDF as the solution of the FP equation with initial condition $ \delta \left({a}^{\prime }-a\right) $ ,
where $ {\hat{L}}^{\dagger } $ is the AFP operator,
Equation (2.21) implies that the finite-time KM coefficients $ {D}_{\tau}^{(n)} $ can be obtained by solving the AFP equation,
with initial condition
and evaluating at $ t=\tau $ and $ {a}^{\prime }=a $ ,
Therefore, given a pressure signal envelope, we can perform parameter identification by minimizing the difference between the finite-time coefficients, $ {D}_{\tau}^{(n)} $ , obtained from the solution of the AFP equation, and the finite-time coefficients obtained by estimating the conditional average of the time signal using Eq. (2.20). Compared to the extrapolation-based procedure for parameter identification (Noiray and Denisov, Reference Noiray and Denisov2017; Siegert et al., Reference Siegert, Friedrich and Peinke1998), the AFP-based approach is robust to errors due to finite-time effects since it does not involve taking the limit $ \tau \to 0 $ (see Boujo and Noiray (Reference Boujo and Noiray2017) and Honisch and Friedrich (Reference Honisch and Friedrich2011) for elaborations).
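In compact form, the recipe reads as follows (a sketch in our notation, mirroring Eqs. (2.20)–(2.25)): for each amplitude $ a $ present in the signal and for $ n\in \left\{1,2\right\} $ , one solves

$$ \frac{\partial {p}_n\left({a}^{\prime },t\right)}{\partial t}={\hat{L}}^{\dagger }{p}_n\left({a}^{\prime },t\right),\qquad {\hat{L}}^{\dagger}\left(\cdot \right)={D}^{(1)}\left({a}^{\prime}\right)\frac{\partial }{\partial {a}^{\prime }}\left(\cdot \right)+{D}^{(2)}\frac{\partial^2}{\partial {a}^{\prime 2}}\left(\cdot \right),\qquad {p}_n\left({a}^{\prime },0\right)={\left({a}^{\prime }-a\right)}^n, $$

and then reads off $ {D}_{\tau}^{(n)}\left(a,\tau \right)=\frac{1}{n!\,\tau }\,{p}_n\left({a}^{\prime }=a,t=\tau \right) $ .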
The solution of the AFP equation can be obtained by employing a numerical time-stepping scheme and using FD to approximate the spatial derivatives. We illustrate in Figure 2 the simulation results of the AFP equation and how the finite-time KM coefficients can be extracted from the solution. It is worth highlighting that for a particular value of the amplitude $ A=a $ , we have a distinct initial condition $ {\left({a}^{\prime }-a\right)}^n $ for the AFP simulation over the $ \left({a}^{\prime },t\right) $ domain to extract the nth finite-time KM coefficient. For fixed parameters $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ , the simulations must be performed $ 2{N}_a $ times, for both KM coefficients and for all amplitudes present in the signal. We then only extract the solution along the line $ {a}^{\prime }=a $ to calculate the finite-time KM coefficients. In the next section, we will present a DeepONet that can directly map the input parameters $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ to the desired outputs $ {D}_{\tau}^{(1)}\left(a,\tau \right) $ and $ {D}_{\tau}^{(2)}\left(a,\tau \right) $ without the need to calculate the entire solution field of the AFP equation multiple times.
3. DeepONet
While it is well known that neural networks with as few as a single hidden layer are universal function approximators, a similar theorem proves that neural networks are also universal operator approximators (Chen and Chen, Reference Chen and Chen1995). Inspired by the structure given in the universal approximation theorem for nonlinear operators, Lu, Jin et al. (Reference Lu, Jin, Pang, Zhang and Karniadakis2021) proposed a neural network architecture termed the DeepONet, which maps a function to another function. In this work, we propose a DeepONet architecture (Figure 3) suited for efficient parameter identification based on the AFP formulation presented in the previous section.
The proposed operator network is designed as a surrogate model for the operator $ \mathcal{G} $ that maps the first KM coefficient, $ {D}^{(1)}(a) $ , to the finite-time KM coefficients, $ {D}_{\tau}^{(n)}\left(a,\tau \right) $ , that is,
The motivation behind this architecture originates from the formulations given in Eqs. (2.22)–(2.25), where finite-time KM coefficients can be extracted from the AFP solution given the drift and diffusion coefficients. Since we only consider additive noise, that is, a constant diffusion coefficient $ {D}^{(2)} $ , we can simply take the drift coefficient, $ {D}^{(1)}(a) $ , to represent the input. As can be seen in Eq. (2.15), the function $ {D}^{(1)}(a) $ is completely characterized by the parameters $ \nu, \kappa $ , and $ {D}^{(2)} $ . Therefore, once the network is trained, given a set of parameters $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ , we can directly predict $ {D}_{\tau}^{(1)} $ and $ {D}_{\tau}^{(2)} $ at specified values of $ \left(a,\tau \right) $ within a fraction of a second avoiding multiple simulations of the AFP equation.
The DeepONet architecture presented in Figure 3 consists of a branch network and two trunk networks. The trunk networks serve to encode the domain $ \left(a,\tau \right) $ of the output functions and generate basis functions. The first trunk network provides basis functions for $ {D}_{\tau}^{(1)} $ and the second trunk network provides basis functions for $ {D}_{\tau}^{(2)} $ . On the other hand, the branch network takes discrete representations of the input function and processes them to generate coefficients for the basis functions provided by the trunk networks. The output functions are then obtained by taking the inner product between the output of the branch and trunk networks as follows,
where $ {D}_{\tau, \theta}^{(n)} $ is the DeepONet prediction of the nth finite-time KM coefficient; $ {b}_k^{(n)} $ and $ {t}_k^{(n)} $ are the outputs of the branch and trunk networks, respectively; $ {b}_0^{(n)} $ is a bias term; and $ p $ is the total number of neurons in the output layer of a trunk network.
In this work, we employ a fully connected neural network architecture for the branch and trunk networks. We use the hyperbolic tangent activation function and Glorot normal initialization. All branch and trunk networks have 5 hidden layers with 128 neurons per layer. As we have to work with discrete objects, we evaluate the input of the branch network, $ {D}^{(1)}(a) $ , at $ 21 $ fixed locations $ a=\left(0.5,0.6,\dots, 2.5\right) $ . Each trunk network takes 2 values (the coordinates $ \left(a,\tau \right) $ ) as input and outputs a layer containing 128 neurons. The branch network outputs a layer with 128 neurons, which is then connected via a fully connected linear connection to a second output layer, also consisting of 128 neurons. This configuration is intended to provide enhanced expressivity for predicting the second finite-time KM coefficient (see Figure 5). We base our implementation on the scientific machine learning library DeepXDE (Lu, Meng et al., Reference Lu, Meng, Mao and Karniadakis2021) with the PyTorch backend.
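To make the architecture concrete, the following is a minimal sketch of the branch–trunk combination of Eq. (3.2) in plain PyTorch; the module and variable names are illustrative, and the sketch does not reproduce the exact DeepXDE configuration used in this work.

```python
import torch
import torch.nn as nn

def mlp(sizes):
    # Fully connected network with tanh activations and Glorot (Xavier) normal init.
    layers = []
    for i in range(len(sizes) - 1):
        lin = nn.Linear(sizes[i], sizes[i + 1])
        nn.init.xavier_normal_(lin.weight)
        layers.append(lin)
        if i < len(sizes) - 2:
            layers.append(nn.Tanh())
    return nn.Sequential(*layers)

class KMDeepONet(nn.Module):
    """Sketch of the two-trunk DeepONet: the branch encodes D^(1)(a) sampled at
    fixed sensor locations; each trunk encodes the output coordinate (a, tau)."""
    def __init__(self, n_sensors=21, p=128, depth=5):
        super().__init__()
        hidden = [128] * depth
        self.branch = mlp([n_sensors] + hidden + [p])
        self.branch2 = nn.Linear(p, p)           # extra linear head feeding the second output
        self.trunk1 = mlp([2] + hidden + [p])    # basis functions for D_tau^(1)
        self.trunk2 = mlp([2] + hidden + [p])    # basis functions for D_tau^(2)
        self.b0 = nn.Parameter(torch.zeros(2))   # bias terms b_0^(n)

    def forward(self, d1_sensors, coords):
        # d1_sensors: (N, n_sensors) values of D^(1)(a) at the sensor points
        # coords:     (N, 2) output coordinates (a, tau)
        b1 = self.branch(d1_sensors)
        b2 = self.branch2(b1)
        t1 = torch.tanh(self.trunk1(coords))
        t2 = torch.tanh(self.trunk2(coords))
        out1 = (b1 * t1).sum(-1, keepdim=True) + self.b0[0]   # prediction of D_tau^(1)
        out2 = (b2 * t2).sum(-1, keepdim=True) + self.b0[1]   # prediction of D_tau^(2)
        return out1, out2
```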
3.1. Training
We train the network by providing pairs of input–output functions $ \left[{D}^{(1)};{D}_{\tau}^{(1)},{D}_{\tau}^{(2)}\right] $ as training data. For a given set of parameters $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ , we can perform FD simulations of the AFP equation to obtain the finite-time KM coefficients $ {D}_{\tau}^{(1)}\left(a,\tau \right) $ and $ {D}_{\tau}^{(2)}\left(a,\tau \right) $ . We record the solutions at several values of $ a $ that occur with probability larger than 0.1% according to the theoretical stationary PDF (Eq. 2.19), with spacing $ da=0.05 $ , and we select 50 values of $ \tau $ uniformly distributed in $ \left[0.01,0.5\right] $ . The training data comprise input and output samples from 2900 different sets of $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ . The values of $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ range from $ -20 $ to $ 20 $ , $ 0.5 $ to $ 5.25 $ , and $ 1 $ to $ 20 $ , respectively.
Each AFP equation is solved over the domain $ \left[{a}_{\mathrm{min}}^{\prime },{a}_{\mathrm{max}}^{\prime}\right]\times \left[{\tau}_{\mathrm{min}},{\tau}_{\mathrm{max}}\right]=\left[ da,1.5{a}_{\mathrm{max}}\right]\times \left[0,0.5\right] $ , where $ {a}_{\mathrm{max}} $ is the maximum amplitude selected for the corresponding $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ . The grid spacing is $ da=0.05 $ , and the time step is chosen according to $ dt=\frac{1}{2}\frac{(da)^2}{D^{(2)}} $ . We use second-order accurate central difference schemes for the first and second spatial derivatives, while the time-stepping is performed according to the Crank–Nicolson method. For the left and right boundaries, we apply second-order one-sided FD stencils.
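As an illustration of how a single finite-time KM coefficient can be extracted with such a solver, the sketch below implements a Crank–Nicolson/central-difference march of the AFP equation in NumPy/SciPy. It is a simplified stand-in for our solver: the drift values are passed in as an array, and the boundary rows are simply frozen instead of using the one-sided stencils described above.

```python
import numpy as np
from math import factorial
from scipy.linalg import lu_factor, lu_solve

def afp_finite_time_km(D1, D2, a, n, taus, da=0.05):
    """Sketch: solve dp/dt = D1(a') dp/da' + D2 d2p/da'^2 (the AFP equation)
    with Crank-Nicolson in time and central differences in space, starting
    from p(a', 0) = (a' - a)^n, and return D_tau^(n)(a, tau) = p(a, tau)/(n! tau).

    D1   : array of drift values on the uniform grid a' = da, 2*da, ...
    D2   : constant diffusion coefficient
    a    : amplitude at which the coefficient is evaluated (on the grid)
    taus : time shifts at which the coefficient is requested
    """
    N = len(D1)
    ap = da * np.arange(1, N + 1)                 # uniform grid in a'
    dt = 0.5 * da**2 / D2                         # time step used in this work

    # Adjoint FP operator on interior points; boundary rows left at zero
    # (boundary values frozen -- cruder than the one-sided stencils above).
    L = np.zeros((N, N))
    for i in range(1, N - 1):
        L[i, i - 1] = -D1[i] / (2 * da) + D2 / da**2
        L[i, i]     = -2.0 * D2 / da**2
        L[i, i + 1] =  D1[i] / (2 * da) + D2 / da**2

    I = np.eye(N)
    A = lu_factor(I - 0.5 * dt * L)               # Crank-Nicolson: A p^{k+1} = B p^k
    B = I + 0.5 * dt * L
    p = (ap - a) ** n                             # initial condition (a' - a)^n
    idx = int(np.argmin(np.abs(ap - a)))          # grid index of a' = a

    out, t = [], 0.0
    for tau in np.sort(np.asarray(taus, dtype=float)):
        while t < tau:                            # march until the requested tau
            p = lu_solve(A, B @ p)
            t += dt
        out.append(p[idx] / (factorial(n) * tau))
    return np.array(out)
```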
Before training, input–output examples are standardized to have zero mean and unit variance. This is standard practice in deep learning to avoid the vanishing or exploding gradient problem and to help neural networks converge faster during training. We train the proposed operator network for $ {10}^4 $ iterations using Adam with a learning rate of $ 0.001 $ that decays to 0.98 of its value every $ {10}^3 $ iterations. The training then continues for $ 9\times {10}^4 $ iterations with LBFGS. We use a batch size of $ {2}^{16} $ for each iteration. Computations are carried out using double-precision floating-point numbers. The DeepONet training process was completed in 16.3 hours on a system equipped with an NVIDIA Quadro RTX 4000 graphics card.
For the first $ 1.5\times {10}^4 $ iterations, we use the standard mean squared error as the loss function,
where
$ {D}_{\tau}^{(n)} $ and $ {D}_{\tau, \theta}^{(n)} $ correspond to the target and DeepONet-predicted values, respectively, and $ {N}_b $ denotes the batch size. For the rest of the training, we use the following loss function to balance the losses of the first and second finite-time KM coefficients,
The solution field of the second finite-time KM coefficient $ {D}_{\tau}^{(2)}\left(a,\tau \right) $ generally has more curvature than that of the first finite-time KM coefficient (see Figure 5), and is thus harder to learn. This motivates the use of separate trunk networks for the first and second finite-time KM coefficients and the modified loss function (Eq. 3.5) in the final stage of the training.
4. Parameter identification framework
In this section, we will outline the system identification procedure to recover from an acoustic time series three parameters of interest, $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ , that describe the amplitude dynamics, Eq. (2.13). We first present how to calculate the finite-time KM coefficients from the statistics of the time series. We then proceed with the parameter identification algorithm.
4.1. Signal processing
As illustrated in Figure 4, finite-time KM coefficients can be estimated from the time series of the acoustic signals in the following manner:
Step 1. Apply a band-pass filter to the acoustic signal $ \eta (t) $ around the frequency $ {f}_0 $ of interest, that is, $ \left[{f}_0-\Delta f/2,{f}_0+\Delta f/2\right] $ , where $ \Delta f $ is the bandwidth.
Step 2. Apply Hilbert transform to obtain the acoustic signal envelope $ A(t) $ and estimate the stationary PDF $ P(a) $ by histogram binning.
Step 3. Estimate the joint PDF $ P\left({a}^{\prime },t+\tau; a,t\right) $ of the signal envelope $ A(t) $ and the time-shifted envelope $ A\left(t+\tau \right) $ by histogram binning.
Step 4. Calculate the transitional PDF by normalizing the joint PDF with $ P(a) $
Step 5. The finite-time KM coefficients can then be readily calculated by numerical quadrature according to Eq. (2.20).
The finite-time KM coefficients, $ {D}_{\tau}^{(n)}\left(a,\tau \right) $ , need to be calculated for a selection of amplitudes and time shifts. We select several values of $ a $ that occur in the signal with probability larger than 0.1%, with spacing $ da=0.05 $ . We distribute 10 values of finite-time shifts uniformly between $ \max \left\{{F}_s^{-1},\Delta {f}^{-1},{\left({\omega}_0/2\pi \right)}^{-1}\right\} $ and $ 2{\tau}_A $ , where $ {\tau}_A $ is the time shift at which the autocorrelation of $ A(t) $ reduces to 25% of its initial value. We then keep the last nine values of $ \tau $ , since the finite-time effects typically still significantly affect the estimated finite-time KM coefficients at the first $ \tau $ . A minimal sketch of this processing pipeline is given below.
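The sketch below illustrates Steps 1–5 with SciPy/NumPy. The filter order, bin occupancy threshold, and function names are illustrative choices, and the conditional moments are computed directly from binned pairs, which is equivalent to the histogram-based transition-PDF route followed by numerical quadrature.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def finite_time_km_from_signal(eta, fs, f0, bw, a_bins, taus):
    """Sketch: estimate D_tau^(1,2)(a, tau) from a raw signal eta sampled at fs.

    a_bins : bin edges for the amplitude histogram (spacing da)
    taus   : time shifts (in seconds) at which to evaluate the coefficients
    """
    # Step 1: band-pass filter around f0 with bandwidth bw.
    b, c = butter(4, [(f0 - bw / 2) / (fs / 2), (f0 + bw / 2) / (fs / 2)],
                  btype="bandpass")
    eta_f = filtfilt(b, c, eta)

    # Step 2: Hilbert transform -> signal envelope A(t).
    A = np.abs(hilbert(eta_f))

    a_mid = 0.5 * (a_bins[:-1] + a_bins[1:])
    D1 = np.full((len(a_mid), len(taus)), np.nan)
    D2 = np.full((len(a_mid), len(taus)), np.nan)

    for j, tau in enumerate(taus):
        shift = int(round(tau * fs))
        A0, A1 = A[:-shift], A[shift:]           # pairs (A(t), A(t + tau))
        # Steps 3-5: conditional moments <(A(t+tau) - a)^n | A(t) = a>,
        # estimated by binning A(t); equivalent to building the transition PDF
        # and integrating it numerically as in Eq. (2.20).
        which = np.digitize(A0, a_bins) - 1
        for i, a in enumerate(a_mid):
            sel = which == i
            if sel.sum() < 100:                  # skip poorly populated bins
                continue
            diff = A1[sel] - a
            D1[i, j] = np.mean(diff) / tau
            D2[i, j] = np.mean(diff**2) / (2.0 * tau)
    return a_mid, D1, D2
```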
4.2. Parameter identification procedure
Having the estimated finite-time KM coefficients from the processed time signal, the latent parameters $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ can be obtained by an iterative optimization procedure that minimizes the following loss function,
Here, we denote by $ {\hat{D}}_{\tau}^{(n)} $ the finite-time coefficients measured from the time signal data and by $ {D}_{\tau, \theta}^{(n)} $ the DeepONet prediction that corresponds to the guessed values of $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ . Both $ {\hat{D}}_{\tau}^{(n)} $ and $ {D}_{\tau, \theta}^{(n)} $ have been standardized according to the normalizers used for the DeepONet training data. This helps ensure that the first and second finite-time KM coefficients have similar magnitudes. In the above expression, $ N={N}_a\times {N}_{\tau } $ , where $ {N}_a $ and $ {N}_{\tau } $ are the numbers of discrete envelopes and time shifts extracted from the time signal, respectively. To take into account the statistical significance of different amplitudes in the time signal data, we include the weight $ \tilde{P}\left({a}_i\right)=p\left({a}_i\right){N}_a $ in the loss function, where $ p(a) $ is the probability mass function of the discretized amplitude.
Given a processed time signal data, the AFP-based system identification procedure is outlined as follows:
Step 1. Provide an initial guess to the parameters $ \nu, \kappa $ , and $ {D}^{(2)} $ .
Step 2. Predict $ {D}_{\tau, \theta}^{(1)}\left(a,\tau \right) $ and $ {D}_{\tau, \theta}^{(2)}\left(a,\tau \right) $ using DeepONet according to the parameters provided.
Step 3. Evaluate the loss function given in Eq. (4.2).
Step 4. Update $ \nu, \kappa, {D}^{(2)} $ iteratively using a selected optimizer.
Step 5. If after optimization, the loss function is still larger than $ {10}^{-2} $ , return to step 1 to start with another initial guess.
A large selection of optimizers and their implementations exist in scientific computing libraries. In this work, we select LBFGS (Liu and Nocedal, Reference Liu and Nocedal1989), a gradient-based optimizer whose implementation is readily available in PyTorch. The exact gradient of the loss function with respect to the parameters is available since the computations in modern deep learning frameworks are tracked through a computational graph. The automatic differentiation framework (also known as backpropagation) applies the chain rule from the loss function to the output of the DeepONet, from the output of the DeepONet to the input of the branch network, and from the input of the branch network to the parameters $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ .
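A minimal sketch of Steps 1–5 with PyTorch's LBFGS is given below. Here, `deeponet` stands for the trained surrogate, the measured coefficients are assumed to be already standardized, and the drift parameterization used to build the branch input is an assumption standing in for Eq. (2.15).

```python
import torch

def identify_parameters(deeponet, coords, D_hat, weights, sensors_a, init):
    """Sketch: recover (nu, kappa, D2) by minimizing the weighted misfit between
    DeepONet predictions and the measured finite-time KM coefficients.

    coords    : (N, 2) tensor of (a, tau) pairs extracted from the signal
    D_hat     : (N, 2) tensor of measured, standardized D_tau^(1), D_tau^(2)
    weights   : (N,) tensor of amplitude weights P~(a_i)
    sensors_a : (21,) tensor of fixed sensor locations a = 0.5, 0.6, ..., 2.5
    init      : initial guess [nu, kappa, D2]
    """
    # The network is assumed to operate in double precision, as in Section 3.1.
    theta = torch.tensor(init, dtype=torch.float64, requires_grad=True)
    opt = torch.optim.LBFGS([theta], max_iter=200, line_search_fn="strong_wolfe")

    def loss_fn():
        nu, kappa, D2 = theta[0], theta[1], theta[2]
        # Branch input: drift D^(1)(a) at the fixed sensor points.  This
        # parameterization is an assumption standing in for Eq. (2.15).
        d1 = nu * sensors_a - kappa / 8.0 * sensors_a**3 + D2 / sensors_a
        pred1, pred2 = deeponet(d1.expand(coords.shape[0], -1), coords)
        pred = torch.cat([pred1, pred2], dim=-1)
        return (weights[:, None] * (pred - D_hat) ** 2).mean()

    def closure():
        opt.zero_grad()
        loss = loss_fn()
        loss.backward()
        return loss

    opt.step(closure)
    return theta.detach(), loss_fn().item()
```

If the final loss remains above the threshold of Step 5, the routine is simply re-run with the next initial guess.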
The widely known gradient-free Nelder–Mead simplex algorithm (Nelder and Mead, Reference Nelder and Mead1965) can also be used as an alternative. It is used in Boujo and Noiray (Reference Boujo and Noiray2017) as the optimizer for the FD-based parameter identification. In Section 6, we also provide results obtained by combining DeepONet with the Nelder–Mead optimizer (with the implementation provided in SciPy) to facilitate comparison between the FD and DeepONet approaches for solving the inverse problem.
5. Bayesian inference/MCMC
In this section, we will describe a Bayesian inference procedure to estimate the posterior distribution of the parameters $ \vartheta =\left\{\nu, \kappa, {D}^{(2)}\right\} $ given a specified prior and likelihood distribution. According to the Bayes rule, the posterior distribution is given by
where $ P\left(\vartheta \right) $ refers to the prior distribution and $ P\left(\mathcal{D}|\vartheta \right) $ refers to the likelihood distribution. $ \mathcal{D} $ denotes a set of $ N $ observations, $ \mathcal{D}={\left\{a,\tau, {\hat{D}}_{\tau}^{(1)},{\hat{D}}_{\tau}^{(2)}\right\}}_{1:N} $ , that is available about the system of interest.
As the denominator in Eq. (5.1) is often intractable, we estimate the posterior distribution over the parameters by employing an MCMC algorithm that only needs pointwise evaluations of the unnormalized target distribution. The algorithm constructs a Markov chain whose stationary distribution converges to the target distribution (Eq. 5.1). In each step, a set of new parameters $ {\vartheta}^{\prime } $ is proposed according to a proposal distribution $ {q}_L\left({\vartheta}^{\prime }|\vartheta \right) $ . The proposed parameters are then evaluated according to the Metropolis–Hastings (M–H) acceptance criterion,
where the proposal will be accepted with probability $ \alpha $ . We then set $ {\vartheta}_{t+1}={\vartheta}^{\prime } $ if accepted, and $ {\vartheta}_{t+1}={\vartheta}_t $ if rejected.
In this work, we employ the Metropolis-adjusted Langevin algorithm (MALA) with a gradient-based adjustment to its covariance matrix. The proposal distribution of the MALA algorithm takes the following form,
It proposes a new state according to a Gaussian distribution whose mean encourages a step toward higher posterior values and whose covariance matrix is $ {LL}^T $ .
In a standard MALA, the covariance matrix is isotropic, that is, $ {LL}^T={\sigma}^2I $ with constant variance $ {\sigma}^2 $ . In the proposed approach, the covariance matrix $ {LL}^T $ is adaptively updated according to the fast variant of the gradient-based adaptive MALA algorithm, termed gadMALAf, originally devised in Titsias and Dellaportas (Reference Titsias and Dellaportas2019). The key idea is to adjust the proposal distribution to maximize the generalized speed measure given by,
Here, the notion of speed refers to how fast the Markov chain converges to its stationary distribution, that is, the target posterior. In the above equation, $ \mathrm{\mathscr{H}}{q}_L $ is the entropy of the proposal distribution and $ \beta $ is a hyperparameter. Intuitively, this objective function encourages high variance (large step) and high acceptance rate simultaneously.
During the burn-in period, we update the Cholesky factor of the proposal covariance matrix $ L $ according to
where $ {\mathrm{\mathcal{F}}}_L $ is the lower bound of $ {s}_L $ and $ {\rho}_t $ is the learning rate. The algorithm is particularly efficient because the adaptation still occurs even when the candidate states are rejected (Titsias and Dellaportas, Reference Titsias and Dellaportas2019). Further details on the derivation of the gadMALAf algorithm are provided in Appendix A. It will be demonstrated through the results presented in the next section that careful construction of the covariance matrix of the proposal distribution is necessary to mitigate the slow mixing of the MCMC samples due to highly correlated parameters.
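For illustration, the sketch below implements one MALA proposal and M–H correction with a general (non-isotropic) Cholesky factor $ L $ ; the adaptation of $ L $ itself (Eq. 5.5 and Appendix A) is omitted, and `log_post` is a placeholder for the unnormalized log posterior built from the DeepONet surrogate.

```python
import torch

def grad_log_post(log_post, theta):
    # Gradient of the unnormalized log posterior via automatic differentiation.
    theta = theta.detach().requires_grad_(True)
    log_post(theta).backward()
    return theta.grad.detach()

def mala_step(log_post, theta, L):
    """Sketch of one MALA step with proposal covariance L L^T:
    theta' = theta + 0.5 * L L^T * grad log pi(theta) + L * eps,  eps ~ N(0, I)."""
    C = L @ L.T

    def mean(x):
        return x + 0.5 * C @ grad_log_post(log_post, x)

    eps = torch.randn_like(theta)
    theta_p = mean(theta) + L @ eps

    def log_q(x_to, x_from):
        # log density (up to a constant) of the Gaussian proposal N(mean(x_from), L L^T)
        d = x_to - mean(x_from)
        sol = torch.cholesky_solve(d.unsqueeze(-1), L).squeeze(-1)
        return -0.5 * d @ sol - torch.log(torch.diag(L).abs()).sum()

    # Metropolis-Hastings acceptance (Eq. 5.2), evaluated in log space.
    log_alpha = (log_post(theta_p) - log_post(theta)
                 + log_q(theta, theta_p) - log_q(theta_p, theta))
    if torch.log(torch.rand(())) < log_alpha:
        return theta_p.detach(), True
    return theta, False
```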
6. Numerical investigations
6.1. Synthetic data
We generate synthetic time signals using Simulink by simulating the stochastic VdP oscillator (Eqs. 2.1 and 2.8) for 500 seconds. We use the sampling rate $ {F}_s=10 $ kHz, frequency $ {\omega}_0/2\pi =150 $ Hz, and bandwidth $ \Delta f=100 $ Hz. One hundred synthetic signals are generated, for which the oscillator parameters $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ are uniformly distributed across the ranges $ \left[2.5,18.5\right] $ , $ \left[1.6,4.6\right] $ , and $ \left[2.5,18.5\right] $ , respectively.
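A minimal sketch of such a synthetic-signal generator is shown below, using explicit semi-implicit Euler–Maruyama stepping instead of Simulink; the damping term $ \left(2\nu -\kappa {\eta}^2\right)\dot{\eta} $ is the VdP parameterization we assume for Eqs. (2.1) and (2.8), and the noise intensity $ \Gamma $ is left as a free input.

```python
import numpy as np

def simulate_vdp(nu, kappa, Gamma, omega0, fs=10_000, T=500.0, seed=0):
    """Sketch: simulate   eta'' + omega0^2 eta = (2 nu - kappa eta^2) eta' + xi(t),
    with white noise xi of intensity Gamma, using a semi-implicit (symplectic)
    Euler-Maruyama scheme.  Returns eta sampled at the rate fs."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / fs
    n = int(T * fs)
    eta = np.zeros(n)
    x, v = 0.01, 0.0                                     # small initial perturbation
    for k in range(1, n):
        acc = -omega0**2 * x + (2.0 * nu - kappa * x**2) * v
        v = v + acc * dt + np.sqrt(Gamma * dt) * rng.standard_normal()
        x = x + v * dt                                   # position update with new velocity
        eta[k] = x
    return eta
```

For instance, calling this with $ {\omega}_0=2\pi \times 150 $ and the sampling rate and duration above yields a signal that can then be processed as in Section 4.1.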
It is worth noting that the test data are different from the training data generated by simulating the AFP equation. First, although the values of $ \nu, \kappa $ and $ {D}^{(2)} $ in the test data are within the support of the training data, the values have not been seen before by the DeepONet. Furthermore, the finite-time KM coefficients calculated from the processed time signals are corrupted by errors due to band-pass filtering, the finite sampling rate, and the finite sample size of the signals.
Figure 5 visually demonstrates how the proposed DeepONet converges to parameters that yield finite-time KM coefficients matching those obtained from the processed time-series data. The initial guess is represented by the transparent wireframe, while the final prediction is depicted by the darker wireframe. Notably, there is excellent agreement between the predicted wireframe and the measured values extracted from the signal.
To quantitatively evaluate the accuracy of the proposed approach in tackling the inverse problem, we present in Figure 6a the relative error between the true and predicted parameters. The results are provided in the form of violin plots. In the same figure, we also present the results using the FD-based system identification procedure proposed in Boujo and Noiray (Reference Boujo and Noiray2017), where the Nelder–Mead simplex method is chosen as the optimization algorithm. For the sake of comparison, we also include parameter identification results using DeepONet with the Nelder–Mead optimizer. It can be seen that all methods successfully identify the latent parameters with excellent accuracy, especially for $ \nu $ and $ \kappa $ , with median errors below 5%.
We perform tests to evaluate the effect of the filtering bandwidth on the parameter identification accuracy using DeepONet. To this end, the bandwidth $ \Delta f $ is varied among three distinct values: 30, 60, and 100 Hz. The results are presented in the right panel of Figure 6. The reduction in bandwidth mainly increases the error in the diffusion coefficient $ {D}^{(2)} $ , while the prediction accuracy of $ \nu $ and $ \kappa $ is only slightly affected. This outcome can be attributed to the fact that $ {D}^{(2)} $ characterizes the noise intensity, whereas $ \nu $ and $ \kappa $ primarily influence the limit-cycle amplitude at the peak frequency (in the purely deterministic case, the limit-cycle amplitude is $ {A}_{\mathrm{det}}=\sqrt{8\nu /\kappa } $ ). Since white noise has a flat spectral profile, narrowing the bandwidth around the peak frequency predominantly increases the prediction error in $ {D}^{(2)} $ .
To assess the efficiency of the proposed parameter identification approach, we measure the time required to identify $ \nu, \kappa $ , and $ {D}^{(2)} $ given the measured values of $ {\hat{D}}_{\tau}^{(1)} $ and $ {\hat{D}}_{\tau}^{(2)} $ from the time signals. The results are presented in the left panel of Figure 7 in the form of violin plots and quantitatively detailed in Table 1. It can be seen that DeepONet with LBFGS manages to solve the inverse problem two orders of magnitude faster than the FD approach with the Nelder–Mead simplex algorithm. For this study, we have selected the FD simulation parameters that minimize the time required to solve the inverse problem (see Appendix B) to facilitate meaningful comparison. The initial guesses for $ \nu, \kappa $ , and $ {D}^{(2)} $ are taken from $ \left\{\mathrm{1,10.5,20}\right\} $ , $ \left\{\mathrm{0.5,2.875,5.25}\right\} $ , and $ \left\{\mathrm{1,10.5,20}\right\} $ , respectively. For most cases, however, the first initial guess is enough for the optimization problem to converge. The reported time includes all the trials needed to solve the inverse problem. The DeepONet model is trained and evaluated on a GPU (NVIDIA Quadro RTX 4000), while the FD computations are performed on a CPU (AMD Ryzen Threadripper 3970X).
Note: The presented values represent the median and standard deviation of the time measurements. For the FD approach, the temporal and spatial resolutions as well as the number of processes have been varied such that the minimum time is reported.
The major speed-up can be attributed to the fact that DeepONet directly predicts the finite-time KM coefficients without the need to calculate the solution over the whole $ \left({a}^{\prime },\tau \right) $ domain (see Figure 2), and without the need for time-stepping. This is demonstrated in the right panel of Figure 7 in the form of violin plots and quantitatively in Table 1, where we present the time comparison for the forward pass, that is, the time to predict the finite-time KM coefficients given the parameters $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ . Furthermore, owing to the differentiability of the neural network, we can leverage optimizers, such as LBFGS, that utilize gradient information to further accelerate the system identification process. Figure 8 illustrates how the gradient-based optimization algorithm identifies the desired parameters in significantly fewer iterations.
Finally, we perform parameter identification on synthetic signals whose parameters $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ reside outside the support of the training data. Specifically, we generate 64 signals with $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ taking values $ \left\{22.5,26.5,30.5,34.5\right\} $ , $ \left\{5.6,6.6,7.6,8.6\right\} $ , and $ \left\{22.5,26.5,30.5,34.5\right\} $ , respectively. We recall that the training data have maximum values of 20, 5.25, and 20 for $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ , respectively. This means that the test parameters are well outside the range of those used during training, thus putting the extrapolation abilities of DeepONet to the test. The violin plots of the relative error between the prediction and the true values are provided in Figure 9. We report median errors of 6.75%, 7.33%, and 31.33% for $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ , respectively, for DeepONet-LBFGS. The FD approach with the Nelder–Mead optimizer results in median errors of 3.13%, 3.04%, and 19.97% for $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ , respectively. This result demonstrates that the errors for DeepONet are larger in the out-of-distribution case than in the in-distribution case. However, it is noteworthy that the error increase primarily affects the $ {D}^{(2)} $ prediction, while the errors in $ \nu $ and $ \kappa $ do not increase significantly. Hence, in the out-of-distribution setting, DeepONet still maintains the capability to recover the linear growth rate $ \nu $ and the cubic saturation coefficient $ \kappa $ , albeit with reduced accuracy.
6.2. Experimental data
In this section, we attempt to identify the parameters $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ characterizing acoustic pressure signals obtained from experiments. Specifically, we refer to the experiments performed in Noiray and Denisov (Reference Noiray and Denisov2017) on premixed methane flames in a lab-scale cylindrical combustion chamber. The authors observed three different operating conditions, linearly stable (c1), marginally stable (c2), and linearly unstable (c3), obtained by increasing the equivalence ratio from 0.47 to 0.5.
We process the time signals under operating conditions c2 and c3 with a 40-Hz bandwidth around the thermoacoustic mode at 150 Hz. The signals are 60 s long with a 10-kHz sampling rate. We then employ DeepONet to identify the parameters of interest. It should be noted that the same DeepONet, trained only once with synthetic data (as described in Section 3.1), is used here. The resulting identified parameters are presented in Table 2, together with the parameters estimated in Noiray and Denisov (Reference Noiray and Denisov2017) using the extrapolation method. Using the identified parameters, we reconstruct the theoretical stationary PDFs (Eq. 2.19) in Figure 10, presented together with the normalized histograms of the corresponding time signals. Good agreement can be observed between the reconstructed PDFs and the histogram data.
Furthermore, we can go one step further and provide uncertainty quantification for our prediction by performing Bayesian inference using the gradient-based adaptive MALA. We use a Gaussian likelihood and a gamma prior to construct the unnormalized posterior density. We assume i.i.d. Gaussian observations such that the likelihood takes the form,
We include the weight $ \tilde{P}\left({a}_i\right) $ to scale the variance (a constant $ {\sigma}_n^2 $ ) of each data point to express higher confidence for amplitudes that have higher occurrence in the time signal. We can see the resemblance of the above likelihood with the loss function given in Eq. (4.2). We estimate $ {\sigma}_n^2 $ for $ n=1,2 $ by calculating the weighted mean squared error between $ {\hat{D}}_{\tau}^{(n)} $ and $ {D}_{\tau, \theta}^{(n)} $ .
As for the prior distribution, we use a gamma prior $ \Gamma \left(\alpha, \beta \right) $ with parameters $ \alpha $ and $ \beta $ , that is,
where $ \alpha $ and $ \beta $ are selected such that the prior mean equals the estimated values from Noiray and Denisov (Reference Noiray and Denisov2017) and the prior standard deviation is 10% of the mean.
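Putting the likelihood and the gamma prior together, the unnormalized log posterior used by the MCMC sampler can be sketched as follows; the callable `predict_km` and the variable names are illustrative placeholders for the DeepONet-based forward pass and the processed data.

```python
import torch

def log_posterior(theta, predict_km, D_hat, weights, sigma2, alpha, beta):
    """Sketch of the unnormalized log posterior log P(theta | D) up to a constant.

    theta      : tensor (nu, kappa, D2)
    predict_km : callable theta -> (N, 2) DeepONet prediction of D_tau^(1,2)
    D_hat      : (N, 2) measured finite-time KM coefficients
    weights    : (N,) amplitude weights P~(a_i)
    sigma2     : (2,) estimated noise variances for the two coefficients
    alpha,beta : (3,) gamma prior shape and rate for each parameter
    """
    pred = predict_km(theta)
    # i.i.d. Gaussian likelihood; each data point has variance sigma_n^2 / P~(a_i).
    resid2 = (pred - D_hat) ** 2
    log_lik = -0.5 * (weights[:, None] * resid2 / sigma2[None, :]).sum()
    # Independent gamma priors on (nu, kappa, D2), up to an additive constant.
    log_prior = ((alpha - 1.0) * torch.log(theta) - beta * theta).sum()
    return log_lik + log_prior
```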
We conduct MCMC simulations to generate 30,000 samples, setting aside the first 10,000 samples as a burn-in period to adapt the proposal covariance matrix and to converge to the stationary distribution. This was completed in 2.6 minutes, demonstrating the high efficiency of DeepONet as a forward simulator. The posterior distribution of each of the parameters $ \nu $ , $ \kappa $ , and $ {D}^{(2)} $ is provided in Figures 11 and 13 for operating conditions c2 and c3, respectively. The joint distributions between each pair of the three parameters are given in Figures 12 and 14 for operating conditions c2 and c3, respectively.
The joint distributions show that $ \nu $ and $ \kappa $ can be very strongly correlated. A transition proposal with an isotropic covariance matrix (e.g., random walk Metropolis or standard MALA) will either result in too many rejections or force the use of a very small step size. A chain that moves too slowly exhibits high autocorrelation, so the effective sample size becomes much smaller than the number of generated samples. This motivates the use of the adaptive algorithm gadMALAf (Titsias and Dellaportas, Reference Titsias and Dellaportas2019), where a suitable proposal distribution is automatically constructed during the burn-in period by maximizing the generalized speed measure. This is reflected in Figures 15 and 16, where we can observe good mixing of the MCMC samples with rapidly decaying autocorrelation.
7. Concluding remarks
In this work, a DeepONet has been utilized to recover model parameters from time series data. The numerical results presented in this article demonstrate a major advantage in terms of speed, while maintaining a level of accuracy on par with the classical FD method. The speed advantage can be attributed to two key factors. First, using DeepONet as a surrogate, we bypass the calculations required to solve the AFP equation across the entire simulation domain $ \left({a}^{\prime },\tau \right) $ for each value of the amplitude $ a $ present in the time signal. Second, the differentiability of the neural network allows the computation of exact gradient information between the predicted finite-time KM coefficients and the latent parameters $ \left\{\nu, \kappa, {D}^{(2)}\right\} $ . This facilitates the use of gradient-based optimizers, further accelerating the parameter identification process. Furthermore, these advantages allow the application of Bayesian inference through the gradient-based adaptive MALA algorithm to provide uncertainty quantification of the parameter predictions.
In addition to these advantages, there are several opportunities for further studies. First, our proposed operator network, trained solely on input–output examples, can only make reliable predictions within the range of the training data. To address this limitation, one potential solution is to incorporate PDE information into the neural network, as demonstrated in the Physics-Informed DeepONet (Wang, Wang, and Perdikaris, Reference Wang, Wang and Perdikaris2021), or to explore other reliable extrapolation approaches, as suggested in Zhu et al. (Reference Zhu, Zhang, Jiao, Karniadakis and Lu2023). Additionally, it might be worthwhile to explore alternative neural network architectures beyond fully connected networks. Attention-based architectures, as seen in Vaswani et al. (Reference Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser and Polosukhin2017) and Wang, Teng, and Perdikaris (Reference Wang, Teng and Perdikaris2021), have shown success in various applications and could be a promising avenue for improvement. The methodology presented in this article can be adapted to handle different nonlinear forcing functions by adjusting the network’s input function. Extending the methodology to stochastic processes with multiplicative noise (nonconstant diffusion coefficient) might necessitate architecture modifications (e.g., using the multiple input–output networks proposed in Jin et al. (Reference Jin, Meng and Lu2022)) to accommodate the drift and diffusion coefficients as two separate input functions. Finally, it is interesting to investigate the possibility of extending our methodology to systems involving multimode oscillations or coupled oscillators. Provided that the corresponding AFP equations can be formulated and solved using the FD method to generate the necessary training data, the approach outlined in this article could be effectively applied.
Acknowledgments
The authors are grateful for the technical assistance of Dr. Boujo for the initial implementation of the FD solver.
Data availability statement
Data could be made available upon request.
Author contribution
B.D. conceived the study with the support of N.N. T.S. developed the algorithms with the support of B.D. T.S. carried out the numerical simulations. All authors analyzed the results. T.S. and B.D. wrote the original manuscript. All authors reviewed the manuscript.
Funding statement
This study was supported by the European Research Council under the ERC Consolidator Grant (No. 820091) TORCH (2019–2024).
Competing interest
The authors declare none.
Ethical standard
The research meets all ethical guidelines, including adherence to the legal requirements of the study country.
Appendix A. Derivation of the gradient-based adaptive MALA algorithm
In this section, we will describe the derivation of the gradient-based adaptive MALA (gadMALAf) algorithm proposed in Titsias and Dellaportas (Reference Titsias and Dellaportas2019). The aim is to adapt the Cholesky factor $ L $ of the proposal distribution covariance matrix to maximize the generalized speed measure $ {s}_L $ . This is equivalent to maximizing $ \log {s}_L $ ,
Using Jensen’s inequality, we can obtain the lower bound $ {\mathrm{\mathcal{F}}}_L\left({\vartheta}_t\right)\le \log {s}_L\left({\vartheta}_t\right) $ as follows,
where we have introduced $ \pi $ to denote a general target distribution (in the context of Bayesian inference $ \pi $ denotes the posterior distribution $ \pi \left(\vartheta \right)=P\left(\vartheta |\mathcal{D}\right) $ ).
We aim to take a gradient ascent step in the direction that maximizes $ {\mathrm{\mathcal{F}}}_L\left({\vartheta}_t\right) $ . However, it is not clear how to take the gradient with respect to $ L $ of an expectation over a distribution that itself depends on $ L $ . We employ the reparameterization trick, a widely used technique in variational inference, to be able to take a gradient step toward maximizing $ {\mathrm{\mathcal{F}}}_L\left({\vartheta}_t\right) $ . We parameterize $ {\vartheta}^{\prime } $ with $ \unicode{x025B} \sim p\left(\unicode{x025B} \right) $ according to a deterministic transformation $ {\vartheta}^{\prime }={g}_L\left(\unicode{x025B}; {\vartheta}_t\right) $ such that
The gradient of $ \mathrm{\mathcal{F}} $ can then be obtained according to
We now focus on the MALA algorithm, where the reparameterization of the proposal $ {\vartheta}^{\prime }={g}_L\left(\unicode{x025B}; {\vartheta}_t\right) $ takes the form of
The corresponding Monte Carlo estimator of the gradient $ {\nabla}_L{\mathrm{\mathcal{F}}}_L\left({\vartheta}_t\right) $ can be obtained as
The fast variant of the gadMALAf algorithm assumes that $ \nabla \log \pi \left({\vartheta}_t^{\prime}\right) $ is constant with respect to $ L $ . Furthermore, from an implementation perspective, the gradient of the M–H log ratio in the above equation can be calculated from the lower triangular part of the following outer product,
The gradient terms appearing in this algorithm can be obtained conveniently using the automatic differentiation readily available in modern deep learning frameworks such as PyTorch.
We now have complete information to update $ L $ according to the gadMALAf algorithm. During the burn-in period, we iteratively adapt $ L $ according to the following procedure,
Step 1. Propose $ {\vartheta}_t^{\prime }={g}_L\left({\unicode{x025B}}_t;{\vartheta}_t\right) $ with $ \unicode{x025B} \sim \mathcal{N}\left(\unicode{x025B} |\;0,I\right) $ .
Step 2. Update $ L\leftarrow L+{\rho}_t{\nabla}_L{\mathrm{\mathcal{F}}}_L\left({\vartheta}_t,{\unicode{x025B}}_t\right) $ .
Step 3. Accept or reject $ {\vartheta}_t^{\prime } $ according to the M–H acceptance ratio to obtain $ {\vartheta}_{t+1} $ .
Step 4. Set $ {\hat{\alpha}}_t=1 $ if $ {\vartheta}_t^{\prime } $ is accepted and $ {\hat{\alpha}}_t=0 $ otherwise. Update the hyperparameter $ \beta \leftarrow \beta \left(1+{\rho}_{\beta}\left({\hat{\alpha}}_t-{\alpha}_{\ast}\right)\right) $
We set the hyperparameters according to the recommendations in the original paper (Titsias and Dellaportas, Reference Titsias and Dellaportas2019). The learning rate $ {\rho}_t $ is adjusted using the root mean square propagation algorithm, where $ {\rho}_t=\eta /\left(1+\sqrt{G_t}\right) $ and $ {G}_t=0.9{G}_{t-1}+0.1{\left[{\nabla}_L{\mathrm{\mathcal{F}}}_L\left({\vartheta}_t,{\unicode{x025B}}_t\right)\right]}^2 $ . Furthermore, we set $ {\rho}_{\beta }=0.02 $ , and $ {\alpha}_{\ast }=0.55 $ as the target acceptance ratio for the MALA algorithm.
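As a small illustration, the learning-rate and acceptance-rate adaptations just described can be written as follows; this is a sketch in which the gradient `grad_F` of the lower bound with respect to $ L $ is assumed to be provided by automatic differentiation, and the default value of $ \eta $ is a placeholder.

```python
import torch

def adapt_step(L, grad_F, G, beta, accepted, eta=0.02, rho_beta=0.02, alpha_star=0.55):
    """Sketch of the per-iteration updates during burn-in (Steps 2 and 4):
    RMSProp-style learning rate for L and acceptance-rate tracking for beta."""
    G = 0.9 * G + 0.1 * grad_F**2                  # running average of the squared gradient
    rho = eta / (1.0 + torch.sqrt(G))              # rho_t = eta / (1 + sqrt(G_t))
    L = torch.tril(L + rho * grad_F)               # gradient ascent on F_L, keep L lower triangular
    alpha_hat = 1.0 if accepted else 0.0           # Step 4: track acceptance
    beta = beta * (1.0 + rho_beta * (alpha_hat - alpha_star))
    return L, G, beta
```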
Appendix B. Parameter study of the finite-difference solver
We conducted a study to select simulation parameters for the finite-difference simulation that minimize the time required to solve the inverse problem. We vary the spatial resolution $ da $ , taking values $ \left\{0.05,0.1,0.2,0.5,0.75\right\} $ . The temporal resolution is varied by parameterizing the time step with a constant $ k $ , that is, by setting $ dt=k\times \frac{1}{2}\frac{(da)^2}{D^{(2)}} $ . We also run the forward simulation in parallel, since the AFP equations can be solved independently for different amplitudes $ a $ . We vary the number of processes, taking values $ \left\{1,2,4,8,12\right\} $ .
Across the 100 synthetic test signals, the median time to identify the parameters of the processed time signal is reported in Figure A1. Although the Crank–Nicolson method allows stable simulations for all parameters tested, we can see that the required time does not decrease further when $ da $ increases above $ 0.5 $ and $ k $ increases above $ 300 $ . In fact, the minimum median time is obtained for $ da=0.5 $ with $ k=300 $ and using two processes. This is the parameter setting used for the comparison with DeepONet in Section 6.