
Physics-constrained convolutional neural networks for inverse problems in spatiotemporal partial differential equations

Published online by Cambridge University Press:  20 December 2024

Daniel Kelshaw
Affiliation:
Department of Aeronautics, Imperial College London, London, UK
Luca Magri*
Affiliation:
Department of Aeronautics, Imperial College London, London, UK The Alan Turing Institute, The British Library, London, UK DIMEAS, Politecnico di Torino, Torino, Italy.
*
Corresponding author: Luca Magri; Email: [email protected]

Abstract

We propose a physics-constrained convolutional neural network (PC-CNN) to solve two types of inverse problems in partial differential equations (PDEs), which are nonlinear and vary both in space and time. In the first inverse problem, we are given data that is offset by spatially varying systematic error (i.e., the bias, also known as the epistemic uncertainty). The task is to uncover the true state, which is the solution of the PDE, from the biased data. In the second inverse problem, we are given sparse information on the solution of a PDE. The task is to reconstruct the solution in space with high resolution. First, we present the PC-CNN, which constrains the PDE with a time-windowing scheme to handle sequential data. Second, we analyze the performance of the PC-CNN to uncover solutions from biased data. We analyze both linear and nonlinear convection-diffusion equations, and the Navier–Stokes equations, which govern the spatiotemporally chaotic dynamics of turbulent flows. We find that the PC-CNN correctly recovers the true solution for a variety of biases, which are parameterized as non-convex functions. Third, we analyze the performance of the PC-CNN for reconstructing solutions from sparse information for the turbulent flow. We reconstruct the spatiotemporal chaotic solution on a high-resolution grid from only 1% of the information contained in it. For both tasks, we further analyze the Navier–Stokes solutions. We find that the inferred solutions have a physical spectral energy content, whereas traditional methods, such as interpolation, do not. This work opens opportunities for solving inverse problems with partial differential equations.

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Impact Statement

Given partial information on the solution of an equation, can we infer the full solution? Current data-driven methods that perform this task are limited to steady problems, or they require access to the full solution for training. We propose the physics-constrained convolutional neural network (PC-CNN) to tackle spatiotemporal nonlinear partial differential equations, with applications to fluid mechanics and transport phenomena.

1. Introduction

Physical and engineering phenomena can be modeled by nonlinear partial differential equations, which may vary both in space and time. In partial differential equations, on the one hand, the goal of the forward problem is to predict the output (e.g., the evolution of the system) given inputs such as the initial conditions, system parameters, and boundary conditions. On the other hand, the goal of the inverse problem is to compute the input that corresponds to a given set of outputs. Whereas forward problems are typically well-posed, inverse problems are not. The overarching objective of this paper is to propose a method—the physics-constrained convolutional neural network (PC-CNN)—to solve two types of inverse problems of engineering interest: (i) uncovering the solution of the partial differential equation that models the system under investigation from biased data; and (ii) reconstructing the solution on a high-resolution grid from sparse information.

The first inverse problem concerns error removal. The literature typically considers only aleatoric noise removal, that is, methods that remove small, unbiased stochastic variations from the underlying solution. This has been achieved through various methods, including filtering (Kumar and Nachamai, Reference Kumar and Nachamai2017); proper orthogonal decomposition (Mendez et al., Reference Mendez, Raiola, Masullo, Discetti, Ianiro, Theunissen and Buchlin2017; Raiola et al., Reference Raiola, Discetti and Ianiro2015); and autoencoders (Vincent et al., Reference Vincent, Larochelle, Lajoie, Bengio and Manzagol2010), to name only a few. Our first demonstration of the proposed physics-constrained convolutional neural network aims, instead, to formalize and solve an inverse problem to remove large-amplitude bias (systematic error) from data, ultimately uncovering the true solution of the partial differential equation.

The second inverse problem concerns the reconstruction of solutions to partial differential equations from sparse information, which describes only a small portion of the solution and is a challenge for system identification [e.g., (Brunton et al., Reference Brunton, Proctor and Kutz2016)]. Current methods primarily make use of convolutional neural networks due to their ability to exploit spatial correlations. Current data-driven approaches need access to full high-resolution samples during training to produce a parametric mapping for the reconstruction task [e.g., (Dong et al., Reference Dong, Loy, He and Tang2014; Shi et al., Reference Shi, Caballero, Huszár, Totz, Aitken, Bishop, Rueckert and Wang2016; Yang et al., Reference Yang, Zhang, Tian, Wang, Xue and Liao2019; Liu et al., Reference Liu, Tang, Huang and Lu2020)]. In many cases, access to high-resolution samples is limited; the method proposed in this paper does not rely on them.

Common to both inverse problems that we tackle in this paper is the goal of finding a mapping from some known quantity to the corresponding solution of the governing partial differential equation. By exploiting the universal function approximation property of neural networks, it is possible to obtain numerical surrogates for function mappings or operators (Hornik et al., Reference Hornik, Stinchcombe and White1989; Zhou, Reference Zhou2020). For a network parameterized by weights, there exists an optimal set of weights that results in the desired mapping; the challenge is to find these weights through an optimization process. This optimization is formulated as the minimization of a loss function, the evaluation of which offers a distance metric to the desired solution. For structured data, convolutional neural networks excel because their localized kernel operations exploit spatial correlations in the data.

With physical systems governed by partial differential equations, the problems encountered in the absence of ground-truth high-resolution observations can be mitigated by employing a physics-constrained approach, which imposes prior knowledge of the governing equations [e.g., (Lagaris et al., Reference Lagaris, Likas and Fotiadis1998; Raissi et al., Reference Raissi, Perdikaris and Karniadakis2019; Doan et al., Reference Doan, Polifke and Magri2021)]. One such tool is physics-informed neural networks (PINNs) (Raissi et al., Reference Raissi, Perdikaris and Karniadakis2019), which tackle physically motivated problems by exploiting automatic differentiation to constrain the governing equations. Although the use of PINNs for inverse problems shows promising results for simple systems (Eivazi and Vinuesa, Reference Eivazi and Vinuesa2022), they remain challenging to train and are not designed to exploit spatial correlations (Grossmann et al., Reference Grossmann, Komorowska, Latz and Schönlieb2023; Krishnapriyan et al., Reference Krishnapriyan, Gholami, Zhe, Kirby and Mahoney2021). On the other hand, convolutional neural networks are designed to exploit spatial correlations, but they cannot naturally leverage automatic differentiation to evaluate the physical loss, as PINNs do, because they provide a mapping between states, rather than a mapping from the spatiotemporal coordinates to states as in PINNs. As such, the design of physics-informed convolutional neural networks is more challenging and relies on finite-difference approximations or differentiable solvers (Gao et al., Reference Gao, Sun and Wang2021; Kelshaw and Magri, Reference Kelshaw and Magri2023b). For example, the authors of (Gao et al., Reference Gao, Sun and Wang2021) show results for a steady flow field (with no temporal dynamics), which produces a mapping for stationary solutions of the Navier–Stokes equations.

In this work, we propose a physics-constrained convolutional neural network with time-windowing, suitable for handling a range of physically motivated inverse problems. Imposing prior knowledge of the physics allows us to restrict the function space and use limited collocation points to find a solution. We emphasize that this is not a PINN approach (Raissi et al., Reference Raissi, Perdikaris and Karniadakis2019) because we do not leverage automatic differentiation to obtain gradients of outputs with respect to inputs to constrain the physics. The network employs a time-windowing scheme to compute the time-dependent residuals without resorting to recurrent network architectures. Realizations of the network that do not conform to the given governing equations are penalized, which ensures that the network learns to produce physical predictions.

We first provide the necessary background in Section 2, and introduce the two physically motivated inverse problems. A brief background on convolutional neural networks is provided in Section 3, before the proposed physics-constrained convolutional neural network is introduced in Section 4. Here, we outline the generic approach before discussing the specifics of each inverse problem. In Section 5, we discuss data generation and how a pseudospectral solver can be used to further constrain the physical properties of the system. Finally, we showcase the results for each task. Results for the bias-removal task are shown in Section 6, where the method is demonstrated on three partial differential equations of increasing complexity: linear convection-diffusion, nonlinear convection-diffusion, and the Navier–Stokes equations. Results for the reconstruction task are shown in Section 7, where the method is analyzed on the chaotic solution of the turbulent Navier–Stokes equations. Conclusions are drawn in Section 8.

2. Background

We consider dynamical systems, which are governed by partial differential equations (PDEs) of the form

(1) $$ \mathcal{R}\left(\tilde{\boldsymbol{u}};\boldsymbol{p}\right)\equiv {\partial}_t\tilde{\boldsymbol{u}}\left(\boldsymbol{x},t\right)-\mathcal{N}\left(\tilde{\boldsymbol{u}};\boldsymbol{p}\right), $$

where $ \boldsymbol{x}\in \varOmega \subset {\mathbb{R}}^n $ is the spatial location; $ n $ is the space dimension; and $ \varOmega $ is the spatial domain. Time is denoted by $ t\in \left[0,T\right]\subset {\mathbb{R}}_{\ge 0} $; $ \tilde{\boldsymbol{u}}\left(\boldsymbol{x},t\right):\varOmega \times \left[0,T\right]\to {\mathbb{R}}^m $ is the state, where $ m $ is the number of components; and $ \boldsymbol{p} $ is a vector that encapsulates the physical parameters of the system. A sufficiently smooth differential operator is denoted by $ \mathcal{N} $, which represents the nonlinear part of the system of $ m $ partial differential equations; $ \mathcal{R} $ is the residual; and $ {\partial}_t $ is the partial derivative with respect to time. A solution $ \boldsymbol{u} $ of the PDE (1) is a state that makes the residual vanish, i.e., $ \boldsymbol{u} $ is such that $ \mathcal{R}\left(\boldsymbol{u};\boldsymbol{p}\right)=0 $, with Eq. (1) being subject to prescribed boundary conditions on the domain boundary, $ \partial \varOmega $, and initial conditions at $ t=0 $.

2.1. Uncovering solutions from biased data

We first consider the inverse problem of uncovering the solution of a PDE (Eq. 1) from biased data. We assume that we have biased information on the system’s state

(2) $$ \boldsymbol{\zeta} \left(\boldsymbol{x},t\right)=\boldsymbol{u}\left(\boldsymbol{x},t\right)+\boldsymbol{\phi} \left(\boldsymbol{x}\right), $$

where $ \boldsymbol{\phi} $ is a spatially varying bias (systematic error), which, in practice, can be caused by miscalibrated sensors (Sciacchitano et al., Reference Sciacchitano, Neal, Smith, Warner, Vlachos, Wieneke and Scarano2015) or modeling assumptions (Nóvoa and Magri, Reference Nóvoa and Magri2022). The quantity $ \boldsymbol{\zeta} \left(\boldsymbol{x},t\right) $ in Eq. (2) constitutes the biased dataset, which is not a solution of the PDE (1), i.e.

(3) $$ \mathcal{R}\left(\boldsymbol{\zeta} \left(\boldsymbol{x},t\right);\boldsymbol{p}\right)\ne 0. $$

Given the biased information, $ \boldsymbol{\zeta} $ , our goal is to uncover the correct solution to the governing equations, $ \boldsymbol{u} $ , which is referred to as the true state. Computationally, we seek a mapping, $ {\boldsymbol{\eta}}_{\boldsymbol{\theta}} $ , such that

(4) $$ {\boldsymbol{\eta}}_{\boldsymbol{\theta}}:\boldsymbol{\zeta} \left(\varOmega, t\right)\mapsto \boldsymbol{u}\left(\varOmega, t\right), $$

where the domain $ \varOmega $ is discretized on a uniform, structured grid $ \varOmega \subset {\mathbb{R}}^{N^n} $. We choose the mapping $ {\boldsymbol{\eta}}_{\boldsymbol{\theta}} $ to be a convolutional neural network, which depends on trainable parameters $ \boldsymbol{\theta} $ (Section 3). The nature of the partial differential equations has a computational implication. For a linear partial differential equation, the residual is an explicit function of the bias, i.e., $ \mathcal{R}\left(\boldsymbol{\zeta} \left(\boldsymbol{x},t\right);\boldsymbol{p}\right)=\mathcal{R}\left(\boldsymbol{\phi} \left(\boldsymbol{x}\right);\boldsymbol{p}\right) $. This does not hold true for nonlinear systems, which makes the problem of removing the bias more challenging, as discussed in Section 6.4. Figure 1(a) provides an overview of the inverse problem of uncovering solutions from biased data.

Figure 1. Inverse problems investigated in this paper. (a) Uncovering solutions from biased data. The model $ {\boldsymbol{\eta}}_{\boldsymbol{\theta}} $ is responsible for recovering the solution (true state), $ \boldsymbol{u}\left(\varOmega, t\right) $ , from the biased data, $ \boldsymbol{\zeta} \left(\varOmega, t\right) $ . The bias (systematic error), $ \boldsymbol{\phi} \left(\boldsymbol{x}\right) $ , is the difference between the biased data and the solution. (b) Reconstructing a solution from sparse information. The model $ {\boldsymbol{f}}_{\boldsymbol{\theta}} $ is responsible for mapping the sparse field $ \boldsymbol{u}\left({\varOmega}_L,t\right) $ to the high-resolution field $ \boldsymbol{u}\left({\varOmega}_H,t\right) $ . The term $ \tau $ in both cases denotes the number of contiguous time-steps passed to the network, required for computing temporal derivatives. An explanation of the proposed physics-constrained convolutional neural network (PC-CNN), which is the ansatz for both mappings $ {\boldsymbol{\eta}}_{\theta } $ and $ {\boldsymbol{f}}_{\theta } $ , is provided in Section 4.

2.2. Reconstruction from sparse information

We formalize the inverse problem of reconstructing the solution from sparse information. Given sparse information, $ \boldsymbol{u}\left({\varOmega}_L,t\right) $, we aim to reconstruct the solution to the partial differential equation on a fine grid, $ \boldsymbol{u}\left({\varOmega}_H,t\right) $, where the domain $ \varOmega $ is discretized on low- and high-resolution uniform grids, $ {\varOmega}_L\subset {\mathbb{R}}^{N^n} $ and $ {\varOmega}_H\subset {\mathbb{R}}^{M^n} $, respectively, such that $ {\varOmega}_L\cap {\varOmega}_H={\varOmega}_L $; $ M=\kappa N $; and $ \kappa \in {\mathbb{N}}^{+} $ is the scale factor. Our objective is to find $ \boldsymbol{u}\left({\varOmega}_H,t\right) $ given the sparse information $ \boldsymbol{u}\left({\varOmega}_L,t\right) $. We achieve this by seeking a parameterized mapping $ {\boldsymbol{f}}_{\boldsymbol{\theta}} $ such that

(5) $$ {\boldsymbol{f}}_{\boldsymbol{\theta}}:\boldsymbol{u}\left({\varOmega}_L,t\right)\mapsto \boldsymbol{u}\left({\varOmega}_H,t\right), $$

where the mapping $ {\boldsymbol{f}}_{\boldsymbol{\theta}} $ is a convolutional neural network parameterized by $ \boldsymbol{\theta} $. We consider the solution $ \boldsymbol{u} $ to be discretized with $ {N}_t $ time steps. Figure 1(b) provides an overview of the reconstruction task.

3. Convolutional neural networks

The mappings $ {\boldsymbol{\eta}}_{\boldsymbol{\theta}} $ and $ {\boldsymbol{f}}_{\boldsymbol{\theta}} $ (Eqs. (4, 5), respectively) are design choices. For data from spatiotemporal partial differential equations, we look for a tool that: (i) can, in principle, learn any continuous function, i.e., is a universal approximator; (ii) learns spatial correlations, as partial differential equations are defined by local operators; (iii) does not violate the conservation laws imposed by the PDE (1); and (iv) can handle sequential data for the temporal dynamics. In this paper, we choose convolutional neural networks (LeCun and Bengio, Reference LeCun and Bengio1998), which are suitable tools for learning datasets of partial differential equations, from fluid mechanics (Gao et al., Reference Gao, Sun and Wang2021; Kelshaw and Magri, Reference Kelshaw and Magri2023a; Murata et al., Reference Murata, Fukami and Fukagata2020) to heat transfer (Edalatifar et al., Reference Edalatifar, Tavakoli, Ghalambaz and Setoudeh2021; Kim and Lee, Reference Kim and Lee2020). Convolutional neural networks naturally fulfill requirements (i, ii), but do not fulfill (iii, iv). In this paper, we propose the physics-constrained convolutional neural network (PC-CNN) to fulfill all requirements (i–iv). A convolutional neural network, $ {\boldsymbol{k}}_{\boldsymbol{\theta}} $, is a composition of functions

(6) $$ {\boldsymbol{k}}_{\boldsymbol{\theta}}={\boldsymbol{k}}_{{\boldsymbol{\theta}}_Q}^Q\circ \cdots \circ h\left({\boldsymbol{k}}_{{\boldsymbol{\theta}}_2}^2\right)\circ h\left({\boldsymbol{k}}_{{\boldsymbol{\theta}}_1}^1\right), $$

where $ {\boldsymbol{k}}_{{\boldsymbol{\theta}}_i}^i $ denote discrete convolutional layers; $ h $ is an element-wise nonlinear activation function, which increases the expressive power of the network; and $ Q $ is the number of layers. The convolutional layers are responsible for the discrete operation $ {\boldsymbol{k}}_{\boldsymbol{\theta}}:\left(\boldsymbol{x};\boldsymbol{w},\boldsymbol{b}\right)\mapsto \boldsymbol{w}\ast \boldsymbol{x}+\boldsymbol{b} $, where $ \boldsymbol{\theta} =\left(\boldsymbol{w},\boldsymbol{b}\right) $; $ \ast $ is the convolution operation; $ \boldsymbol{x} $ is the data; $ \boldsymbol{w} $ are the trainable weights of the kernel; and $ \boldsymbol{b} $ are the trainable biases. As the kernel operates locally around each pixel, information is leveraged from the surrounding grid cells. For a pedagogical explanation of CNNs, the reader is referred to (Magri, Reference Magri2023).
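For illustration, a minimal sketch (in PyTorch) of the composition in Eq. (6) is given below: $ Q $ convolutional layers interleaved with an element-wise activation $ h $. The layer sizes, kernel size, and choice of activation are assumptions for illustration, not the architecture reported in Appendix A.1; circular padding reflects the periodic domains used in this paper.

```python
# A minimal sketch of Eq. (6): Q convolutional layers with an element-wise
# activation h after all but the last layer. Hyperparameters are illustrative.
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, channels: int = 2, hidden: int = 32, q: int = 4):
        super().__init__()
        layers = []
        for i in range(q):
            c_in = channels if i == 0 else hidden
            c_out = channels if i == q - 1 else hidden
            # 3x3 kernels with padding=1 preserve the spatial resolution;
            # circular padding enforces periodicity of the domain.
            layers.append(nn.Conv2d(c_in, c_out, kernel_size=3, padding=1,
                                    padding_mode="circular"))
            if i < q - 1:
                layers.append(nn.Tanh())  # element-wise nonlinearity h
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, N, N) -> (batch, channels, N, N)
        return self.net(x)

x = torch.randn(8, 2, 64, 64)  # e.g., two velocity components on a 64x64 grid
print(SimpleCNN()(x).shape)    # torch.Size([8, 2, 64, 64])
```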

4. Physics-constrained convolutional neural networks (PC-CNN)

We propose a physics-constrained convolutional neural network (PC-CNN) with time-windowing to allow for the processing of sequential data. In this section, we describe the loss functions for the tasks of uncovering solutions from biased data and reconstruction from sparse information (Sections 2.1 and 2.2, respectively), and the time-windowing scheme for computing the time derivatives required by the physics constraint.

4.1. Losses for training

The mappings $ {\boldsymbol{\eta}}_{\boldsymbol{\theta}} $ and $ {\boldsymbol{f}}_{\boldsymbol{\theta}} $ are parameterized as convolutional neural networks with the ansatz of Eq. (6). Training these networks is an optimization problem, which seeks an optimal set of parameters $ {\boldsymbol{\theta}}^{\ast } $ such that

(7) $$ {\boldsymbol{\theta}}^{\ast }=\underset{\boldsymbol{\theta}}{\mathrm{argmin}}\ {\mathcal{L}}_{\theta}\hskip1em \mathrm{where}\hskip1em {\mathcal{L}}_{\theta }={\lambda}_D{\mathcal{L}}_D+{\lambda}_P{\mathcal{L}}_P+{\lambda}_C{\mathcal{L}}_C, $$

where the loss function $ {\mathcal{L}}_{\theta } $ consists of: $ {\mathcal{L}}_D $, which quantifies the error between the data and the prediction (“data loss”); $ {\mathcal{L}}_P $, which quantifies how unphysical the predictions are (“physics loss”); and $ {\mathcal{L}}_C $, which embeds any further required constraints (“constraint loss”). Each loss is weighted by a non-negative regularization factor, $ {\lambda}_{\left(\cdot \right)} $, which is an empirical hyperparameter. The definitions of the losses and regularization factors are task dependent; that is, we must design a suitable loss function for each inverse problem we solve.
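For illustration, a minimal sketch of the composite objective in Eq. (7) is shown below. The individual loss terms are task-dependent placeholders, and the default weights follow the values quoted in Section 6.2.

```python
# A minimal sketch of Eq. (7): a weighted sum of the data, physics, and
# constraint losses. The loss terms themselves are task-dependent.
def total_loss(l_data, l_physics, l_constraint,
               lam_d: float = 1.0, lam_p: float = 1e3, lam_c: float = 1e3):
    return lam_d * l_data + lam_p * l_physics + lam_c * l_constraint
```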

4.1.1. Losses for uncovering solutions from biased data

Given biased data $ \boldsymbol{\zeta} \left(\varOmega, t\right) $ , we define the loss terms for the task of uncovering solutions as

(8) $$ {\mathcal{L}}_D=\frac{1}{N_t}\sum \limits_{i=0}^{N_t-1}\parallel {\boldsymbol{\eta}}_{\boldsymbol{\theta}}\left(\boldsymbol{\zeta} \left(\partial \varOmega, {t}_i\right)\right)-\boldsymbol{u}\left(\partial \varOmega, {t}_i\right){\parallel}_{\partial \varOmega}^2, $$
(9) $$ {\mathcal{L}}_P=\frac{1}{N_t}\sum \limits_{i=0}^{N_t-1}\parallel \mathcal{R}\left({\boldsymbol{\eta}}_{\boldsymbol{\theta}}\left(\boldsymbol{\zeta} \left(\varOmega, {t}_i\right)\right);\boldsymbol{p}\right){\parallel}_{\varOmega}^2, $$
(10) $$ {\mathcal{L}}_C=\frac{1}{N_t}\sum \limits_{i=0}^{N_t-1}\parallel {\partial}_t\left[\boldsymbol{\zeta} \left(\varOmega, {t}_i\right)-{\boldsymbol{\eta}}_{\boldsymbol{\theta}}\left(\boldsymbol{\zeta} \left(\varOmega, {t}_i\right)\right)\right]{\parallel}_{\varOmega}^2, $$

where $ \partial \varOmega $ denotes the boundary points of the grid, and $ \parallel \hskip0.33em \cdot \hskip0.33em {\parallel}_{\varOmega } $ is the $ {\mathrm{\ell}}^2 $-norm over the discretized domain. We impose the prior knowledge that we have on the dynamical system by defining the physics loss, $ {\mathcal{L}}_P $, which penalizes network parameters that yield predictions that violate the governing equations (1). Because any solution of the governing equations yields a vanishing residual, the physics loss $ {\mathcal{L}}_P $ alone does not provide a unique solution in the absence of observations (i.e., data). By augmenting the physics loss, Eq. (9), with the data loss, $ {\mathcal{L}}_D $ (see Eqs. (7), (8)), we ensure that network realizations conform to the observations (data) on the boundaries whilst fulfilling the governing equations, e.g., conservation laws. For this inverse problem, the constraint loss $ {\mathcal{L}}_C $, Eq. (10), prevents the bias $ \boldsymbol{\phi} =\boldsymbol{\phi} \left(\boldsymbol{x}\right) $ from changing in time. Penalizing network realizations that do not provide a unique bias helps stabilize training and drive predictions away from trivial solutions, such as the stationary solution $ \boldsymbol{u}\left(\varOmega, t\right)=0 $. Although we could use collocation points within the domain (as well as on the boundary), we show that our method can uncover the solution of the PDE with samples on the boundary only. Spatial sparsity could be increased by only using a subset of samples on the boundary, but for this task, we use information from all the boundary nodes.
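A minimal sketch (in PyTorch) of the three losses in Eqs. (8)–(10) is shown below. Here, `residual` is a placeholder for $ \mathcal{R}\left(\cdot;\boldsymbol{p}\right) $ in Eq. (1), the boundary observations are selected with a boolean mask, and a forward difference along the time dimension stands in for the temporal derivative in Eq. (10) (up to a constant factor of $ \Delta t $); all names are assumptions for illustration.

```python
# A minimal sketch of Eqs. (8)-(10). `residual` evaluates R(.; p) of Eq. (1)
# and is supplied by the user; `mask` selects the boundary points of the grid.
import torch

def boundary_mask(n: int) -> torch.Tensor:
    m = torch.zeros(n, n, dtype=torch.bool)
    m[0, :] = m[-1, :] = m[:, 0] = m[:, -1] = True
    return m

def bias_removal_losses(eta, zeta, u_boundary, residual, mask):
    pred = eta(zeta)                                    # (N_t, c, n, n)
    l_d = ((pred[..., mask] - u_boundary) ** 2).mean()  # data loss, Eq. (8)
    l_p = (residual(pred) ** 2).mean()                  # physics loss, Eq. (9)
    phi = zeta - pred                                   # inferred bias
    # constraint loss, Eq. (10): forward difference in time (scaled by dt)
    l_c = (torch.diff(phi, dim=0) ** 2).mean()
    return l_d, l_p, l_c
```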

4.1.2. Losses for reconstruction from sparse information

Given low-resolution observations $ \boldsymbol{u}\left({\varOmega}_L,t\right) $ , we define the loss terms for the reconstruction from sparse information as

(11) $$ {\displaystyle \begin{array}{l}{\mathrm{\mathcal{L}}}_D=\frac{1}{N_t}\sum \limits_{i=0}^{N_t-1}\parallel {\left.{\boldsymbol{f}}_{\boldsymbol{\theta}}\left(\boldsymbol{u}\left({\varOmega}_L,{t}_i\right)\right)\right|}^{\varOmega_L}-\boldsymbol{u}\left({\varOmega}_L,{t}_i\right){\parallel}_{\varOmega_L}^2,\\ {}{\mathrm{\mathcal{L}}}_P=\frac{1}{N_t}\sum \limits_{i=0}^{N_t-1}\parallel \mathcal{R}\left({\boldsymbol{f}}_{\boldsymbol{\theta}}\left(\boldsymbol{u}\left({\varOmega}_L,{t}_i\right)\right);\boldsymbol{p}\right){\parallel}_{\varOmega_H}^2,\end{array}} $$

where $ {\left.{\boldsymbol{f}}_{\boldsymbol{\theta}}\left(\cdot \right)\right|}^{\varOmega_L} $ denotes the corestriction of $ {\varOmega}_H $ on $ {\varOmega}_L $. In order to find an optimal set of parameters $ {\boldsymbol{\theta}}^{\ast } $, the loss is designed to regularize predictions that do not conform to the desired output. Given sparse information $ \boldsymbol{u}\left({\varOmega}_L,t\right) $, the data loss, $ {\mathcal{L}}_D $, is defined to minimize the distance between the known sparse data, $ \boldsymbol{u}\left({\varOmega}_L,t\right) $, and their corresponding predictions on the high-resolution grid, $ {\left.{\boldsymbol{f}}_{\boldsymbol{\theta}}\left(\boldsymbol{u}\left({\varOmega}_L,t\right)\right)\right|}^{\varOmega_L} $. Crucially, as a consequence of the proposed training objective, we do not need the high-resolution field as a labeled dataset, which is required by conventional super-resolution methods (Dong et al., Reference Dong, Loy, He and Tang2014; Freeman et al., Reference Freeman, Jones and Pasztor2002; Vincent et al., Reference Vincent, Larochelle, Lajoie, Bengio and Manzagol2010).
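A minimal sketch of the data loss in Eq. (11) is shown below. Under the assumption that the coarse-grid points coincide with every $ \kappa $-th fine-grid point, the corestriction onto $ {\varOmega}_L $ reduces to a strided slice of the high-resolution prediction.

```python
# A minimal sketch of the data loss in Eq. (11). The corestriction of the
# high-resolution prediction onto Omega_L is a strided slice, assuming the
# coarse points coincide with every kappa-th fine point.
def reconstruction_data_loss(f, u_low, kappa: int):
    pred_high = f(u_low)                            # (N_t, c, M, M)
    pred_on_low = pred_high[..., ::kappa, ::kappa]  # corestriction to Omega_L
    return ((pred_on_low - u_low) ** 2).mean()
```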

4.2. Time-windowing of the physics loss

Computing the residual of a partial differential equation is a temporal task, as shown in Eq. (1). We propose a simple time-windowing approach to allow the network to account for the temporal nature of the data. This windowing approach provides a means to compute the time derivative $ {\partial}_t\boldsymbol{u} $ required for the physics loss $ {\mathcal{L}}_P $. The network takes time-windowed samples as inputs, each sample consisting of $ \tau $ sequential time steps. Precisely, we first group all time steps into non-overlapping sets of $ \tau $ sequential time steps to form the set $ \mathcal{T}={\left\{\left[{t}_{i\tau},\dots, {t}_{\left(i+1\right)\tau -1}\right]\right\}}_{i=0}^{N_t/\tau -1} $. Elements of this set are treated as minibatches and are passed through the network to evaluate the output on which the physics loss is computed. The time derivative $ {\partial}_t\boldsymbol{u} $ is computed by applying a forward-Euler approximation across successive time steps. Figure 2 provides an overview of the time-windowing scheme and how elements of each batch are used to compute the temporal derivative.

Figure 2. Time-windowing scheme. Time-steps are first grouped into non-overlapping subsets of successive elements of length $ \tau $ . Each of these subsets can be taken for either training or validation. Subsets are treated as minibatches and passed through the network to evaluate their output. The temporal derivative is then approximated using a forward-Euler approximation across adjacent time steps.

Employing this time-windowing scheme, for the task of reconstructing from sparse information, the physics loss is

(12) $$ {\mathrm{\mathcal{L}}}_P=\frac{1}{\left(\tau -1\right)\mid \mathcal{T}\mid}\sum \limits_{t\in \mathcal{T}}\sum \limits_{i=0}^{\tau -1}\parallel {\partial}_t{\boldsymbol{f}}_{\boldsymbol{\theta}}\left(\boldsymbol{u}\left({\varOmega}_L,{t}_{i+1}\right)\right)-\mathcal{N}\left({\boldsymbol{f}}_{\boldsymbol{\theta}}\left(\boldsymbol{u}\left({\varOmega}_L,{t}_{i+1}\right)\right);\boldsymbol{p}\right){\parallel}_{\varOmega_H}^2, $$

whereas for the task of uncovering solutions from biased data, the physics loss is

(13) $$ {\mathcal{L}}_P=\frac{1}{\left(\tau -1\right)\mid \mathcal{T}\mid}\sum \limits_{t\in \mathcal{T}}\sum \limits_{i=0}^{\tau -1}\parallel {\partial}_t{\boldsymbol{\eta}}_{\boldsymbol{\theta}}\left(\boldsymbol{\zeta} \left(\varOmega, {t}_{i+1}\right)\right)-\mathcal{N}\left({\boldsymbol{\eta}}_{\boldsymbol{\theta}}\left(\boldsymbol{\zeta} \left(\varOmega, {t}_{i+1}\right)\right);\boldsymbol{p}\right){\parallel}_{\varOmega}^2, $$

where $ \mid \mathcal{T}\mid $ denotes the cardinality of the set, i.e., the number of minibatches. Predicting the field for $ \tau $ sequential time steps allows us to obtain the residual for the predictions in a temporally local sense, i.e., we compute the derivatives across discrete time windows rather than the entire simulation domain. The networks $ {\boldsymbol{f}}_{\boldsymbol{\theta}} $ and $ {\boldsymbol{\eta}}_{\boldsymbol{\theta}} $ are augmented to operate on these time windows, which vectorizes the operations over the time window. For a fixed quantity of training data, the choice of $ \tau $ introduces a trade-off between the number of input samples, $ {N}_t $, and the size of each time window, $ \tau $. A larger value of $ \tau $ corresponds to fewer time windows. Limiting the number of time windows used for training has an adverse effect on the ability of the model to generalize; the information content of contiguous time steps is lower than that of time steps sampled uniformly across the time domain. Although evaluating the residual across large time windows promotes numerical stability, it smooths the gradients computed in backpropagation and makes the model more difficult to train, especially in the case of chaotic systems, where the dynamics vary erratically. Upon conducting a hyperparameter study, we take a value of $ \tau =2 $, which is also the minimum window size for computing the temporal derivative of the physics loss by finite difference. We find that this is sufficient for training the network whilst simultaneously maximizing the number of independent samples used for training. To avoid duplication of the data in the training set, we ensure that all samples are at least $ \tau $ time steps apart so that independent time windows are guaranteed not to contain any overlaps. The remaining loss terms operate in the same manner, i.e., over the time window as well as the conventional batch dimension.
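A minimal sketch of the time-windowing scheme is shown below: the time series is grouped into non-overlapping windows of length $ \tau $, and the temporal derivative is approximated by a forward-Euler (first-order forward) difference between adjacent steps within each window.

```python
# A minimal sketch of the time-windowing scheme and the forward-Euler
# approximation of the temporal derivative within each window.
import torch

def make_windows(u: torch.Tensor, tau: int) -> torch.Tensor:
    # u: (N_t, c, n, n) -> (N_t // tau, tau, c, n, n); trailing steps dropped
    n_windows = u.shape[0] // tau
    return u[: n_windows * tau].reshape(n_windows, tau, *u.shape[1:])

def dudt_forward_euler(windows: torch.Tensor, dt: float) -> torch.Tensor:
    # (n_windows, tau, ...) -> (n_windows, tau - 1, ...): (u[t+1] - u[t]) / dt
    return (windows[:, 1:] - windows[:, :-1]) / dt
```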

5. Synthetic data generation

We utilize a differentiable pseudospectral spatial discretization to solve the partial differential equations of this paper (Kelshaw, Reference Kelshaw2022). The solution is computed on the spectral grid $ {\hat{\varOmega}}_k\in {\mathbb{Z}}^K $, where $ K $ is the number of wavenumbers. This spectral discretization enforces periodic boundary conditions on the domain boundary, $ \partial \varOmega $. A solution is produced by time integration of the dynamical system with the explicit forward-Euler scheme, in which we choose the time step $ \Delta t $ to satisfy the Courant–Friedrichs–Lewy (CFL) condition (Canuto et al., Reference Canuto, Hussaini, Quarteroni and Zang1988). The initial conditions in each case are generated using the equation

(14) $$ \boldsymbol{u}\left({\hat{\varOmega}}_k,0\right)=\frac{\iota\,{e}^{2\pi i\boldsymbol{\unicode{x025B}}}}{\sigma \sqrt{2\pi }}\,{e}^{-\frac{1}{2}{\left(\frac{\mid {\hat{\varOmega}}_k\mid }{\sigma}\right)}^2}\hskip1em \mathrm{with}\hskip1em {\boldsymbol{\unicode{x025B}}}_i\sim N\left(0,1\right), $$

where $ \iota $ denotes the magnitude; $ \sigma $ denotes the standard deviation; and $ {\boldsymbol{\unicode{x025B}}}_i $ is a sample from a unit normal distribution. Equation (14) produces a pseudo-random field scaled by the magnitude of the wavenumber, $ \mid {\hat{\varOmega}}_k\mid $, which is computed pointwise for each wavenumber in the spectral grid. This scaling ensures that the resultant field has spatial structures of varying lengthscale. We take $ \iota =10 $, $ \sigma =1.2 $ for all simulations, which yields an initial condition that is numerically stable.
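A minimal sketch (in NumPy) of the initial condition in Eq. (14) is shown below: a Gaussian-shaped spectrum with pseudo-random phases, here transformed back to the physical grid for illustration. The grid size and seed are assumptions; $ \iota $ and $ \sigma $ follow the values quoted above.

```python
# A minimal sketch of Eq. (14): a Gaussian spectrum with pseudo-random phases,
# scaled by the wavenumber magnitude, then mapped to the physical grid.
import numpy as np

def initial_condition(n: int = 64, iota: float = 10.0, sigma: float = 1.2,
                      seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    k = np.fft.fftfreq(n, d=1.0 / n)                 # integer wavenumbers
    k_mag = np.sqrt(k[:, None] ** 2 + k[None, :] ** 2)
    eps = rng.standard_normal((n, n))                # eps_i ~ N(0, 1)
    u_hat = (iota * np.exp(2j * np.pi * eps) / (sigma * np.sqrt(2 * np.pi))
             * np.exp(-0.5 * (k_mag / sigma) ** 2))
    return np.fft.ifft2(u_hat).real                  # physical-space field
```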

As a consequence of the Nyquist–Shannon sampling criterion (Canuto et al., Reference Canuto, Hussaini, Quarteroni and Zang1988), the resolution of the spectral grid $ {\hat{\varOmega}}_k $ places a lower bound on the spatial resolution. For a signal containing a maximal frequency $ {\omega}_{\mathrm{max}} $, the sampling frequency $ {\omega}_s $ must satisfy the condition $ {\omega}_{\mathrm{max}}<{\omega}_s/2 $; therefore, we ensure that the spectral resolution satisfies $ K\le N/2 $. Violation of this condition induces spectral aliasing, in which energy content from frequencies exceeding the Nyquist limit $ {\omega}_s/2 $ is fed back to the low frequencies, which amplifies the energy content unphysically (Canuto et al., Reference Canuto, Hussaini, Quarteroni and Zang1988). To prevent aliasing, in the task of uncovering solutions from biased data, we employ a spectral grid $ {\hat{\varOmega}}_k\in {\mathbb{Z}}^{32\times 32} $, which corresponds to the physical grid $ \varOmega \in {\mathbb{R}}^{64\times 64} $. For the task of reconstructing the solution from sparse data, we employ a high-resolution spectral grid $ {\hat{\varOmega}}_k\in {\mathbb{Z}}^{35\times 35} $, which corresponds to the physical grid $ \varOmega \in {\mathbb{R}}^{70\times 70} $. Approaching the Nyquist limit allows us to resolve the smallest structures possible without introducing spectral aliasing.

5.1. Physics loss in the Fourier domain

The pseudospectral discretization provides an efficient means to compute the differential operator $ \mathcal{N} $, which allows us to evaluate the physics loss $ {\mathcal{L}}_P $ in the Fourier domain. Computationally, we now evaluate the physics loss for the task of uncovering solutions from biased data, Eq. (13), as

(15) $$ {\displaystyle \begin{array}{l}{\mathrm{\mathcal{L}}}_P=\frac{1}{\left(\tau -1\right)\mid \mathcal{T}\mid}\sum \limits_{t\in \mathcal{T}}\sum \limits_{i=0}^{\tau -1}\parallel {\partial}_t{\hat{\boldsymbol{\eta}}}_{\theta}\left(\boldsymbol{\zeta} \left(\varOmega, {t}_{i+1}\right)\right)-\hat{\mathcal{N}}\left({\hat{\boldsymbol{\eta}}}_{\theta}\left(\boldsymbol{\zeta} \left(\varOmega, {t}_{i+1}\right)\right)\right){\parallel}_{{\hat{\varOmega}}_k}^2,\end{array}} $$

and for the task of reconstruction from sparse data, Eq. (12), as

(16) $$ {\displaystyle \begin{array}{l}{\mathrm{\mathcal{L}}}_P=\frac{1}{\left(\tau -1\right)\mid \mathcal{T}\mid}\sum \limits_{t\in \mathcal{T}}\sum \limits_{i=0}^{\tau -1}\parallel {\partial}_t{\hat{\boldsymbol{f}}}_{\boldsymbol{\theta}}\left(\boldsymbol{u}\left({\varOmega}_L,{t}_{i+1}\right)\right)-\hat{\mathcal{N}}\left({\hat{\boldsymbol{f}}}_{\boldsymbol{\theta}}\left(\boldsymbol{u}\left({\varOmega}_L,{t}_{i+1}\right)\right)\right){\parallel}_{{\hat{\varOmega}}_{\mathbf{k}}}^2,\end{array}} $$

where $ \hat{\cdot}=\mathcal{F}\left\{\cdot \right\} $ is the Fourier-transformed variable, and $ \hat{\mathcal{N}} $ denotes the Fourier-transformed differential operator. The pseudospectral discretization is fully differentiable, which allows us to numerically compute gradients with respect to the parameters of the network, $ \boldsymbol{\theta} $. Computing the loss $ {\mathcal{L}}_P $ in the Fourier domain provides two advantages: (i) periodic boundary conditions are naturally enforced, which embeds the prior knowledge in the loss calculations; and (ii) gradient calculations yield spectral accuracy, as shown in Eqs. (15, 16). In contrast, a conventional finite-difference approach requires a computational stencil, the spatial extent of which places an error bound on the gradient computation. This error bound is a function of the spatial resolution of the field.
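A minimal sketch (in PyTorch) of a Fourier-domain residual is shown below, using the linear convection-diffusion equation (19) as an illustration: spatial derivatives become multiplications by $ i\boldsymbol{k} $, so the gradients are spectrally accurate and the periodic boundary conditions are enforced by construction. The function names and the uniform $ 2\pi $-periodic grid are assumptions of this sketch.

```python
# A minimal sketch of a residual evaluated in the Fourier domain for the
# linear convection-diffusion equation (19): R_hat = du/dt_hat
# + c.grad(u)_hat - nu * laplacian(u)_hat, with derivatives as
# multiplications by ik.
import torch

def convection_diffusion_residual_hat(u, dudt, c: float = 1.0,
                                      nu: float = 1.0 / 500):
    # u, dudt: (..., n, n) real fields on a periodic grid
    n = u.shape[-1]
    k = torch.fft.fftfreq(n, d=1.0 / n)            # integer wavenumbers
    kx, ky = torch.meshgrid(k, k, indexing="ij")
    u_hat = torch.fft.fft2(u)
    conv = 1j * (kx + ky) * u_hat                  # c . grad(u) with c = (c, c)
    lap = -(kx ** 2 + ky ** 2) * u_hat             # Laplacian in Fourier space
    return torch.fft.fft2(dudt) + c * conv - nu * lap
```

The physics loss then follows as the mean squared magnitude of this residual over the spectral grid, cf. Eqs. (15, 16).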

6. Results on uncovering solutions from biased data

We first introduce the mathematical parameterization of the bias (i.e., the systematic error). Second, we discuss the numerical details and the performance metrics used for the analysis. Third, we showcase results for three partial differential equations of increasing complexity: the linear convection-diffusion equation, the nonlinear convection-diffusion equation (Burgers’ equation (Bateman, Reference Bateman1915; Burgers, Reference Burgers1948)), and the two-dimensional turbulent Navier–Stokes equations (the Kolmogorov flow (Fylladitakis, Reference Fylladitakis2018)). The Navier–Stokes equations, whose solutions are spatiotemporally chaotic, are also used for the task of reconstruction from sparse information in Section 7. Finally, we analyze the physical consistency of the physics-constrained convolutional neural network (PC-CNN) predictions for the two-dimensional turbulent flow.

6.1. Parameterization of the bias

Errors can broadly fall into two categories: (i) aleatoric errors, which are typically referred to as noise and are due to the stochastic behavior of the environment and sensors; and (ii) epistemic uncertainties, which are typically referred to as bias. We parameterize the bias with a non-convex, spatially varying function to model an epistemic error. We employ a modified Rastrigin parameterization (Rastrigin, Reference Rastrigin1974), which is commonly used in non-convex optimization benchmarks:

(17) $$ \boldsymbol{\phi} \left(\boldsymbol{x};\mathcal{M},{k}_{\phi },{u}_{\mathrm{max}}\right)=\frac{\mathcal{M}{u}_{\mathrm{max}}}{2{\pi}^2+40}\left(20+\sum \limits_{i=1}^2\left[{\left({x}_i-\pi \right)}^2-10\cos \left({k}_{\phi}\left({x}_i-\pi \right)\right)\right]\right), $$

where $ {u}_{\mathrm{max}} $ is the maximum absolute velocity in the flow; $ \mathcal{M} $ is the relative magnitude of the bias; $ {k}_{\phi } $ is the wavenumber of the bias; and $ {x}_1 $ and $ {x}_2 $ are the streamwise and transversal coordinates, respectively. Parameterizing the bias with $ \mathcal{M} $ and $ {k}_{\phi } $ allows us to evaluate the performance of the methodology for different degrees of non-convexity of the bias. Aleatoric noise removal is a well-established field and is not the focus of this task.
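A minimal sketch of the bias parameterization in Eq. (17) is shown below; the uniform grid over the periodic domain $ {\left[0,2\pi \right)}^2 $ is an assumption of this sketch.

```python
# A minimal sketch of the modified Rastrigin bias in Eq. (17) on a uniform
# n x n grid over [0, 2*pi)^2.
import numpy as np

def rastrigin_bias(n: int, m_rel: float, k_phi: int, u_max: float) -> np.ndarray:
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    x1, x2 = np.meshgrid(x, x, indexing="ij")
    s = sum((xi - np.pi) ** 2 - 10.0 * np.cos(k_phi * (xi - np.pi))
            for xi in (x1, x2))
    return m_rel * u_max / (2.0 * np.pi ** 2 + 40.0) * (20.0 + s)
```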

6.2. Numerical details, data, and performance metrics

We uncover the solutions of partial differential equations from biased data for three physical systems of increasing complexity. Each system is solved via the pseudospectral discretization described in Section 5, producing a solution by time integration with the forward-Euler method and a time step of $ \Delta t=5\times {10}^{-3} $, which is chosen to satisfy the Courant–Friedrichs–Lewy (CFL) condition and promote numerical stability. Model training is performed with 1024 training samples and 256 validation samples, which are pseudo-randomly selected from the time domain of the solution. Each sample contains $ \tau =2 $ sequential time steps. Following a hyperparameter study, the Adam optimizer is employed (Kingma and Ba, Reference Kingma, Ba, Bengio and LeCun2015) with a learning rate of $ 3\times {10}^{-4} $. The weighting factors for the loss, as described in Eq. (7), are taken as $ {\lambda}_P={\lambda}_C={10}^3 $, which are empirically determined (through a hyperparameter search) to provide stable training. Details on the network architecture can be found in Appendix A.1. Each model is trained for a total of $ {10}^4 $ epochs, which is chosen to provide sufficient convergence. The accuracy of the prediction is quantified by the relative error on the validation dataset

(18) $$ e=\sqrt{\frac{\sum_{t\in \mathcal{T}}\parallel \boldsymbol{u}\left(\varOmega, t\right)-{\boldsymbol{\eta}}_{\boldsymbol{\theta}}\left(\boldsymbol{\zeta} \left(\varOmega, t\right)\right){\parallel}_{\varOmega}^2}{\sum_{t\in \mathcal{T}}\parallel \boldsymbol{u}\left(\varOmega, t\right){\parallel}_{\varOmega}^2}}. $$

This metric takes the magnitude of the solution into account, which allows the results from different systems to be compared. All experiments in this paper are run on a single Quadro RTX 8000 GPU. Numerical tests are run with different random initializations (seeds) to assess the robustness of the network.
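A minimal sketch of the relative error in Eq. (18), aggregated over the validation snapshots, is shown below.

```python
# A minimal sketch of the relative l2-error metric in Eq. (18).
import torch

def relative_error(pred: torch.Tensor, true: torch.Tensor) -> float:
    # pred, true: (N_t, c, n, n) stacked validation snapshots
    return (torch.sum((true - pred) ** 2) / torch.sum(true ** 2)).sqrt().item()
```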

6.3. The linear convection-diffusion equation

The linear convection-diffusion equation is used to describe transport phenomena (Majda and Kramer, Reference Majda and Kramer1999):

(19) $$ {\partial}_t\boldsymbol{u}+\boldsymbol{c}\cdot \nabla \boldsymbol{u}=\nu \Delta \boldsymbol{u}, $$

where $ \nabla $ is the nabla operator; $ \Delta $ is the Laplacian operator; $ \boldsymbol{c}\equiv \left(c,c\right) $, where $ c $ is the convective coefficient; and $ \nu $ is the diffusion coefficient. The flow energy is subject to rapid decay because the solution quickly converges towards the fixed-point solution $ \boldsymbol{u}\left(\varOmega, t\right)=0 $, notably in the case where $ \nu /c $ is large. At the fixed-point solution, we observe $ \boldsymbol{\zeta} \left(\varOmega, t\right)=\boldsymbol{\phi} \left(\varOmega \right) $, which is a trivial case for the identification and removal of the bias. In order to avoid rapid convergence to the fixed-point solution, we take $ c=1.0 $, $ \nu =1/500 $ as the coefficients, producing a convection-dominated solution. A snapshot of the results for $ {k}_{\phi }=3,\mathcal{M}=0.5 $ is shown in Figure 3(a). There is a marked difference between the biased data $ \boldsymbol{\zeta} \left(\varOmega, t\right) $ in panel (i) and the true state $ \boldsymbol{u}\left(\varOmega, t\right) $ in panel (ii), most notably in the magnitude of the field. Network predictions $ {\boldsymbol{\eta}}_{\boldsymbol{\theta}}\left(\boldsymbol{\zeta} \left(\varOmega, t\right)\right) $ in panel (iv) uncover the solution to the partial differential equation with a relative error of $ e=6.612\times {10}^{-2} $ on the validation set.

Figure 3. Uncovering solutions from biased data. (a) Linear convection-diffusion case with $ {k}_{\phi }=3 $ and $ \mathcal{M}=0.5 $. (b) Nonlinear convection-diffusion case with $ {k}_{\phi }=5 $ and $ \mathcal{M}=0.5 $. (c) Two-dimensional turbulent flow case with $ {k}_{\phi }=7 $ and $ \mathcal{M}=0.5 $. Panel (i) shows the biased data, $ \boldsymbol{\zeta} $; (ii) shows the true state, $ \boldsymbol{u} $, which we wish to uncover from the biased data; (iii) shows the bias (i.e., systematic error), $ \boldsymbol{\phi} $; (iv) shows the network predictions, $ {\boldsymbol{\eta}}_{\boldsymbol{\theta}} $; and (v) shows the predicted bias, $ \boldsymbol{\zeta} -{\boldsymbol{\eta}}_{\boldsymbol{\theta}} $.

6.4. Nonlinear convection-diffusion equation

The nonlinear convection-diffusion equation, also known as Burgers’ equation (Burgers, Reference Burgers1948), is

(20) $$ {\partial}_t\boldsymbol{u}+\left(\boldsymbol{u}\cdot \nabla \right)\boldsymbol{u}=\nu \Delta \boldsymbol{u}, $$

where the nonlinearity lies in the convective term, $ \left(\boldsymbol{u}\cdot \nabla \right)\boldsymbol{u} $. The nonlinear convective term provides a further challenge: below a certain threshold, the velocity interactions lead to further energy decay. The kinematic viscosity is set to $ \nu =1/500 $, which produces a convection-dominated solution by time integration. In contrast to the linear convection-diffusion system (Section 6.3), the relationship between the dynamics of the biased state and the true state is more intricate. The introduction of nonlinearities in the differential operator breaks the linear relationship between the bias and the data, i.e., $ \mathcal{R}\left(\boldsymbol{\zeta} \left(\boldsymbol{x},t\right);\boldsymbol{p}\right)\ne \mathcal{R}\left(\boldsymbol{\phi} \left(\boldsymbol{x}\right);\boldsymbol{p}\right) $, as discussed in Section 2.1. Consequently, this increases the complexity of the physics loss term $ {\mathcal{L}}_P $.

Figure 3(b) shows a snapshot of results for $ {k}_{\phi }=5,\mathcal{M}=0.5 $ . The true state $ \boldsymbol{u}\left(\varOmega, t\right) $ of the partial differential equation contains high-frequency spatial structures, shown in panel (ii), which are less prominent in the biased data $ \boldsymbol{\zeta} \left(\varOmega, t\right) $ , shown in panel (i). Despite the introduction of a nonlinear differential operator, we demonstrate that the network retains the ability to recover the true state, as shown in panels (ii, iv), with a relative error on the validation set of $ e=6.791\times {10}^{-2} $ .

6.5. Two-dimensional turbulent flow

Finally, we consider a spatiotemporally chaotic flow governed by the incompressible Navier–Stokes equations on a two-dimensional domain with periodic boundary conditions, which is also known as the Kolmogorov flow (Fylladitakis, Reference Fylladitakis2018). The partial differential equations are expressions of mass and momentum conservation, respectively

(21) $$ {\displaystyle \begin{array}{c}\nabla \cdot \boldsymbol{u}=0,\\ {}{\partial}_t\boldsymbol{u}+\left(\boldsymbol{u}\cdot \nabla \right)\boldsymbol{u}=-\nabla p+\nu \Delta \boldsymbol{u}+\boldsymbol{g},\end{array}} $$

where $ p $ is the pressure field, and $ \boldsymbol{g} $ is a body forcing, which sustains the dynamics by ensuring that the flow energy does not dissipate. The flow density is constant and assumed to be unity, i.e., the flow is incompressible. The use of a pseudospectral discretization allows us to eliminate the pressure term and handle the continuity constraint implicitly, as shown in the spectral representation of the Navier–Stokes equations in Appendix B. The spatiotemporal dynamics are chaotic at the prescribed viscosity $ \nu =1/42 $. The periodic forcing is $ \boldsymbol{g}\left(\boldsymbol{x}\right)=\left(\sin \left(4{x}_2\right),0\right) $. In order to remove the transient and focus on a statistically stationary regime, the time series up to the transient time $ {T}_t=180.0 $ is discarded.

A snapshot for the two-dimensional turbulent flow case is shown in Figure 3(c) for $ {k}_{\phi }=7,\mathcal{M}=0.5 $. The biased data $ \boldsymbol{\zeta} \left(\varOmega, t\right) $, shown in panel (i), contain prominent high-frequency spatial structures that are not present in the true state $ \boldsymbol{u}\left(\varOmega, t\right) $, shown in panel (ii). The biased field bears little resemblance to the true solution. In spite of the chaotic dynamics of the system, we demonstrate that the network predictions $ {\boldsymbol{\eta}}_{\boldsymbol{\theta}}\left(\boldsymbol{\zeta} \left(\varOmega, t\right)\right) $, shown in panel (iv), successfully uncover the solution with a relative error on the validation set of $ e=2.044\times {10}^{-2} $.

6.6. Robustness

We analyze the robustness of the methodology by varying the wavenumber $ {k}_{\phi } $ and the magnitude $ \mathcal{M} $ of the bias parameterization, Eq. (17). Varying these parameters allows us to assess the degree to which the true state of a partial differential equation can be recovered when subjected to spatially varying biases with different degrees of non-convexity. To this end, we perform two parametric studies for each partial differential equation: (i) $ \mathcal{M}=0.5 $, $ {k}_{\phi}\in \left\{\mathrm{1,3,5,7,9}\right\} $; and (ii) $ {k}_{\phi }=3 $, $ \mathcal{M}\in \left\{\mathrm{0.01,0.1,0.25,0.5,1.0}\right\} $. For each case, we compute the relative error, $ e $, between the predicted solution $ {\boldsymbol{\eta}}_{\boldsymbol{\theta}}\left(\boldsymbol{\zeta} \left(\varOmega, t\right)\right) $ and the true solution $ \boldsymbol{u}\left(\varOmega, t\right) $, as defined in Eq. (18). We show the mean relative $ {\mathrm{\ell}}^2 $-error for an ensemble of five computations for each test to ensure that the results are representative of the true performance. Empirically, we find that the performance is robust to the pseudo-random initialization of the network parameters, with a standard deviation of $ \mathcal{O}\left({10}^{-4}\right) $ (result not shown). We employ the same network setups described in Section 6.2 with the same parameters for training. Assessing the results with the same parameters for the three partial differential equations allows us to draw conclusions on the robustness of the methodology.

First, in the linear convection-diffusion problem, as shown in Figure 4(a), we demonstrate that the relative error is largely independent of the form of the parameterized bias. The relative magnitude $ \mathcal{M} $ and the Rastrigin wavenumber $ {k}_{\phi } $ have a small impact on the ability of the model to uncover the solution to the partial differential equation, with the performance being consistent across all cases. The model performs best for the case $ {k}_{\phi }=7,\mathcal{M}=0.5 $, achieving a relative error of $ e=2.568\times {10}^{-1} $. Second, the results for the nonlinear convection-diffusion case show a marked improvement in comparison with the linear case, with the errors for the numerical study shown in Figure 4(b). Despite the introduction of a nonlinear operator, the relative error is consistently lower regardless of the parameterization of the bias. The network remains equally capable of uncovering the true state across a wide range of modalities and magnitudes. Although the physics loss from the network predictions $ {\boldsymbol{\eta}}_{\boldsymbol{\theta}}\left(\boldsymbol{\zeta} \left(\varOmega, t\right)\right) $ no longer scales in a linear fashion, this is beneficial for the training dynamics. This is because the second-order nonlinearity promotes convexity in the loss surface, which is then exploited by our gradient-based optimization approach. Third, the PC-CNN is able to uncover the true state of the turbulent flow. The results shown in Figure 4(c) demonstrate an improvement in performance over the nonlinear convection-diffusion case. For both a fixed Rastrigin wavenumber and a fixed magnitude, increasing the varied parameter tends to decrease the relative error. The relative error in Figure 4(c) decreases as the non-convexity of the bias increases. This is because it becomes increasingly simple to distinguish the bias from the underlying flow field when there is a larger discrepancy between them. We also emphasize that these tests were run with a fixed computational budget to ensure a fair comparison; with additional training, we observe a further decrease in the relative error across all experiments. (We also ran tests with biases parameterized with multiple wavenumbers, an example of which is shown in Appendix C.)

Figure 4. Uncovering solutions from biased data: robustness analysis through the relative error, $ e $. (a) Linear convection-diffusion case. (b) Nonlinear convection-diffusion case. (c) Two-dimensional turbulent flow case. Orange bars denote results for case (i): fixing the magnitude and varying the Rastrigin wavenumber. Blue bars denote results for case (ii): fixing the Rastrigin wavenumber and varying the magnitude.

6.7. Physical consistency of the uncovered Navier–Stokes solutions

Results in Section 6.6 show that, for a generic training setup, we are able to achieve a small relative error for a variety of degrees of non-convexity. Because the two-dimensional turbulent flow is chaotic, infinitesimal perturbations of order $ \mathcal{O}\left(\unicode{x025B} \right) $ grow exponentially in time; therefore, the residual $ \mathcal{R}\left(\boldsymbol{u}\left(\varOmega, t\right)+\unicode{x025B} \left(\varOmega, t\right);\boldsymbol{p}\right)\gg \mathcal{R}\left(\boldsymbol{u}\left(\varOmega, t\right);\boldsymbol{p}\right) $, where $ \unicode{x025B} $ is a perturbation. In this section, we analyze the physical properties of the solutions of the Navier–Stokes equations (Eq. (21)) uncovered by the PC-CNN, $ {\boldsymbol{\eta}}_{\boldsymbol{\theta}} $. First, we show snapshots of the time evolution of the two-dimensional turbulent flow in Figure 5. These confirm that the model is learning a physical solution in time, as shown by the error on a log scale in panel (iv). The parameterization of the bias is fixed in this case with $ {k}_{\phi }=7,\mathcal{M}=0.5 $.

Figure 5. Uncovering solutions from biased data. Temporal evolution of the two-dimensional turbulent flow with $ {k}_{\phi }=7,\mathcal{M}=0.5 $. $ {T}_t $ denotes the length of the transient. (i) Biased data, $ \boldsymbol{\zeta} $; (ii) true state, $ \boldsymbol{u} $; (iii) predicted solution, $ {\boldsymbol{\eta}}_{\boldsymbol{\theta}} $; and (iv) squared error, $ {\left\Vert {\boldsymbol{\eta}}_{\boldsymbol{\theta}}-\boldsymbol{u}\right\Vert}^2 $.

Second, we analyze the statistics of the solution. The mean kinetic energy of the flow at each timestep is directly affected by the introduction of the bias. In the case of our strictly positive bias $ \boldsymbol{\phi} $ (Eq. (2)), the mean kinetic energy is increased at every point. Results in Figure 6(a) show the kinetic energy time series across the time domain for: the true state $ \boldsymbol{u} $ ; the biased data $ \boldsymbol{\zeta} $ ; and the network’s predictions $ {\boldsymbol{\eta}}_{\boldsymbol{\theta}} $ . The chaotic nature of the solution can be observed from the erratic (but not random) evolution of the energy. We observe that the network predictions accurately reconstruct the kinetic energy trace across the simulation domain.

Figure 6. Uncovering solutions from biased data: analysis of the solutions of the turbulent flow with $ {k}_{\phi }=7,\mathcal{M}=0.5 $. (a) Kinetic energy of the two-dimensional turbulent flow. (b) Energy spectrum of the two-dimensional turbulent flow. $ {T}_t $ denotes the length of the transient.

Third, we analyze the energy spectrum, which is characteristic of turbulent flows, in which the energy content decreases with the wavenumber. Introducing the bias at a particular wavenumber artificially alters the energy spectrum due to the increased energy content. In Figure 6(b), we show the energy spectrum for the two-dimensional turbulent flow, where the unphysical increase in energy content is visible for $ \mid \boldsymbol{k}\mid \ge 7 $. Model predictions $ {\boldsymbol{\eta}}_{\boldsymbol{\theta}}\left(\boldsymbol{\zeta} \left(\varOmega, t\right)\right) $ correct the energy content for $ \mid \boldsymbol{k}\mid <21 $, and successfully characterize and reproduce the scales of turbulence. Although the energy content decreases exponentially with the spatial frequency, the true solution is correctly uncovered up to the large wavenumbers at which aliasing occurs ($ \mid \boldsymbol{k}\mid \gtrsim 29 $).

7. Results on reconstruction from sparse information

First, we discuss the generation of the sparse data. Next, we show the ability of the PC-CNN to infer the high-resolution solution of the partial differential equation at points that are not present in the training set. A high-resolution solution of the partial differential equation is generated on the grid $ {\varOmega}_H $ prior to extracting the low-resolution grid $ {\varOmega}_L $ with the scale factor $ \kappa =M/N $ (Section 2.2). Both the solver and the physics loss are discretized with $ K=N/2 $ wavenumbers in the Fourier domain, which complies with the Nyquist–Shannon sampling criterion. The downsampling by $ \kappa $ is performed by extracting the value of the solution at the spatial locations $ {\varOmega}_L\cap {\varOmega}_H $, that is, a low-resolution representation of the high-resolution solution

(22) $$ \boldsymbol{u}\left({\varOmega}_L,t\right)\triangleq {\left.\boldsymbol{u}\left({\varOmega}_H,t\right)\right|}^{\varOmega_L}, $$

where $ {\left.\cdot \right|}^{\varOmega_L} $ is the corestriction, which extracts the points in $ {\varOmega}_L $. While the data are sparse in space, a minimal temporal resolution is required for the computation of the time derivatives.
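A minimal sketch of the downsampling in Eq. (22) is shown below, under the assumption that the coarse-grid points coincide with every $ \kappa $-th fine-grid point; for the case considered here, slicing a $ 70\times 70 $ field with $ \kappa =7 $ yields the $ 10\times 10 $ sparse field.

```python
# A minimal sketch of Eq. (22): the low-resolution field is the
# high-resolution solution evaluated at the coarse-grid locations.
def downsample(u_high, kappa: int):
    # u_high: (N_t, c, M, M) -> (N_t, c, M // kappa, M // kappa)
    return u_high[..., ::kappa, ::kappa]
```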

7.1. Comparison with standard upsampling

We show the results for a scale factor $ \kappa =7 $, with results for further scale factors shown in Appendix D. The results are compared with interpolation-based upsampling methods, i.e., bi-linear and bi-cubic interpolation, to demonstrate the ability of the method. We quantify the accuracy by computing the relative error between the true solution, $ \boldsymbol{u}\left({\varOmega}_H,t\right) $, and the corresponding network realization, $ {\boldsymbol{f}}_{\boldsymbol{\theta}}\left(\boldsymbol{u}\left({\varOmega}_L,t\right)\right) $

(23) $$ e=\sqrt{\frac{\sum_{t\in \mathcal{T}}\parallel \boldsymbol{u}\left({\varOmega}_H,t\right)-{\boldsymbol{f}}_{\boldsymbol{\theta}}\left(\boldsymbol{u}\left({\varOmega}_L,t\right)\right){\parallel}_{\varOmega_H}^2}{\sum_{t\in \mathcal{T}}\parallel \boldsymbol{u}\left({\varOmega}_H,t\right){\parallel}_{\varOmega_H}^2}}. $$

Upon discarding the transient, a solution $ \boldsymbol{u}\left({\varOmega}_H,t\right)\in {\mathbb{R}}^{70\times 70} $ is generated by time integration over $ 12\times {10}^3 $ time steps, with $ \Delta t=1\times {10}^{-3} $. We extract the low-resolution solution $ \boldsymbol{u}\left({\varOmega}_L,t\right)\in {\mathbb{R}}^{10\times 10} $. We extract $ 2048 $ samples at random from the time domain of the solution, each sample consisting of $ \tau =2 $ consecutive time steps. The Adam optimizer (Kingma and Ba, Reference Kingma, Ba, Bengio and LeCun2015) is employed for training with a learning rate of $ 3\times {10}^{-4} $. We take $ {\lambda}_P={10}^3 $ as the regularization factor for the loss, as shown in Eq. (7), and train for a total of $ {10}^3 $ epochs, which is empirically determined to provide sufficient convergence, as per the results of a hyperparameter study (result not shown).

Figure 7 shows a snapshot of results for the streamwise component of the velocity field, comparing network realizations $ {\boldsymbol{f}}_{\boldsymbol{\theta}}\left(\boldsymbol{u}\left({\varOmega}_L,t\right)\right) $ with the interpolated alternatives. Bi-linear and bi-cubic interpolation are denoted by $ BL\left(\boldsymbol{u}\left({\varOmega}_L,t\right)\right) $ and $ BC\left(\boldsymbol{u}\left({\varOmega}_L,t\right)\right) $ , respectively. We observe that network realizations of the high-resolution solution are qualitatively more accurate than the interpolated fields. Artifacts indicative of the interpolation schemes are present in both of the interpolated fields, whereas the network realization correctly captures the structures present in the high-resolution field. Across the entire solution domain, the model $ {\boldsymbol{f}}_{\boldsymbol{\theta}} $ achieves a relative error of $ e=6.972\times {10}^{-2} $ , compared with $ e=2.091\times {10}^{-1} $ for bi-linear interpolation and $ e=1.717\times {10}^{-1} $ for bi-cubic interpolation.

Figure 7. Reconstruction from sparse information: physics-constrained convolutional neural network (PC-CNN) compared with traditional interpolation methods to reconstruct a solution from a sparse grid $ {\varOmega}_L\in {\mathrm{\mathbb{R}}}^{10\times 10} $ (100 points) to a high-resolution grid $ {\varOmega}_H\in {\mathrm{\mathbb{R}}}^{70\times 70} $ (4900 points). Panel (i) shows the low-resolution input, $ \boldsymbol{u}\left({\varOmega}_L,t\right) $ ; (ii) bi-linear interpolation, $ BL\left(\boldsymbol{u}\left({\varOmega}_L,t\right)\right) $ ; (iii) bi-cubic interpolation, $ BC\left(\boldsymbol{u}\left({\varOmega}_L,t\right)\right) $ ; (iv) true high-resolution field, $ \boldsymbol{u}\left({\varOmega}_H,t\right) $ ; (v) model prediction of the high-resolution field, $ {\boldsymbol{f}}_{\boldsymbol{\theta}}\left(\boldsymbol{u}\left({\varOmega}_L,t\right)\right) $ ; and (vi) energy spectra for each of the predictions. Vertical lines $ {f}^n $ denote the Nyquist frequencies for the spectral grid, $ {f}_{{\hat{\varOmega}}_{\boldsymbol{k}}}^n $ , and for the low-resolution grid, $ {f}_{\varOmega_{\boldsymbol{L}}}^n $ .

Although the relative error provides a notion of predictive accuracy, it is crucial to assess the physical characteristics of the reconstructed field (Pope, 2000). The energy spectrum, which is characteristic of turbulent flows, represents a multi-scale phenomenon in which the energy content decreases with the wavenumber. From the energy spectrum of the network’s prediction, $ {\boldsymbol{f}}_{\boldsymbol{\theta}}\left(\boldsymbol{u}\left({\varOmega}_L,t\right)\right) $ , we gain physical insight into the multi-scale nature of the solution. Results in Figure 7 show that the energy content of the low-resolution field diverges from that of the high-resolution field, which is a consequence of spectral aliasing. Network realizations $ {\boldsymbol{f}}_{\boldsymbol{\theta}}\left(\boldsymbol{u}\left({\varOmega}_L,t\right)\right) $ capture finer scales of turbulence than both interpolation approaches, before diverging from the true spectrum at $ \mid \boldsymbol{k}\mid \approx 18 $ . These results show the efficacy of the method, which recovers in excess of $ 99\% $ of the system’s total energy. The physics loss, $ {\mathrm{\mathcal{L}}}_P $ , enables the network to act beyond simple interpolation: the network is capable of de-aliasing, thereby inferring unresolved physics. Parametric studies show similar results across a range of scale factors $ \kappa $ (Appendix D).
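The isotropic energy spectrum used in this comparison can be estimated by binning the Fourier energy over integer wavenumber shells. A minimal sketch for a single velocity snapshot, assuming a square periodic domain (array names are illustrative):

```python
import numpy as np

def energy_spectrum(u):
    """Shell-averaged kinetic-energy spectrum E(|k|) of a 2D velocity
    field u with shape (2, N, N) on a periodic square domain."""
    n = u.shape[-1]
    u_hat = np.fft.fft2(u, axes=(-2, -1)) / n**2          # normalized FFT
    energy = 0.5 * np.sum(np.abs(u_hat) ** 2, axis=0)     # sum over components

    # Integer wavenumbers on the FFT grid, and the shell index |k|.
    k = np.fft.fftfreq(n, d=1.0 / n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k_mag = np.rint(np.sqrt(kx**2 + ky**2)).astype(int)

    # Accumulate the energy in each shell: E(|k|), |k| = 0, 1, 2, ...
    return np.bincount(k_mag.ravel(), weights=energy.ravel())
```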

8. Conclusions

We propose a physics-constrained convolutional neural network (PC-CNN) to solve inverse problems that are relevant to spatiotemporal partial differential equations (PDEs). First, we introduce the PC-CNN, which provides the means to compute the physical residual, i.e., the areas of the field in which the physical laws are violated by the prediction of the convolutional neural network (CNN). We formulate an optimization problem by leveraging prior knowledge of the underlying physical system to regularise the predictions of the PC-CNN. This is augmented by a simple time-windowing approach for the computation of temporal derivatives. Second, we apply the PC-CNN to solve an inverse problem in which we are given data that is contaminated by a spatially varying systematic error (i.e., bias, or epistemic uncertainty). The task is to uncover from the biased data the true state, which is the solution of the PDE. We numerically test the method by removing biases from data generated by three partial differential equations of increasing complexity (linear and nonlinear convection-diffusion, and Navier–Stokes). The PC-CNN successfully infers the true state, and the bias as a by-product, for a large range of magnitudes and degrees of non-convexity, which demonstrates that the method is accurate and robust. Third, we apply the PC-CNN to solve an inverse problem in which we are given sparse information on the solution of a PDE. The task is to reconstruct the solution in space with high resolution without using the full high-resolution data in the training. We reconstruct the spatiotemporal chaotic solution of the Navier–Stokes equations on a high-resolution grid from a sparse grid containing only 1% of the information. We demonstrate that the proposed PC-CNN provides accurate physical results, both qualitatively and quantitatively, as compared to traditional interpolation methods, such as bi-cubic interpolation. For both inverse tasks, we investigate the physical consistency of the inferred solutions for the two-dimensional turbulent flow. We find that the PC-CNN predictions reproduce the correct physical properties of the underlying partial differential equation (Navier–Stokes), such as the energy spectrum. This work opens opportunities for solving inverse problems with nonlinear partial differential equations. Future work will focus on experimental data.

Acknowledgements

The authors are grateful to G. Rigas for insightful discussions on turbulence.

Data availability statement

Codes and data are accessible via GitHub: https://github.com/MagriLab/pisr; https://github.com/MagriLab/picr; and https://github.com/MagriLab/kolsol.

Author contribution

Conceptualization: D.K. & L.M.; Funding acquisition: L.M.; Investigation: D.K. & L.M.; Methodology: D.K. & L.M.; Project administration: L.M.; Software: D.K.; Supervision: L.M.; Validation: D.K. & L.M.; Visualization: D.K.; Writing-original draft: D.K. & L.M.; Writing-review and editing: D.K. & L.M.

Funding statement

D. Kelshaw and L. Magri acknowledge support from the UK Engineering & Physical Sciences Research Council and ERC Starting Grant PhyCo 949388. L.M. acknowledges support from the grant EU-PNRR YoungResearcher TWIN ERC-PI_0000005.

Competing interest

None.

Ethical standard

The research meets all ethical guidelines, including adherence to the legal requirements of the study country.

Appendix

A. Architectural details

Here, we provide details for the network architectures used to parameterize the mappings employed for each of the inverse problems.

A.1. Uncovering solutions from biased data

For the bias removal task, we use a UNet architecture to extract relevant features from the biased field. The architecture, as well as the general training loop, is shown in Figure 8. This network configuration was determined empirically. The input and output layers each feature two channels, corresponding to the two velocity components of the flow being studied. The network does not overfit, partly owing to the physics-based regularisation term in the loss function.

Figure 8. Uncovering solutions from biased data. The model $ {\boldsymbol{\eta}}_{\boldsymbol{\theta}} $ is responsible for mapping the biased state $ \boldsymbol{\zeta} \left(\varOmega, t\right) $ to the true solution $ \boldsymbol{u}\left(\varOmega, t\right) $ . Convolutional layers are parameterized as torch.Conv2D(c_{in}, c_{out}, k), where c_{in}, c_{out} denote the number of input and output channels, respectively; and k is the spatial extent of the filter. The term $ h $ represents the activation, tanh in this case. The terms $ {\unicode{x1D543}}_D,{\unicode{x1D543}}_P,{\unicode{x1D543}}_C $ denote the data loss, physics loss, and constraint loss respectively. The combination of these losses forms the objective loss $ {\mathcal{L}}_{\theta } $ , which is used to update the network’s parameters. The term $ \tau $ denotes the number of contiguous time-steps passed to the network, required for computing temporal derivatives. We provide an in-depth explanation of these losses in Section 4.1.
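For illustration, a compact encoder-decoder of the type sketched in Figure 8 can be written as follows; the layer widths, depth, and pooling choices here are assumptions for a minimal sketch, not the exact configuration used:

```python
import torch
import torch.nn as nn

class MiniUNet(nn.Module):
    """Illustrative UNet-style network: one downsampling stage, one
    upsampling stage, and a skip connection. Two channels in and out,
    matching the two velocity components."""

    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.Tanh())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.Tanh())
        self.up = nn.Upsample(scale_factor=2, mode="bilinear")
        self.dec = nn.Sequential(
            nn.Conv2d(32 + 16, 16, 3, padding=1), nn.Tanh(),
            nn.Conv2d(16, 2, 3, padding=1),
        )

    def forward(self, x):
        e = self.enc(x)                       # encoder features (skip path)
        m = self.up(self.mid(self.down(e)))   # bottleneck, then upsample
        return self.dec(torch.cat([m, e], dim=1))
```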

A.2. Reconstruction from sparse information

For the reconstruction task, a three-layer convolutional neural network is used for the model. Figure 9 provides an overview of the architecture, as well as the general training loop. The baseline network configuration is similar to that of Dong et al. (2014), with the additional constraints imposed through the physics-based loss function. The input and output layers each feature two channels, corresponding to the two velocity components of the flow being studied.

Figure 9. Reconstruction from sparse information. The model $ {\boldsymbol{f}}_{\boldsymbol{\theta}} $ is responsible for mapping the low-resolution field $ \boldsymbol{u}\left({\varOmega}_L,t\right) $ to the high-resolution field $ \boldsymbol{u}\left({\varOmega}_H,t\right) $ . The upsampling layer (blue) performs bi-cubic upsampling to obtain the correct spatial dimensions. Convolutional layers are parameterized as torch.Conv2D(c_{in}, c_{out}, k), where c_{in}, c_{out} denote the number of input and output channels, respectively; and k is the spatial extent of the filter. The term $ h $ represents the activation, tanh in this case. The terms $ {\unicode{x1D543}}_D,{\unicode{x1D543}}_P $ denote the data loss and physics loss, respectively, the combination of which forms the objective loss $ {\mathcal{L}}_{\theta } $ , which is used to update the network’s parameters, $ \boldsymbol{\theta} $ . The term $ \tau $ denotes the number of contiguous time-steps passed to the network, required for computing temporal derivatives. We provide an in-depth explanation of these losses in Section 4.1.
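Under the same conventions, the reconstruction network of Figure 9 can be sketched as follows; the filter sizes and channel widths follow the SRCNN-style baseline of Dong et al. (2014) and are assumptions, not the exact configuration used:

```python
import torch.nn as nn

class SuperResCNN(nn.Module):
    """Illustrative three-layer CNN for sparse-to-dense reconstruction:
    bi-cubic upsampling to the target resolution, followed by three
    convolutional layers with tanh activations."""

    def __init__(self, kappa: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=kappa, mode="bicubic"),  # Omega_L -> Omega_H
            nn.Conv2d(2, 64, kernel_size=9, padding=4), nn.Tanh(),
            nn.Conv2d(64, 32, kernel_size=5, padding=2), nn.Tanh(),
            nn.Conv2d(32, 2, kernel_size=5, padding=2),
        )

    def forward(self, x):
        return self.net(x)
```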

B. Pseudo-spectral discretization of Navier–Stokes residual

Operating in the Fourier domain eliminates the divergence-free constraint (Canuto et al., 1988). The equations for the spectral representation of the Kolmogorov flow are

(B.1) $$ \mathcal{R}\left({\hat{\boldsymbol{u}}}_k;\boldsymbol{p}\right)=\left(\frac{d}{dt}+\nu {\left|\boldsymbol{k}\right|}^2\right){\hat{\boldsymbol{u}}}_k-{\hat{\boldsymbol{f}}}_k+\boldsymbol{k}\frac{\boldsymbol{k}\cdot {\hat{\boldsymbol{f}}}_k}{{\left|\boldsymbol{k}\right|}^2}-{\hat{\boldsymbol{g}}}_k, $$

with $ {\hat{\boldsymbol{f}}}_k=-{\left(\hat{\boldsymbol{u}\cdot \nabla \boldsymbol{u}}\right)}_k $ , where the nonlinear terms are handled pseudospectrally, employing the 2/3 de-aliasing rule to avoid the unphysical accumulation of energy at high frequencies (Canuto et al., 1988). A solution is produced by time integration of the dynamical system with the explicit forward-Euler scheme, choosing a time step $ \Delta t $ that satisfies the Courant–Friedrichs–Lewy (CFL) condition. Initial conditions are generated by producing a random field scaled by the wavenumber, which retains spatial structures of varying lengthscale in the physical domain (Ruan and McLaughlin, 1998).
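A minimal sketch of the 2/3 de-aliasing mask and the forward-Euler update in Fourier space follows; the right-hand-side function is a placeholder for the full right-hand side of Eq. (B.1):

```python
import numpy as np

n = 64                                     # collocation points per direction
k = np.fft.fftfreq(n, d=1.0 / n)           # integer wavenumbers
kx, ky = np.meshgrid(k, k, indexing="ij")

# 2/3 rule: zero all modes with |k_i| > n/3, so that the nonlinear term
# evaluated pseudospectrally does not alias energy into retained modes.
dealias = (np.abs(kx) <= n / 3) & (np.abs(ky) <= n / 3)

def euler_step(u_hat, rhs, dt):
    """One explicit forward-Euler step in Fourier space; dt must satisfy
    the CFL condition. `rhs` is a placeholder evaluating Eq. (B.1), with
    the de-aliasing mask applied to the update for brevity."""
    return u_hat + dt * dealias * rhs(u_hat)
```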

C. Multi-wavenumber bias removal

Further to the parameterization outlined in Section 6.1, we consider stationary biases that contain multiple wavenumbers. We define the composite bias as the mean of two bias fields, with $ {k}_{\phi}\in \left\{3,7\right\} $ and $ \mathcal{M}=0.5 $ . Bias removal was performed for the Kolmogorov flow case, following the same procedure as outlined in Section 2.1. The bias removal achieves a relative error of $ 5.83\times {10}^{-2} $ , with results for multiple time steps shown in Figure 10. The structure of the predicted bias captures the extrema of the bias; however, there is a small discrepancy in the magnitude, as the field should be symmetric along both axes. This further demonstrates the robustness of the proposed approach.
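For illustration, the composite bias is the mean of two single-wavenumber bias fields. In the sketch below, the bias_field helper is hypothetical and stands in for the Rastrigin-based parameterization of Section 6.1:

```python
import numpy as np

def bias_field(k_phi: int, magnitude: float, n: int = 70) -> np.ndarray:
    """Hypothetical stand-in for the bias parameterization of Section 6.1:
    a smooth periodic field with dominant wavenumber k_phi, scaled to the
    prescribed magnitude."""
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    xx, yy = np.meshgrid(x, x, indexing="ij")
    phi = np.cos(k_phi * xx) + np.cos(k_phi * yy)
    return magnitude * phi / np.abs(phi).max()

# Composite bias: mean of the k_phi = 3 and k_phi = 7 fields, with M = 0.5.
phi_composite = 0.5 * (bias_field(3, 0.5) + bias_field(7, 0.5))
```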

Figure 10. Removal of a stationary bias field containing multiple wavenumbers. The first three columns show the biased, original, and predicted flow fields, respectively. The rightmost column depicts the predicted additive bias. Results are shown for multiple time steps in the simulation trajectory.

D. Parametric study on the scale factor

Experiments for reconstruction from sparse information were also run for $ \kappa \in \left\{7,9,11,13,15\right\} $ using the same experimental setup, demonstrating reconstruction across a range of scale factors. Results for each of these scale factors are shown in Figure 11. We observe that the relative $ {\mathrm{\ell}}^2 $ -error increases with increasing $ \kappa $ ; this is to be expected, as the available information becomes sparser.

Figure 11. Reconstruction from sparse information. Performance for the reconstruction task is shown for a range of scale factors $ \kappa $ , measuring performance with the relative $ {\mathrm{\ell}}^2 $ -errors, $ e $ , between the predicted high-resolution fields $ {\boldsymbol{f}}_{\boldsymbol{\theta}}\left(\boldsymbol{u}\left({\varOmega}_{\mathbf{L}},t\right)\right) $ , and the true high-resolution fields $ \boldsymbol{u}\left({\varOmega}_{\boldsymbol{H}},t\right) $ .

Footnotes

1 With a slight abuse of notation, $ \boldsymbol{\theta} $ denotes the trainable parameters for Eqs. (4, 5).

2 CNNs work on structured grids. To deal with unstructured grids and complex domains, the data can be interpolated onto a structured grid, and the boundary can be padded. The extension to unstructured grids and intricate domains is beyond the scope of this paper and is left for future work.

3 For example, in turbulence, the fluid mass and momentum must be in balance with mass sources, and forces, respectively. Thus, equation (1) represents conservation laws.

4 Note that the updated versions of the physics losses represent the same objective, but differ slightly in their definition.

References

Bateman, H (1915) Some recent researches on the motion of fluids. Monthly Weather Review 43(4), 163–170.
Brunton, SL, Proctor, JL and Kutz, JN (2016) Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113(15), 3932–3937.
Burgers, J (1948) A mathematical model illustrating the theory of turbulence. Advances in Applied Mechanics 1, 171–199.
Canuto, C, Hussaini, MY, Quarteroni, A and Zang, TA (1988) Spectral Methods in Fluid Dynamics. Berlin/Heidelberg: Springer.
Doan, NAK, Polifke, W and Magri, L (2021) Short- and long-term predictions of chaotic flows and extreme events: a physics-constrained reservoir computing approach. Proceedings of the Royal Society A 477(2253), 20210135.
Dong, C, Loy, CC, He, K and Tang, X (2014) Learning a deep convolutional network for image super-resolution. European Conference on Computer Vision 8692, 184–199.
Edalatifar, M, Tavakoli, MB, Ghalambaz, M and Setoudeh, F (2021) Using deep learning to learn physics of conduction heat transfer. Journal of Thermal Analysis and Calorimetry 146(3), 1435–1452.
Eivazi, H and Vinuesa, R (2022) Physics-informed deep-learning applications to experimental fluid mechanics. Measurement Science and Technology 35, 075303–075323.
Freeman, WT, Jones, TR and Pasztor, EC (2002) Example-based super-resolution. IEEE Computer Graphics and Applications 22(2), 56–65.
Fylladitakis, ED (2018) Kolmogorov flow: seven decades of history. Journal of Applied Mathematics and Physics 6(11), 2227–2263.
Gao, H, Sun, L and Wang, J-X (2021) Super-resolution and denoising of fluid flow using physics-informed convolutional neural networks without high-resolution labels. Physics of Fluids 33(7), 073603.
Grossmann, TG, Komorowska, UJ, Latz, J and Schönlieb, C-B (2023) Can physics-informed neural networks beat the finite element method? arXiv preprint arXiv:2302.04107.
Hornik, K, Stinchcombe, M and White, H (1989) Multilayer feedforward networks are universal approximators. Neural Networks 2(5), 359–366.
Kelshaw, D (2022) KolSol. GitHub. Available at https://github.com/magrilab/kolsol.
Kelshaw, D and Magri, L (2023a) Super-resolving sparse observations in partial differential equations: a physics-constrained convolutional neural network approach. Available at https://arxiv.org/abs/2306.10990.
Kelshaw, D and Magri, L (2023b) Uncovering solutions from data corrupted by systematic errors: a physics-constrained convolutional neural network approach. Available at https://arxiv.org/abs/2306.04600.
Kim, J and Lee, C (2020) Prediction of turbulent heat transfer using convolutional neural networks. Journal of Fluid Mechanics 882, A18.
Kingma, DP and Ba, J (2015) Adam: a method for stochastic optimization. In Bengio, Y and LeCun, Y (eds.), 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings.
Krishnapriyan, A, Gholami, A, Zhe, S, Kirby, R and Mahoney, MW (2021) Characterizing possible failure modes in physics-informed neural networks. Advances in Neural Information Processing Systems 34, 26548–26560.
Kumar, N and Nachamai, M (2017) Noise removal and filtering techniques used in medical images. Oriental Journal of Computer Science and Technology 10(1), 103–113.
Lagaris, IE, Likas, A and Fotiadis, DI (1998) Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks 9(5), 987–1000.
LeCun, Y and Bengio, Y (1998) Convolutional networks for images, speech, and time series. In The Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press, pp. 255–258.
Liu, B, Tang, J, Huang, H and Lu, X-Y (2020) Deep learning methods for super-resolution reconstruction of turbulent flows. Physics of Fluids 32(2), 025105.
Magri, L (2023) Introduction to neural networks for engineering and computational science. Zenodo. https://doi.org/10.5281/zenodo.7538418.
Majda, AJ and Kramer, PR (1999) Simplified models for turbulent diffusion: theory, numerical modelling, and physical phenomena. Physics Reports 314(4–5), 237–574.
Mendez, MA, Raiola, M, Masullo, A, Discetti, S, Ianiro, A, Theunissen, R and Buchlin, JM (2017) POD-based background removal for particle image velocimetry. Experimental Thermal and Fluid Science 80, 181–192.
Murata, T, Fukami, K and Fukagata, K (2020) Nonlinear mode decomposition with convolutional neural networks for fluid dynamics. Journal of Fluid Mechanics 882, A13.
Nóvoa, A and Magri, L (2022) Real-time thermoacoustic data assimilation. Journal of Fluid Mechanics 948, A35.
Pope, SB (2000) Turbulent Flows. Cambridge, UK: Cambridge University Press.
Raiola, M, Discetti, S and Ianiro, A (2015) On PIV random error minimization with optimal POD-based low-order reconstruction. Experiments in Fluids 56(4).
Raissi, M, Perdikaris, P and Karniadakis, GE (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, 686–707.
Rastrigin, LA (1974) Systems of Extremal Control. Moscow: Nauka.
Ruan, F and McLaughlin, D (1998) An efficient multivariate random field generator using the fast Fourier transform. Advances in Water Resources 21(5), 385–399.
Sciacchitano, A, Neal, DR, Smith, BL, Warner, SO, Vlachos, PP, Wieneke, B and Scarano, F (2015) Collaborative framework for PIV uncertainty quantification: comparative assessment of methods. Measurement Science and Technology 26(7), 074004.
Shi, W, Caballero, J, Huszár, F, Totz, J, Aitken, AP, Bishop, R, Rueckert, D and Wang, Z (2016) Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE.
Vincent, P, Larochelle, H, Lajoie, I, Bengio, Y and Manzagol, P-A (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research 11, 3371–3408.
Yang, W, Zhang, X, Tian, Y, Wang, W, Xue, J-H and Liao, Q (2019) Deep learning for single image super-resolution: a brief review. IEEE Transactions on Multimedia 21(12), 3106–3121.
Zhou, DX (2020) Universality of deep convolutional neural networks. Applied and Computational Harmonic Analysis 48(2), 787–794.