1. Introduction
Global modelling of multi-scale collisionless plasma phenomena is a long-standing computational challenge. Even with state-of-the-art computing resources, it is often computationally infeasible to resolve the smallest scale, kinetic effects within large-scale, global domains relevant for fusion and astrophysical systems. Thus, progress in understanding these systems is bound to happen through a combination of ab initio kinetic modelling and reduced models. In the latter category, collisionless fluid models have demonstrated their utility in global modelling (Dong et al. Reference Dong, Wang, Hakim, Bhattacharjee, Slavin, DiBraccio and Germaschewski2019; TenBarge et al. Reference TenBarge, Ng, Juno, Wang, Hakim and Bhattacharjee2019; St-Onge et al. Reference St-Onge, Kunz, Squire and Schekochihin2020; Ng et al. Reference Ng, Hakim, Wang and Bhattacharjee2020) and as part of hybrid (kinetic-fluid) schemes (Shi et al. Reference Shi, Lin, Wang, Wang and Nishimura2021; Arzamasskiy et al. Reference Arzamasskiy, Kunz, Squire, Quataert and Schekochihin2023; Achikanath Chirakkara et al. Reference Achikanath Chirakkara, Seta, Federrath and Kunz2023). For such reduced models to be able to capture the essence of the kinetic physics at play, a systematic approach to constructing accurate fluid closures is called for. A fluid closure relates higher-order fluid quantities (e.g. heat flux) – for which the exact evolution equation is not retained – to lower-order fluid quantities (e.g. density, flow velocity) and fields, allowing the truncation the otherwise infinite hierarchy of fluid equations. Due to the lack of a universal closure for collisionless dynamics, each closure must be tailored to the phenomena of interest.
In collisional systems, the distribution functions of the particles remain close to local thermodynamic equilibrium, allowing for rigorous construction of closed fluid models (Braginskii Reference Braginskii1958; Chapman & Cowling Reference Chapman and Cowling1991). There is, however, no generally applicable closures for collisionless systems that are characterised by significant departures from Maxwellianity and non-local kinetic phenomena, such as wave–particle interactions. Nevertheless, there are theoretical approaches which distil various aspects of the relevant physics into the form of the closure equations. The closure by Hammett & Perkins (Reference Hammett and Perkins1990) is constructed to capture Landau damping, the Chew–Goldberger–Low (CGL) closure (Chew, Goldberger & Low Reference Chew, Goldberger and Low1956) evolves the (generally anisotropic) pressure in such a way as to conserve adiabatic invariants in collisionless magnetised plasmas, and the closure by Levermore (Reference Levermore1996) is based on the principle of maximum entropy, to name a few. However, all of these approaches, while theoretically motivated, have limited scope – such as requiring linearity or exact adiabaticity. These assumptions break down for many problems of interest, for instance, in the presence of turbulence or magnetic reconnection. Furthermore, some variants are numerically difficult to work with due to spatial non-locality. In addition, there are ad hoc closures, such as the relaxation closure that drives the pressure tensor towards an isotropic pressure (Wang et al. Reference Wang, Hakim, Bhattacharjee and Germaschewski2015). These have had varying success in reproducing kinetic simulation results, and may contain free parameters that cannot be determined theoretically, and the choice of which can drastically alter dynamics.
An alternative systematic line of action to obtain closures for collisionless plasma systems is to look towards data-driven methods, where the closures are constructed to conform with kinetic simulation data. Neural network-based machine learning is an effective tool to this end (Wang et al. Reference Wang, Xu, Zhu, Ma and Lei2020; Maulik et al. Reference Maulik, Garland, Burby, Tang and Balaprakash2020; Qin et al. Reference Qin, Ma, Jiang, Dong, Fu, Wang, Cheng and Jin2023), though it lacks interpretability, which makes it difficult to gain intuition and generalisable understanding from. However, symbolic regression (Makke & Chawla Reference Makke and Chawla2024) and sparse regression (SR) (Brunton, Proctor & Kutz Reference Brunton, Proctor and Kutz2016; Rudy et al. Reference Rudy, Brunton, Proctor and Kutz2017; Schaeffer Reference Schaeffer2017) methods can be used to infer interpretable and generalisable equations describing a dynamical system. The models thus obtained are parsimonious, lying at the Pareto-front trading between predictive power and model complexity. SR has been used to discover the governing equations of dynamical systems in a broad range of fields previously, but it has only recently been introduced in plasma physics (Dam et al. Reference Dam, Brøns, Juul Rasmussen, Naulin and Hesthaven2017; Kaptanoglu et al. Reference Kaptanoglu, Morgan, Hansen and Brunton2021b , Reference Kaptanoglu, Hansen, Lore, Landreman and Brunton2023a ; Alves & Fiuza Reference Alves and Fiuza2022) and there is only a very limited number of attempts to use it for closure discovery. Donaghy & Germaschewski (Reference Donaghy and Germaschewski2023) employ SR to recover collisionless fluid equations and discover a heat flux closure in the strongly nonlinear state of a two-dimensional Harris-sheet reconnection scenario. They do not attempt to interpret the found closure and avoid the linear regime, which is difficult due to noise at small amplitudes. Combining sparse regression and deep learning neural networks, Cheng et al. (Reference Cheng, Fu, Wang, Dong, Jin, Jiang, Ma, Qin and Liu2023) recover fluid equations and the local approximation (Sharma et al. Reference Sharma, Hammett, Quataert and Stone2006; Ng et al. Reference Ng, Hakim, Wang and Bhattacharjee2020) of the Hammett–Perkins closure in a one-dimensional linearly Landau-damped Langmuir standing wave set-up.
Here, we employ the SINDy (Sparse Identification of Nonlinear Dynamics) algorithm (Brunton et al. Reference Brunton, Proctor and Kutz2016; Rudy et al. Reference Rudy, Brunton, Proctor and Kutz2017) for sparse regression to obtain a heat flux closure – an expression for the heat flux in terms of lower-order moments – in one-dimensional (1-D) electrostatic plasma scenarios. Specifically, we examine Landau-damped Langmuir waves, and to gain further insights into the identified closure terms and illustrate their more general nature, we also study set-ups exhibiting two-stream instability (Stix Reference Stix1992) (and the following nonlinear dynamics), ubiquitous in space and astrophysical systems (Khotyaintsev et al. Reference Khotyaintsev, Graham, Norgren and Vaivads2019). Using particle-in-cell simulation data produced with the OSIRIS code (Fonseca et al. Reference Fonseca2002, Reference Fonseca, Martins, Silva, Tonge, Tsung and Mori2008), we search for optimally accurate expressions for the heat flux at each given model complexity (i.e. number of terms in the closure expression). Covering both linear and nonlinear stages, we follow the time evolution of the closure terms across the development of an electrostatic two-stream scenario, from growth through saturation via the formation and merging of phase-space holes. We also elaborate on the parametric dependences of the terms found and quantify their relative importance. The expressions are interpreted in the context of the local Hammett–Perkins closure. Analytically obtained constraints between the various closure terms are also provided to support the regression results and to assist their interpretation.
The rest of the article is organised as follows. In § 2, we describe the sparse regression method employed and the simulation set-up for the systems we study, exhibiting Landau damping and growth. In § 3, we outline and analyse the heat flux model terms identified by SR. More specifically, we describe the results of applying SR to simulations of Landau-damped Langmuir waves and two-stream instabilities in §§ 3.1 and 3.2, respectively. We then examine the relative importance of the various terms found in § 3.3, relate the six terms found most consistently by SR to linear collisionless theory in § 3.4 and finally, in § 3.5, discuss a fundamentally nonlinear seventh term which is also identified as relevant by SR in many cases. We conclude by summarising our results and giving an outlook on future work in § 4. Furthermore, we include Appendix A, containing a derivation of 1-D electrostatic linear collisionless theory from the Vlasov–Maxwell system, as well as the constraints this imposes on the heat flux model found by SR. Finally, in Appendix B, we give a demonstration of how SR works by going through how one can recover the 1-D momentum equation from simulation data.
2. Methods
Our aim is to find approximate, spatio-temporally local analytical expressions for the heat flux in terms of lower-order fluid quantities, such that these expressions capture most of the variation in the heat flux observed in kinetic simulation data. When discovering heat flux closures for a given physical system, we start by performing a kinetic simulation of the system in question using the particle-in-cell (PIC) code OSIRIS. During the simulation, we export diagnostics for all fluid quantities present in the three lowest-order collisionless fluid equations (A.2), namely the number density
$n_\sigma = \int d^3{\boldsymbol{v}} f_\sigma (\boldsymbol{v})$
, flow velocity
$\boldsymbol{V}_{\sigma } = n_\sigma ^{-1} \int d^3{\boldsymbol{v}} \boldsymbol{v} f_\sigma$
, mass-normalised pressure tensor
$\unicode{x1D665}_\sigma = \int d^3{\boldsymbol{v}} (\boldsymbol{v} - \boldsymbol{V}_{\sigma })^{(2)} f_\sigma$
and mass-normalised heat flux tensor
$\unicode{x1D666}_\sigma = \int d^3{\boldsymbol{v}} (\boldsymbol{v} - \boldsymbol{V}_{\sigma })^{(3)} f_\sigma$
for each species
$\sigma$
, as well as electromagnetic field data (electric field
$\boldsymbol{E}$
and magnetic field
$\boldsymbol{B}$
) at regular time intervals. Here,
$f_\sigma (\boldsymbol{v})$
denotes the distribution function for species
$\sigma$
, and we use notation where
$\left [\boldsymbol{a} \boldsymbol{b}\right ]_{ij} = a_i b_j$
and
$\boldsymbol{a}^{(2)}$
is shorthand for
$\boldsymbol{a}\boldsymbol{a}$
. For accurate regression results, it is important that the version of OSIRIS used here corrects for the otherwise occurring half-time step shifts between position and momentum data, characteristic of PIC codes using a leap-frog scheme (Boris & Shanny Reference Boris and Shanny1972; Hockney & Eastwood Reference Hockney and Eastwood2021). We also post-process our data to correct for staggering of the fields through linear interpolation (see Appendix B for a demonstration of the importance of correcting for such misalignments).
2.1. Sparse regression
In general, the aim of a sparse regression (SR) algorithm is to find an approximate relationship between some target quantity
$y$
and a set of
$M$
possibly relevant quantities
$\left \{\theta _j\right \}_{j=1}^M$
, while keeping model complexity low. In the version of SINDy we use, which is one of the modified versions of the PDE-FIND algorithm described by Alves & Fiuza (Reference Alves and Fiuza2022), the aim is specifically to approximate
$y$
as a linear combination of the
$\theta _j$
quantities. To accomplish this, we randomly select
$N$
small space–time volumes from the simulation domain and integrate both
$y$
and all
$\theta _j$
quantities over these small volumes to reduce noise.Footnote
1
We then collect the volume-integrated
$y$
and
$\theta _j$
quantities from all the sampled points in the domain into a vector
$\boldsymbol{y}$
and a matrix
$\boldsymbol {\varTheta}$
defined so that

With these definitions, the task of approximating
$y$
becomes a question of finding the coefficient vector
$\boldsymbol {\xi }$
that optimally solves the equation

For us, ‘optimally’ means achieving a low mean squared error with as few non-zero terms as possible, maximising not just accuracy but also model simplicity and generalisability. Thus, our cost function looks like

where
$\left \|\boldsymbol {\xi }\right \|_0$
denotes the
$0$
-norm of
$\boldsymbol {\xi }$
, i.e. the number of non-zero coefficients. The
$\lambda$
hyperparameter is effectively gradually increased from
$0$
, leading to increasingly harsher penalisation of models with many non-zero terms. The end result of this procedure is a sequence of models which are optimally accurate at each given model complexity, sweeping along the Pareto front. To curb overfitting and more easily discern which terms are spurious, we perform 10-fold cross-validation – terms in
$\Theta$
which are found consistently are more likely to be physical. The efficacy of our SR approach is demonstrated in Appendix B through a recovery of the electron momentum equation.
In our case, we seek a heat flux closure for modelling electrostatic plasma phenomena, meaning our
$y$
quantities are the elements of the heat flux tensor
$\unicode{x1D666}_\sigma$
. As for the set of possibly relevant quantities
$\theta _j$
, in principle, one would want to include all possible expressions involving
$n_\sigma$
,
$\boldsymbol{V}_{\sigma }$
,
$\unicode{x1D665}_\sigma$
,
$\boldsymbol{E}$
and
$\boldsymbol{B}$
. In practice, however, this is infeasible, since the space of possible expressions is infinite. As it turns out, even restricting to e.g. arbitrary products of the form

where the exponents
$\alpha$
,
$\beta _k$
,
$\gamma _k$
,
$\delta _k$
and
$\varepsilon _{kl}$
are non-negative integers summing to
$\leqslant$
some integer
$s$
, results in enormous term libraries
$\boldsymbol {\varTheta}$
even when
$s$
is relatively small due to the combinatorics involved. Since having a very large term library not only increases computational cost, but also often leads to issues with convergence, choosing a term library with as few superfluous terms as possible is desired.
For the one-dimensional electrostatic plasma problems we consider, where only electron physics is relevant over the time scales of interest, we can first restrict ourselves to considering only
$n_e$
,
$V_{e1}$
,
$E_1$
,
$p_{e11}$
and set
$y = q_{e111}$
. Since only electrons are relevant, and all vectors and tensors in 1-D have just a single degree of freedom, we can also suppress species and coordinate indices from now on. For convenience, we also normalise to the electron mass
$m_e$
, the elementary charge
$e$
, the speed of light
$c$
and the plasma frequency
$\omega _{\mathrm{pe}}=\sqrt {\bar {n}_e e^2/(\varepsilon _0 m_e)}$
at the unperturbed electron density
$\bar {n}_e$
. This also normalises distances to the electron inertial length
$\delta _e=c/\omega _{\mathrm{pe}}$
.
To further narrow the range of possible candidate terms, we start with only those terms which are dimensionally consistent with our
$y$
variable
$q$
, i.e. terms of the form

for some integer
$\alpha \leqslant 3$
, where
$v_{\mathrm{th}} = \sqrt {T} = \sqrt {p/n}$
, defining
$T = p/n$
to be (the
$11$
-component of) the mass-normalised temperature tensor. Inspired by the local approximation (Ng et al. Reference Ng, Hakim, Wang and Bhattacharjee2020) of the Hammett–Perkins closure, which involves a temperature gradient, we extend this initial set of candidate terms to also allow similar ones with first-order spatial derivatives, e.g.
$n v_{\mathrm{th}} \partial _{x}(v_{\mathrm{th}}) \partial _{x}(V)$
. It should be noted, however, that the presence of the spatial derivative in these additional terms means that the coefficient corresponding to each such term will be dimensional. For instance, the example term mentioned above with two spatial derivatives necessitates a coefficient with a dimensionality of length squared. This in turn suggests a scaling
${\sim} L^2$
for the coefficient in question, where
$L$
is the characteristic length scale for the variation in the quantities involved.
We emphasise that restricting our term library in this way specifically is an arbitrary choice, made to limit the term library size so as to make SR convergence more likely. We start by considering dimensionally consistent terms mainly because models constructed from such terms contain only unitless coefficients, which facilitates generalisability. The restriction to integer
$\alpha$
is made for convenience. Our exclusion of terms with higher-order derivatives is chiefly motivated by the fact that their inclusion would lead to difficulties with SR convergence due to the vastly increased term library size. Furthermore, closures constructed from such terms are more difficult to work with computationally, since even first-order derivatives in the expression for
$\unicode{x1D666}$
yield second-order derivatives in
$\boldsymbol {\nabla } \cdot \unicode{x1D666}$
and thus in the fluid equation system one needs to solve. Importantly, the function space we have restricted ourselves to seems sufficient to model
$q$
accurately, as we shall see.
2.2. Simulation set-up
In all of our simulations, we kinetically model an electron–proton plasma in one spatial and threeFootnote
2
velocity dimensions in the centre-of-mass (CoM) frame, with physical mass ratio, a spatial resolution of
$\Delta x = 10^{-3}\,\mathrm {\delta _e}$
and periodic boundary conditions. Since our simulations are all performed in the initial CoM frame and ion flow velocities remain negligibly small, every instance of an electron flow velocity
$V$
below can be thought of as
$V - v_{\mathrm{CoM}}$
, with a spatial average value of
$0$
. Here,
$v_{\mathrm{CoM}}$
is the velocity of the centre of mass. This quantity is invariant under Galilean transformations, just like
$n$
and
$v_{\mathrm{th}}$
, meaning that all terms in our term library are frame-independent with this interpretation of
$V$
. This is very much desirable since
$q$
, the quantity we are seeking to model, is a Galilean-invariant quantity.Footnote
3
For numerical stability, we consistently use a simulation-internal time step slightly smaller than the spatial resolution:
${9.5\times 10^{-4}}\mathrm {\omega _{\mathrm{pe}}^{-1}}$
. To limit the amount of data output as diagnostics, we save the state of the simulation only once every 100 time steps, thus our regression analysis uses an effective temporal resolution of
$\Delta t = {0.095}\mathrm {\omega _{\mathrm{pe}}^{-1}}$
.
2.2.1. Landau-damped Langmuir waves
When studying Landau-damped Langmuir waves, we initialise the plasma as a Maxwell distribution with various non-relativistic thermal speeds
$v_{\mathrm{th}} \sim {0.01}\mathrm {c}$
using a domain size of
${0.256}\mathrm {\delta _e}$
with
$10^5$
$\left (10^4\right )$
electrons (ions) per cell. To excite Langmuir waves, we then apply and smoothly turn off an external sinusoidal
$\boldsymbol{E}$
-field perturbation propagating in the
$+x$
or
$-x$
direction with wavenumber
$\lvert {k}\rvert = 4\pi / \left ({0.256}\mathrm {\delta _e}\right )$
and a frequency
$\omega _r$
matching that of the analytic Langmuir mode. More specifically, this is done by using a single-cycle sine squared envelope, reaching maximum amplitude at
$\omega _{{pe}} t = 3$
and being fully turned off at
$\omega _{\mathrm{pe}} t = 6$
. The values of
$v_{\mathrm{th}}$
considered, along with corresponding
$\lvert {k}\rvert \lambda _{D,e}$
values (where
$\lambda _{D,e}$
is the electron Debye length), as well as frequencies and growth rates of the resulting Langmuir waves, are shown in table 1.
Table 1. Values of
$v_{\mathrm{th}}$
used when studying Landau-damped Langmuir waves, the frequency
$\omega _r$
and the (negative) growth rate
$\gamma$
of the excited Langmuir wave and the duration
$\Delta t_{\textrm {L}}$
of the period of exponential decay, over which SR is applied, as well as the estimated bounce time
$t_b$
for trapped electrons in each case. We also list the values of
$\lvert {k}\rvert \lambda _{D,e}$
and
$\omega _r/(kv_{\mathrm{th}})$
resulting from the other parameters. The frequencies
$\omega _r$
are calculated via Jacobsen interpolation (Jacobsen & Kootsookos Reference Jacobsen and Kootsookos2007) of the peaks in the discrete Fourier transform (DFT) spectrum for the
$\boldsymbol{E}$
-field, with uncertainty (*) corresponding to half the DFT bin size. The growth rate
$\gamma$
is calculated via linear regression on logarithmised data of the average
$\boldsymbol{E}$
-field energy density over the period of exponential decay. The estimated bounce time is calculated as
$t_b = \sqrt { m_e / ( e \lvert {k}\rvert E_{{\rm rms}} ) }$
, where
$E_{{\rm rms}} = \sqrt {\langle {E^2}\rangle }$
is the spatial root-mean-square average of the
$\boldsymbol{E}$
-field magnitude at the point in time when the external drive is switched off.

After the external forcing is removed, the system is left to evolve self-consistently, with the resulting Langmuir waves decaying due to Landau damping – initially exponentially, with only linear processes involved, as can be seen in figure 1(a). The PDE-FIND algorithm is then applied to find a closure for
$q$
during the timeframe of length
$\Delta t_{\textrm {L}}$
where decay is judged to be exponential (e.g.
$6.0 \lt \omega _{\mathrm{pe}} t \lt 21.0$
for initial
$v_{\mathrm{th}} \leqslant {0.01}\mathrm {c}$
– see also figure 1
a, where this time range is highlighted in red). In total,
$\sim {6}\,\%$
of this space–time range is randomly sampled per cross-validation fold, of which there are 10. The values of
$\Delta t_{\textrm {L}}$
for all values of
$v_{\mathrm{th}}$
considered are listed in table 1, together with the estimated bounce times
$t_b$
for trapped electrons. Note that for the four lower thermal speeds considered, decay is exponential for roughly twice the bounce time, meaning that we may expect slight nonlinear trapping effects towards the end of the sampled time window. At higher values of
$v_{\mathrm{th}}$
, however, these effects are drowned out by numerical noise present at the low perturbation amplitudes reached towards the end of exponential decay. Indeed, this is the reason for
$\Delta t_{\textrm {L}}$
being lower than
$t_b$
for the two highest
$v_{\mathrm{th}}$
values considered – in these cases, the decay is so rapid that the perturbation disappears in the numerical noise before trapping effects become visible.

Figure 1. Evolution of the spatially averaged
$\boldsymbol{E}$
-field energy density
$\langle {\frac {1}{2} \varepsilon _0 E^2}\rangle$
over time (a) in the Landau damping case where
$v_{\mathrm{th}} = {0.01}\mathrm {c}$
and (b) in the two-stream case where
$n_b/n_e = 0.1$
, normalised to the rest energy of an electron and the unperturbed electron number density
$\bar {n}$
. The linear decay/growth phase is highlighted in red.
2.2.2. Two-stream instabilities
Apart from studying Landau damping, we also examine a set-up exhibiting Landau growth, namely a two-stream unstable plasma. In this case, we choose a larger domain size of
${2.048}\mathrm {\delta _e}$
to limit the effects of the periodic boundary conditions employed. We initialise the ions in equilibrium, with the electrons split into counterflowing equal temperature Maxwellian populations – a core population with density
$n_c$
and flow velocity
$V_c$
, and a beam population with density
$n_b$
and flow velocity
$V_b$
. The thermal speed for each population individually is
$u_{\mathrm{th}} = {3.16\times 10^{-3}}\mathrm {c}$
. We vary
$n_b$
as a fraction of the total electron density
$n_e$
, keeping the relative velocity
$V_{\mathrm{rel}} = V_c - V_b$
constant at
${0.02}\mathrm {c}$
and staying in the zero-current frame by enforcing
$n_c V_c + n_b V_b = 0$
. Specifically, we consider the range of values for
$n_b/n_e$
listed in table 2. In these simulations, we use
$10^{4}$
(
$200$
) electrons (ions) per cell.
Table 2. Values of
$n_b/n_e$
examined when studying two-stream instabilities and the corresponding values of
$v_{\mathrm{th}}$
, as well as the frequency
$\omega _r$
, growth rate
$\gamma$
and characteristic wavenumber
$k_{\mathrm{char}}$
for the excited perturbation (also listing
$\lvert {k_{\mathrm{char}}}\rvert \lambda _{D,e}$
), together with the duration
$\Delta t_{\textrm {L}}$
of exponential growth and the bounce time
$t_b$
. The same methods are used to calculate
$\omega _r$
and
$\gamma$
as in the Landau damping cases, described in table 1. The values of
$k_{\mathrm{char}}$
are calculated by using a weighted average over the DFT spectrum at the end of linear growth, using the absolute value of the Fourier amplitude squared as the weighting. The minus signs signify propagation towards
$-x$
. Similarly to what was done in the Landau damping cases, we estimate
$t_b = \sqrt { m_e / ( e \lvert {k_{\mathrm{char}}}\rvert E_{ {rms}} ) }$
, with both
$k_{\mathrm{char}}$
and
$E_{{\rm rms}}$
in this case being evaluated at the time of maximum average
$\boldsymbol{E}$
-field energy density. Note that while
$k_{\mathrm{char}}$
at this time is slightly different than the values listed in this table, the difference is marginal (
${\sim} {1}\,\%$
).

Note that the non-zero relative velocity
$V_{\mathrm{rel}}$
between the populations means that the combined electron population had a thermal speed
$v_{\mathrm{th}} = \sqrt {p/n} \gt u_{\mathrm{th}}$
. More specifically, this combined thermal speed is
$v_{\mathrm{th}} = [ u_{\mathrm{th}}^2 + (n_b/n_e) \left ( 1 - n_b/n_e \right ) V_{\mathrm{rel}}^2 ]^{1/2}$
, with a maximum of
$v_{\mathrm{th}} \approx {1.05\times 10^{-2}}\mathrm {c}$
for
$n_b/n_e = 0.5$
.
The counterflowing electron populations drive wave growth via inverse Landau damping. Similarly to the decay of the Langmuir waves above, this growth is initially exponential. It eventually saturates, however (as can be seen in figure 1
b), leading to the formation of phase-space electron holes. With the data from these simulations, SR is performed (a) over the linear part of the growth phase, and (b) over small time slices of length
${1.9}\mathrm {\omega _{\mathrm{pe}}^{-1}}$
covering both the growth phase and the saturated phase to study how the closure coefficients evolve over time, as illustrated in figure 3 – in both cases for all values of
$n_b/n_e$
listed in table 2. Part (a) here is very much analogous to what was done for the Landau damping case. For example, in the case where
$n_b/n_e = 0.1$
, the time range sampled is
$19.0 \lt \omega _{\mathrm{pe}} t \lt 38.0$
, highlighted in red in figure 1(b), meaning
$\omega _{\mathrm{pe}} \Delta t_{\textrm {L}} = 19.0$
. In part (b), the time slices are centred on time steps
$t = 1.9\times \left \{1, 2, 3, \ldots \right \}\omega _{\mathrm{pe}}^{-1}$
, covering the entire simulation domain up to the last such time step which is
$\gt 0.95 \omega _{\mathrm{pe}}^{-1}$
from the end of the simulation, so that every time slice falls entirely within the domain of the simulation. In both of these cases, each of the 10 cross-validation folds randomly samples
${\sim} {2.5}\,\%$
of the data in the space–time ranges of interest.
Since we are now dealing with exponential growth rather than exponential decay, the influence of noise towards the end of the linear part of the process is significantly decreased compared with the situation in the Landau damping simulations. This is also clearly visible in the stronger relationship between
$\Delta t_{\textrm {L}}$
and
$t_b$
in these simulations – we consistently have
$\Delta t_{\textrm {L}} \sim 5.5 t_b$
, meaning growth is exponential for a little over five times the bounce time. This suggests that there is a high probability of nonlinear trapping-related effects being present to some extent in the data towards the latter half of the sampled time range.
The excited perturbations in this case are more broad-spectrum than the Langmuir waves examined above, necessitating the introduction of characteristic wavenumbers
$k_{\mathrm{char}}$
, calculated as outlined in the caption of table 2. The temporal spectrum is dominated by a single frequency peak however, at the value of
$\omega _r$
listed in table 2. Like in table 1, we also, for convenience, show
$\lvert {k_{\mathrm{char}}}\rvert \lambda _{D,e}$
.
For simplicity and to more easily compare our results with those from the Landau-damping case, we only consider the combined electron species rather than treating the counter-streaming populations separately when performing SR. That is, all of our closure models are models of the total electron heat flux, and our term library is constructed from fluid quantities relating to the entire electron population. We note, however, that for the purposes of modelling two-stream unstable systems in fluid codes, it is likely more practical to treat the two electron populations as separate species, with their own respective closures. Preliminary results suggest that applying SR when using such an approach also yields broadly similar closure terms as those identified here, in appropriately chosen reference frames. To avoid diverting our focus, we leave a more detailed analysis of two-stream instability along these lines, with separate closures for the beam and core populations, outside the scope of this article.
3. Results
For both of the set-ups we considered, SR yields very similar results. In both cases, a six-term model
$q = q_{\mathrm{even}} + q_{\mathrm{odd}}$
was found, where

The split into
$q_{\mathrm{even}}$
and
$q_{\mathrm{odd}}$
is based on the dependence on the propagation of the perturbations involved. While the coefficients in front of the
$q_{\mathrm{even}}$
terms are independent of propagation direction (and thus ‘even in
$k$
’), the
$q_{\mathrm{odd}}$
coefficients switch sign if the propagation direction is reversed (and are thus ‘odd in
$k$
’). This also means that if there is no wave propagation, or when oppositely propagating waves are of similar amplitudes, such as in a standing-wave scenario, all
$q_{\mathrm{odd}}$
coefficients go to zero. In several cases, an additional term
$\propto n v_{\mathrm{th}} V^2$
is found. As this term mostly appears for low
$\lvert {\gamma }\rvert$
– specifically towards the end of linear growth or decay processes when
$\lvert {\gamma }\rvert$
is decreasing – it appears to help capture weak nonlinear trapping effects.
We note that, due to the presence of spatial derivatives in terms 2, 3 and 6, the corresponding coefficients are dimensional, having units of length. When plotting them, they are implicitly given in units of
$\delta _e$
, though often re-scaled by a factor of
$10^2$
for readability. In other words, a label
$A_2 \times {10^2}{}$
should read
$10^2 A_2 / \delta _e$
. Similarly,
$A_4$
has the units of mass-normalised heat flux, or number density times velocity cubed, and is implicitly given in units of
$\bar {n} c^3$
. Thus, a curve labelled
$A_4 \times 10^{6}$
shows
$10^6 A_4 / (\bar {n} c^3)$
.
We consistently quantify the error of the various models found by SR using the fraction of variance unexplained (FVU), defined as
$\textrm {FVU} = \sum _i (y_i-\hat {y}_i)^2/\sum _i(y_i-\overline {y})^2$
, where
$y_i$
is the value of the
$y$
quantity at the
$i{\mathrm{th}}$
sampling point,
$\hat {y}_i$
is the
$y$
-value predicted by the model at that point,
$\overline {y} = \frac {1}{N} \sum _i y_i$
is the mean
$y$
-value and the sums run over the
$N$
samples. As stated in § 2, the
$y$
-quantity of interest to us is the total electron heat flux
$q = q_{ e 111 }$
.
3.1. Landau-damped Langmuir waves
In the simulations of Landau-damped Langmuir waves, SR found the six-term closure in (3.1) consistently, with an FVU of 2 %–7 %. The FVU increases with higher initial
$v_{\mathrm{th}}$
and corresponding stronger damping. In simulations of standing waves (i.e. the sum of oppositely propagating waves), only the
$q_{\mathrm{even}}$
terms are found, consistent with the lack of a preferred direction.

Figure 2. Results from the Landau damping simulations. (a) FVU of the successive closures found for
$v_{\mathrm{th}}/c = 0.01$
(where
$\gamma = -0.150$
), with each new term indicated by its coefficient in (3.1). Circles, unlike triangles, denote consistently found terms and blue (orange) marker colour corresponds to performance on testing (training) data. Note that while FVU is typically smaller than unity, it is higher than unity in certain situations. For example, this is the case for the
$0$
-term model, which is identically zero, meaning
$\hat {y}_i = 0$
for all
$i$
, while the mean of the
$y$
data is
$\overline {y} \ne 0$
. (b) Dependence of the closure coefficients on
$v_{\mathrm{th}}$
, for a wave propagating in the
$+x$
direction. Blue (red) lines/symbols indicate
$q_{\mathrm{even}}$
(
$q_{\mathrm{odd}}$
) coefficients. For comparison,
$\lvert {\gamma }\rvert / \omega _{\mathrm{pe}}$
is plotted as a grey tightly dotted line.
We note that in some cases, additional terms are sometimes found consistently. For example, the unannotated seventh circle marker in figure 2(a) corresponds to a term
$\propto n V^2 \partial _{x} V$
. In particular, at low
$\lvert {\gamma }\rvert$
, where nonlinear trapping effects are expected to be slightly more important, a term
$\propto n v_{\mathrm{th}} V^2$
is found, likely helping capture these weak nonlinear effects.

Figure 3. Results from the two-stream instability simulations. (a) Dependence of the growth phase closure coefficients on
$n_b / n_e$
. For comparison,
$\gamma / \omega _{\mathrm{pe}}$
is plotted as a grey tightly dotted line. (b) Evolution of the coefficients over time in the case where
$n_b / n_e = 0.1$
. Blue (red) lines indicate
$q_{\mathrm{even}}$
(
$q_{\mathrm{odd}}$
) coefficients. Here, the grey tightly dotted line is the logarithm of the spatially averaged
$\boldsymbol{E}$
-field energy density in arbitrary units, recognisable from figure 1(b).
Since we are examining an electrostatic 1-D setting affected by Landau damping, one might expect the closure to be similar to the local approximation (Sharma et al. Reference Sharma, Hammett, Quataert and Stone2006; Ng et al. Reference Ng, Hakim, Wang and Bhattacharjee2020) of the Hammett–Perkins closure (Hammett & Perkins Reference Hammett and Perkins1990):
$q\sim -\left (\chi /\lvert {k_{\mathrm{char}}}\rvert \right ) nv_{\mathrm{th}}^2\partial _{x} v_{\mathrm{th}}$
, where
$k_{\mathrm{char}}$
is a characteristic wavenumber. The value one should choose for the heat diffusivity
$\chi$
depends on which values of
$\omega /(kv_{\mathrm{th}})$
are of interest, with the original Hammett–Perkins paper focusing on regimes relevant for the ion temperature gradient (ITG) instability. In our closure, the
$A_3$
term can be identified as playing this role. Sparse regression finds
$v_{\mathrm{th}}$
-dependent coefficients, see figure 2(b). For most coefficients, however, this
$v_{\mathrm{th}}$
-dependence is significantly weaker than the
$v_{\mathrm{th}}$
-dependence of
$\gamma$
over the examined range of values – the major exception being the constant term
$A_4$
, which does not affect
$\partial _{x} q$
, the quantity we are ultimately in need of a closure for. There is one coefficient with non-negligible
$v_{\mathrm{th}}$
-dependence which does affect
$\partial _{x} q$
, however – namely,
$A_1$
. Interestingly, we find that the improvement in predictive power caused by the introduction of this
$A_1 n v_{\mathrm{th}}^2 V$
term into the model is significantly larger than the one coming from the Hammett–Perkins-like
$A_3 n v_{\mathrm{th}}^2 \partial _{x} v_{\mathrm{th}}$
term, as seen in figure 2(a).
3.2. Two-stream instabilities
As noted above, broadly speaking, the same six-term model is found for two-stream instabilities as for Landau-damped Langmuir waves, at an even lower FVU than in the Landau damping simulations of between 0.5 % and 5 %. Furthermore, the term
$\propto n v_{\mathrm{th}} V^2$
is again found in cases with low
$\lvert {\gamma }\rvert$
– in fact, it appears for all
$n_b/n_e \lt 0.4$
(but is left out of figure 3 for readability). Also, some coefficients are close to zero at certain
$n_b/n_e$
values – see figure 3(a). Such near-zero coefficients are generally not found consistently. For example, when
$n_b = 0.5 n_e = n_c$
, the lack of a preferred direction forces all
$q_{\mathrm{odd}}$
coefficients to zero, just like for a Landau-damped standing wave. The sub-1 % FVU values are achieved in the middle of the examined beam density range, where amplitudes are relatively small but growth is still rapid enough for trapping effects to be mostly negligible throughout the linear growth phase.
Both in these set-ups and the Landau damping ones examined above, the growth (or decay) phase value of
$A_3$
tends to be roughly twice
$A_2$
. An exact factor of 2 would correspond to having a single term
$\propto v_{\mathrm{th}} \partial _{x} p = v_{\mathrm{th}}^3 \partial _{x} n + 2 n v_{\mathrm{th}}^2 \partial _{x} v_{\mathrm{th}}$
, i.e. a pressure gradient driven contribution to the heat flux, whereas the
$A_2$
and
$A_3$
terms, on their own, correspond to contributions due to density and temperature gradient driven heat flux contributions, respectively. Notably, while
$A_{2}$
and
$A_3$
are negative for Landau damping they are positive in the growth phase of the instability, where inverse Landau damping occurs. Interestingly, the same holds for
$A_6$
in most cases examined here, despite the fact that the wave propagation is towards
$-x$
here, whereas it is towards
$+x$
for the Langmuir waves examined in figure 2. As we will see in § 3.4, this can be explained quite well by the constraints imposed on the coefficients from linear collisionless theory. Being a
$k$
-odd term, the
$A_6$
term represents a contribution to the heat flux coming from the pressure times the rate of change of flow velocity in the direction of local wave propagation.
So far, we have considered model coefficients obtained only for the growth phase of the instability. Now, we will consider both the growth phase and the saturated phase of the simulation, with coefficients obtained in a time-resolved manner. We find that while the same model terms are sufficient to accurately approximate the heat flux throughout the simulation,Footnote
4
some of the coefficients vary significantly across these phases. Overall,
$A_3$
is very well correlated with the instantaneous growth rate, as is
$A_2$
and
$A_6$
. This is seen in figure 3(b), but is even more apparent if one plots the three coefficients in question against the instantaneous growth rate
$\gamma (t) = \frac {1}{2} \partial _t \ln \langle {\boldsymbol{E}^2}\rangle$
itself – see figure 4. What is interesting is that while the amplitude of the oscillation in
$A_3$
decreases along with that of the oscillation in
$\gamma (t)$
, the other two growth-related coefficients exhibit a far smaller change in oscillation amplitude from the growth phase to the saturated phase. It is also notable that while
$A_2$
and
$A_6$
are practically equal during the saturated phase, they are desynchronised during the growth phase, with
$A_2$
reaching its growth-phase maximum earlier than
$A_6$
. Note, however, that while
$A_2$
and
$A_6$
are out of phase in this way during the growth phase for all examined values of
$n_b/n_e$
(disregarding the symmetric set-up with
$n_b/n_e = 0.5$
, where
$A_6 = 0$
), the exact nature of their relationship varies depending on
$n_b/n_e$
, as seen comparing the two cases shown in figure 4. In general, they synchronise earlier in simulations with higher beam density. Furthermore, the fact that the oscillation amplitudes of
$A_2$
and
$A_6$
are approximately equal only holds when
$n_b/n_e \sim 0.1$
, as one might suspect from the fact that
$A_6 \to 0$
as
$n_b/n_e \to 0.5$
by virtue of being
$k$
-odd.

Figure 4. Three growth-/damping-related coefficients
$A_2$
,
$A_3$
and
$A_6$
, compared with the instantaneous growth rate (a) in the case where
$n_b/n_e = 0.1$
, also shown in figure 3(b), and in the case where
$n_b/n_e = 0.2$
. As we can see,
$A_3$
is very nearly precisely proportional to
$\gamma$
, while the other two coefficients are also strongly correlated with it.
Compared with the growth-related terms,
$A_1$
and
$A_5$
(as well as
$A_4$
) vary relatively slowly over time – especially
$A_5$
. Overall, the value of
$A_1$
fits quite well with the heuristic
$A_1 \sim -3 + \frac {1}{2}\, \textrm {sgn}(k) A_5$
obtained from linear theory in the limit where
$\lvert {\gamma }\rvert \ll \omega _r \sim \omega _{\mathrm{pe}} \sim \lvert {k}\rvert \bar {v}_{\mathrm{th}}$
(see § 3.4), the actual values in this case being
$\lvert {\gamma }\rvert \lt {0.2}\mathrm {\omega _{\mathrm{pe}}}$
,
$\omega _r = {0.80}\mathrm {\omega _{\mathrm{pe}}}$
and
$\lvert {k}\rvert \bar {v}_{\mathrm{th}} \sim {0.4}\mathrm {\omega _{\mathrm{pe}}}$
. The one notable exception to this is the very early growth phase, where the prevalence of high-
$k$
noise means that the physics involved is very far from this limit. In fact, in the later parts of the saturated phase, the heuristic performs better than expected, given the fact that the merging of electron holes should decrease the characteristic wavenumber
$\lvert {k}\rvert$
as time goes on. There also appears to be a contribution from the growth rate
$\gamma$
on top of this, leading to slight oscillations in
$A_1$
over time, consistent with the constraints imposed by linear theory as outlined in § 3.4.
When it comes to giving a physical interpretation of these non-growth-correlated terms,
$A_4$
can be thought of as a global heat flux induced by passing waves. As seen in § 3.4, this term is beyond the purview of linear theory – in fact, the exact mechanism giving rise to this contribution is unclear. However, the
$A_4$
term does not affect the divergence of the heat flux, which is what one is ultimately seeking to model when creating a closure for the pressure equation. The
$A_1$
term is transparently a product of pressure and flow velocity, and it provides either the most or the second most important contribution to
$q$
, depending on
$n_b/n_e$
. Its appearance in our closure might be related to the fact that
$\left \{\boldsymbol{V}\unicode{x1D665}\right \}$
is part of the expression relating the energy flux
$\unicode{x1D64C} = \int {\rm d}^3{\boldsymbol{v}} \boldsymbol{v}^{(3)} f$
to the heat flux
$\unicode{x1D666}$
:
$\unicode{x1D64C} = n \boldsymbol{V}^{(3)} + 3\left \{\boldsymbol{V}\unicode{x1D665}\right \} + \unicode{x1D666}$
. Thus, having a term in our
$q$
model equal to specifically
$-3\left \{\boldsymbol{V}\unicode{x1D665}\right \}$
(i.e. having
$A_1 = -3$
, which is quite a typical value found by SR) would signify a cancelling of this term in the energy flux. In the 1-D pressure equation, having an
$A_1$
term as part of
$q$
similarly leads to partial cancellations. Specifically, writing the rest of
$q$
(i.e.
$q$
excluding the
$A_1$
term) as
$q_r$
, the equation reduces to

so that the value
$A_1 = -1$
would cancel the second term and
$A_1 = -3$
would cancel the third term. The final of the three most important terms in our
$q$
model (see § 3.3), i.e. the
$A_5$
term, is
$k$
-odd. Thus, it is clearly related to the wave propagation direction. Furthermore, in set-ups like ours which are in the CoM frame,
$V$
oscillates around zero while
$n$
and
$v_{\mathrm{th}}$
have positive equilibrium values. As discussed in § 3.4, this means that all of the terms in our six-term model contribute to
$q$
at first-order in perturbation theory. While there are many terms in our term library that do this, there are actually only two which contribute to
$q$
at zeroth-order in perturbation theory due to how our term library is constructed – precisely the
$A_4$
and
$A_5$
terms. This may be related to why they are identified by SR as being useful for modelling
$q$
.
The fact that some of the coefficients correlate with growth (or decay) – and as such, differ significantly between the growth and the saturated phases of the instability – means that we should not expect to find a closure with fixed coefficients which is accurate throughout both phases with respect to the contribution from these terms. To capture both phases, one would need coefficients which are informed about the phase, through e.g. a volume-averaged electric-to-thermal energy ratio. We do not aim to provide such closures here. However, as we shall see in the next section, the growth-related terms contribute relatively little to the accuracy of the model compared with the
$A_1$
,
$A_4$
and
$A_5$
terms, meaning the closure we obtain in the growth phase remains quite accurate even in the saturated phase.
3.3. Quantifying the importance of terms:
$\delta _{{FVU}}$
To get a better sense of the circumstances under which each term in our closure is important, let us quantify their individual contributions by how much their exclusion increases the FVU of the closure, a measure we will refer to as
$\delta _{{FVU}}$
. Doing this for the various time slices examined in figure 3(b), we get figure 5.
The two by far most important terms are the two terms with order unity coefficients, i.e.
$A_5$
and
$A_1$
. The next most important term by
$\delta _{{FVU}}$
is the constant term
$A_4$
, although, as noted previously, this term is not very relevant to the accuracy of the closure since it has no impact on
$\partial _{x} q$
. Among the growth-related terms,
$A_3$
is overall the most important by some margin, while
$A_6$
and
$A_2$
are the least important terms –
$A_6$
, in general, being slightly more important than
$A_2$
in this case. Interestingly,
$A_2$
is important mostly in the first half of a linear growth or decay process, while
$A_6$
mostly matters during the latter half. This agrees very well with what one might guess from solely looking at the sizes of the coefficients in question in figure 3(b). While it is not obvious that this should be the case, it is reasonable from the perspective that if the best fit for a coefficient is zero at some point in time, its importance is necessarily also zero. Since the achieved FVU is at best
${\sim} {5\times 10^{-3}}{}$
, any term with
$\delta _{{FVU}} \lesssim 10^{-4}{}$
can be safely assumed to be irrelevant at our level of accuracy for describing the physics during that time range.

Figure 5. Importance of the six terms found by SR as they vary over time in the two-stream unstable set-up with
$n_b/n_e = 0.1$
, as measured by
$\delta _{{FVU}}$
.

Figure 6. A comparison of the OSIRIS
$q$
data (top left) from the
$n_b/n_e = 0.1$
simulation with our six-term SR model (top middle) and a local Hammett–Perkins model equivalent to keeping only
$A_3$
and
$A_4$
(top right), as well as the resulting
$\partial _{x} q$
(bottom row). Both models are trained solely on data from the linear growth phase, corresponding to
$19.0 \lt \omega _{\mathrm{pe}} t \lt 38.0$
. The six-term model performs very well even in the saturated regime, corresponding to
$\omega _{\mathrm{pe}} t \gtrsim 40$
. All mass-normalised heat fluxes, like the
$A_4$
coefficient, are given in units of
$\bar {n}c^{3}$
, and spatial derivatives of such quantities are given in units of
$\bar {n}\omega _{\mathrm{pe}} c^{2}$
.
The fact that the most important terms,
$A_1$
and
$A_5$
, vary quite slowly means that regardless of on which time range we perform the regression, we should expect the resulting closure to work quite well over the entire simulated time range. And indeed, this is what one finds if one plots the six-term model
$q = q_{\mathrm{even}} + q_{\mathrm{odd}}$
with coefficient values from (e.g.) the growth phase over the entire space–time domain of the simulation and compares it to a plot of the actual
$q$
output by OSIRIS – see figure 6.
This is especially promising since one of the primary use cases envisioned for these kinds of closures is sub-grid scale modelling within a larger simulation, where the instability occurs on a very short time scale compared with the overall time scales of interest. In such a situation, modelling the saturated phase ‘end state’ where
$\gamma \approx 0$
is the most important. The fact that there is no growth on average means that the six-term model can be reduced to a three-term model with only
$A_1$
,
$A_4$
and
$A_5$
– and, of course, providing a value for
$A_4$
is unnecessary if one is only interested in solving the fluid equations, since
$A_4$
does not affect
$\partial _{x} q$
.
In general, using a six-term model trained solely on growth phase data like in figure 6 tends to yield FVU
$\sim1\,\%$
during the growth phase and FVU
$\sim5\,\%{-}10\,\%$
in the saturated phase, while using a six-term model trained on data from the saturated phase where there is no net wave growth (or, more-or-less equivalently, a three-term model with only the
$A_1$
,
$A_4$
and
$A_5$
terms) yields FVU
$\sim5\,\%{-}10\,\%$
in the growth phase and FVU
$\sim2\,\%{-}5\,\%$
in the saturated phase. Thus, our six-term (or indeed three-term) model seems to be largely sufficient for modelling the saturated phase, despite the presence of nonlinear phenomena like particle trapping and soliton-like phase-space electron holes.
3.4. Comparison with constraints from linear theory
During linear decay or growth, the plasma should be well described by linear collisionless theory. As explained in Appendix A, this gives us two predictions relating the different closure coefficients, namely

where we are using shorthand notation

Let us first consider some limiting cases. First, take a marginally stable perturbation with
$\gamma \to 0$
, where (3.3) simplifies to

with
$\omega = \omega _r$
real. We can immediately see that one solution of the first of these equations is
$A_2 = A_3 = A_6 = 0$
, in agreement with the relationship
$A_{2,3,6} \sim \gamma$
found via SR. The second equation is less straightforward to interpret. If
$\omega \sim \omega _{\mathrm{pe}} \sim \lvert {k}\rvert \bar {v}_{\mathrm{th}}$
, like in most of our simulations, we expect
$A_1 \sim -3 + \frac {1}{2}\, \textrm {sgn}(k) A_5$
– and this should hold as a rule of thumb even when
$\gamma$
is non-zero but small compared with
$\omega _r$
. Qualitatively, this agrees decently with our results, even those from the growth phase where
$\gamma \gt 0$
(but in most cases
$\lt \omega _r$
). Generally, both
$A_1$
and
$A_5$
are order unity, and
$A_1$
is shifted a bit upwards from
$-3$
in figures 2 and 3 for all set-ups we consider except the symmetric two-stream set-up, matching the fact that
$\textrm {sgn}(k A_5) = +1$
. That this case should disagree with our rule of thumb is not surprising, since it has
$\omega _r \approx 0$
, while
$\gamma$
is finite.
The growth rate is truly negligible mainly during parts of the saturated phase of our two-stream simulations. However, the saturated phase is generally dominated by nonlinear physics, thus linear theory should not be expected to give accurate predictions. Therefore, we restrict our comparison with (3.5) to the start at the saturated phase, before the created electron holes start to merge. Specifically, let us examine the time around when the peak average
$\boldsymbol{E}$
-field energy is reached, marked in figure 7(a) for the case where
$n_b/n_e = 0.1$
. Performing SR over the region where the instantaneous growth rate satisfies
$\lvert {\gamma (t)}\rvert /\omega _{\mathrm{pe}} \lt 0.02$
near this peak for each two-stream simulation and comparing the SR value of
$A_1$
with the value predicted by (3.5) yields figure 7(b). As we can see, the agreement is good for weaker beam strengths, but becomes less accurate when approaching
$n_b/n_e = 0.5$
, where the physics involved is more nonlinear by virtue of the larger perturbation amplitudes.

Figure 7.
(a) Time range around peak
$\boldsymbol{E}$
-field energy density where
$\lvert {\gamma }\rvert /\omega _{\mathrm{pe}} \lt 0.02$
for
$n_b/n_e = 0.1$
. (b) Values of
$A_1$
found by SR in the two-stream simulations during the equivalent time range for all examined values of
$n_b/n_e$
, compared with the linear theory prediction at
$\gamma = 0$
given by (3.5), inserting the values of
$A_5$
found by SR.
If we instead consider the limit
$\omega _r \to 0$
, corresponding to a non-oscillatory – but possibly growing or decaying – perturbation, we get

Similar to the case with
$\gamma \to 0$
, the first equation allows for a solution where
$A_6 = A_5 = 0$
, consistent with the lack of wave propagation – in fact,
$A_4$
can also be set to zero. As for the second equation, if we insert inferred parameter values from the symmetric two-stream unstable set-up,

we get

giving a prediction of

The coefficient values found by SR in this case are

and if we insert our values for
$A_2$
and
$A_3$
into the approximate expression for
$A_1$
, we get
$A_1 = -3.79$
, which is reasonably accurate considering the quite broad
$k$
spectrum.
Performing a similar comparison between the values of
$A_1$
and
$A_6$
found by SR during linear decay/growth, and those predicted by (3.3) for all simulations, given the other coefficients as input, yields figure 8. The agreement is decent for both the Landau damping simulations and the two-stream instability ones. Interestingly, the two-stream instability simulations agree even better with linear theory, despite their more broad-spectrum nature. This is likely due to a combination of several factors. The instantaneous decay rate in the Landau damping case oscillates throughout the decay, which means that assigning the
$A_2$
,
$A_3$
and
$A_6$
coefficients a single value for the entire decay phase is less accurate than doing the same for the growth phase in the two-stream simulations. In addition, the space–time domain is larger in the two-stream simulations, yielding higher-resolution DFT spectra.

Figure 8. Values of
$A_1$
and
$A_6$
found by SR during the growth phase compared with the linear theory prediction given by (3.3), inserting the values of the other coefficients found by SR. The plot on the left contains the results from the Landau-damped Langmuir wave simulations, while the plot on the right contains the results from those with two-stream unstable set-ups.
3.5. A term
$\propto n v_{\mathrm{th}} V^2$
The terms making up the six-term model are in some cases not the only ones found consistently by SR. In particular, a term
$\propto n v_{\mathrm{th}} V^2$
appears consistently in regressions over the growth phase when
$\lvert {\gamma }\rvert$
is small – at
$\bar {v}_{\mathrm{th}}/c \lt {0.01}{}$
corresponding to
$\lvert {\gamma }\rvert /\omega _{\mathrm{pe}} \ll {0.15}{}$
for the Landau damping simulations and at
$n_b/n_e \lt 0.4$
, i.e.
$\lvert {\gamma }\rvert \lt {0.27}\mathrm {\omega _{\mathrm{pe}}}$
, in the two-stream simulations. Like the terms in
$q_{\mathrm{odd}}$
, this term is
$k$
-odd, switching sign depending on the propagation direction of the waves. Notably, the cases where the term shows up are precisely those where one would expect a
$k$
-odd trapping-related term to show up, corresponding to high
$\Delta t_{\textrm {L}}/t_b$
and a clear wave propagation direction (i.e. high
$\lvert {\omega /k}\rvert$
).
Examining how the importance of this term evolves over time, as measured by
$\delta _{{FVU}}$
, we consistently find that it is almost as important as the
$A_2$
and
$A_6$
terms. At a few instances during the simulations, however, it is significantly more important than these two terms and sometimes even more important than
$A_3$
. It is never as important as the
$A_1$
and
$A_5$
terms, however. Consistently, the periods where it is of higher importance occur towards the end of linear processes, when
$\lvert {\gamma (t)}\rvert$
is decreasing, as can be seen in figure 9(a), which shows the case
$n_b/n_e = 0.1$
. Because of this, it might be reasonable to refer to this term as a kind of ‘braking term’, working to damp ongoing growth or decay.
As for the value of the coefficient itself (which we can call
$C$
), it correlates well with
$\gamma (t)$
starting in the latter half of the growth phase. This can be seen in figure 9(b), where the time evolution of the
$C$
coefficient in the case where
$n_b/n_e = 0.1$
is shown. Unlike terms
$A_2$
,
$A_3$
and
$A_6$
, however, it only changes sign once for this value of
$n_b/n_e$
, going from positive in the growth phase to negative in the saturated phase. However, its magnitude continues to oscillate along with the growth-related terms even in the saturated phase. The high-frequency noise (and oscillation in
$\gamma$
) in the beginning of the growth phase causes the best fit value to oscillate wildly during this time span, and because of this, we plot
$C$
only after these oscillations start to die down. At other values of
$n_b/n_e$
, the time evolution of
$C$
is similar, but the average value of the coefficient in the saturated phase, around which it oscillates, varies. More specifically, both the average value and the oscillation amplitude of
$C$
seem to grow in absolute value as
$n_b/n_e$
increases, until the term becomes unimportant at
$n_b/n_e \to 0.5$
like the other
$k$
-odd terms.
This behaviour can be partly explained by the fact that
$n v_{\mathrm{th}} V^2$
is second-order in
$r$
since
$V \sim r v_{\textrm {ph}}$
, while both
$n$
and
$v_{\mathrm{th}}$
have non-zero unperturbed values. More specifically, because of its second-order nature, we expect this term to matter only when the processes of interest deviate significantly from linearity.

Figure 9. Variation over time of (a) the importance of the braking term
$C n v_{\mathrm{th}} V^2$
as measured by
$\delta _{{FVU}}$
and (b) the coefficient value
$C$
itself, for the two-stream unstable set-up with
$n_b/n_e = 0.1$
. In panel (a), we also show the
$\delta _{{FVU}}$
values corresponding to the
$A_2$
,
$A_3$
and
$A_6$
terms for reference, illustrating how the braking term is of similar importance. We highlight the time ranges where the
$\delta _{{FVU}}$
value is above the semi-arbitrary cutoff
${4\times 10^{-4}}{}$
, largely corresponding to decreasing
$\lvert {\gamma (t)}\rvert$
. We also show the time evolution of the
$\boldsymbol{E}$
-field energy in arbitrary units. In panel (b), we show the instantaneous growth rate
$\gamma (t)$
, illustrating its correlation with
$C$
. In the highlighted region, the prevalence of high-frequency noise and rapid oscillation in the growth rate causes the best-fit value of
$C$
to oscillate wildly. To improve legibility, we thus only show
$C$
at
$\omega _{\mathrm{pe}} t \gt 20$
.
4. Discussion and conclusions
Using methods from across the spectrum between physics fidelity on one hand and numerical tractability on the other is vital in exploring and understanding collisionless multi-scale plasma systems. In this context, collisionless fluid models play an important role, allowing global, long-time scale modelling of systems where this is not feasible with kinetic simulations. However, fluid models require closures to capture essential unresolved kinetic physics relevant to the plasma phenomena in question. Furthermore, theoretical closures are often derived by making idealised assumptions such as linearity or adiabatic invariance, which in many cases are broken by the dynamics of the system, motivating the use of data-driven approaches.
We have performed a theoretical and numerical study of how sparse regression can be used to discover local heat flux closures in 1-D electrostatic plasmas, examining Landau-damped Langmuir waves as well as two-stream instabilities. To ensure our inferred closures are able to capture kinetic effects, we generate our data using first-principles kinetic simulations – specifically, the OSIRIS code. The closures identified by sparse regression regularly account for more than 95% of the variation in the heat flux, while remaining limited in complexity. Thus, we demonstrate the utility of an SR-based approach to systematic closure discovery.
As noted in § 2, the high accuracy of the models found by SR suggests that our term library is large enough to capture the majority of the physics at play. To test whether there is room for improvement by exploring a larger function space (while having the closure remain local), one could employ neural network models and see how they perform compared with our SR models. No such analysis is included in this paper, however, since our models already reach FVU values of a few percent.
In addition to a local approximation of the Hammett–Perkins closure, we consistently find several additional closure terms in both scenarios. Notably, the three overall most important terms – often accounting for over 90% of the variation in
$q$
– do not include the local Hammett–Perkins term. As one of these is a constant term, most of the variation in the heat flux divergence is captured by only two terms in the
$q$
model: one
$\propto n v_{\mathrm{th}}^2 V$
and one
$\propto n v_{\mathrm{th}}^3$
. We further describe how the accuracy of the closure can be further improved by adding three more terms. These include a term
$\propto n v_{\mathrm{th}}^2 \partial _{x} v_{\mathrm{th}}$
, the local approximation of the Hammett–Perkins closure, along with terms
$\propto v_{\mathrm{th}}^3 \partial _{x} n$
and
$\propto n v_{\mathrm{th}}^2 \partial _{x} V$
. These terms are closely connected to the growth or decay of waves, their best-fit coefficients being approximately proportional to the growth rate.
Among these six terms, three are independent of propagation direction and three change sign depending on propagation direction (and are thus ‘
$k$
-even’ and ‘
$k$
-odd’, respectively). In a three-dimensional (3-D) setting, this likely corresponds to the absence or presence of a unit vector in the wave propagation direction in the tensorial expression for the closure terms, so that terms with
$k$
-odd expressions would appear as e.g.
$\propto \{ \hat {\boldsymbol{k}} \boldsymbol {\varOmega} \}$
for some two-tensor
$\boldsymbol {\varOmega}$
. This dependence on propagation direction for the
$k$
-odd terms also means that they can only exist when wave propagation or beam asymmetry breaks isotropy.
Having reached these results, we compared the closure coefficients with predictions from linear collisionless theory, overall with quite good agreement. The two constraints imposed by linear theory give relationships between the coefficients of the various terms found by SR and the wave parameters
$\omega _r$
,
$\gamma$
and
$k$
, as well as the plasma frequency
$\omega _{\mathrm{pe}}$
and ambient thermal speed
$\bar {v}_{\mathrm{th}}$
of the plasma. Not entirely unexpectedly, the appearance of frequency and wavenumber in these constraints suggests that fully capturing wave–plasma interactions requires a spatio-temporally non-local model.
Local approximations can still very much be used, however, provided one knows which parameter regimes are likely to be most relevant for the physics – cf. the local approximation of the originally non-local Hammett–Perkins closure, which needs to be supplied with a characteristic wavenumber and heat diffusivity. Already from the two constraints given by linear theory, various heuristics can be extracted depending on the parameter regime of interest. Ultimately, all parameter dependencies should be made explicit and absorbed in the closure terms, so that the coefficients are parameter-independent, giving the closure as wide a range of applicability as possible. Further elucidating the parameter dependencies of the closure coefficients is left for future work.
After examining these six terms, which are all zeroth- or first-order in the perturbation amplitude
$r = \tilde {n}/\bar {n}$
, we also studied one additional term with non-negligible influence on
$q$
. This term,
$\propto n v_{\mathrm{th}} V^2$
, is second-order in
$r$
, and was found to mostly be important when growth or decay is slowing down. The importance of each of these seven terms was then investigated by examining their respective contributions to lowering the fraction of variance unexplained, or FVU, of the model.
Despite the high accuracy of the models described in this work, expanding the term library used by SR does merit some further investigation. Apart from more complicated expressions involving
$n$
,
$V$
and
$v_{\mathrm{th}}$
, the
$\boldsymbol{E}$
and
$\boldsymbol{B}$
fields are also of interest – especially for higher-dimensional non-electrostatic set-ups. However, if the relevant physics is 2-D or 3-D, the number of relevant components of the vectors and tensors involved increases as well, necessitating even more careful selection of which terms to include in the SR term library. There is also reason to explore alternative algorithms for sparse regression such as SINDy-PI (Kaheman, Kutz & Brunton Reference Kaheman, Kutz and Brunton2020), which generalises PDE-FIND to allow for implicit expressions for the quantity of interest
$y$
. It should further be noted that using SR in Fourier space to directly identify non-local closures like the one originally proposed by Hammett and Perkins for Landau damping warrants more investigation.
As outlined in § 1, exploring ways of systematically discovering fluid closures is chiefly motivated by the enormous computational complexity of accurately modelling multi-scale processes in plasmas. In particular, one of the main envisioned use cases for closures of the type presented in this paper is sub-grid scale modelling of rapid, small-scale processes within larger simulations. When modelling instabilities in this way, the far future limit of the saturated regime (relative to the time scales of the instability) is the most important regime to model correctly.
Finally, evaluating the performance of – and fully benefiting from – data-driven closures of this type requires a flexible implementation of closure terms in collisionless fluid solvers, such as the 10-moment solver of Gkeyll (Hakim, Loverich & Shumlak Reference Hakim, Loverich and Shumlak2006; Hakim Reference Hakim2008), and comparing the results with equivalent kinetic simulations. Flexible closure prescription would also be a requirement for e.g. the robust – and challenging – approach demonstrated by Joglekar & Thomas (Reference Joglekar and Thomas2023). In that work, the free parameters of a fluid closure are learned by neural network models, and a differentiable fluid solver is used, enabling the calculation of the loss function gradient with respect to the neural network weights. The loss function in this case quantifies the difference between the long time predictions of physics observables as calculated by a kinetic and the fluid solver. The complexity of this training process limits the number of free parameters. We envision that the SR approach we explored here can inform a good (and interpretable) starting point for closure parametrisation. The parametric dependences of the coefficients of the model terms may then be be fine-tuned with a differentiable fluid solver.
It should be noted that implementing fluid closures in existing fluid codes is far from trivial, due to the possibility of unphysical instabilities arising unless the closure is chosen with this aspect in mind; indeed, arbitrarily small errors in model coefficients may lead to numerical instability. Enforcing long-time boundedness of discovered models is possible in systems with quadratic nonlinearities (Kaptanoglu et al. Reference Kaptanoglu, Callaham, Aravkin, Hansen and Brunton2021a
), but it is difficult more generally. In fact, many closures (e.g. the various existing ad hoc relaxation closures (Wang et al. Reference Wang, Hakim, Bhattacharjee and Germaschewski2015; Ng et al. Reference Ng, Huang, Hakim, Bhattacharjee, Stanier, Daughton, Wang and Germaschewski2015)) are in part used because of their ability to ‘diffuse’ anisotropies, reducing the complexity of dynamics and increasing the stability of the simulation. In our context, the signs of the closure coefficients (in particular, that of the Hammett–Perkins-like term
$A_3$
) found in the Landau damping scenario are such as to increase entropy and thus provide stability. However, they correspond to an instability in the two-stream unstable scenario. It should be emphasised that having such fundamentally instability-driving terms in our closure in this case is necessary to accurately model the growth phase solely because we are modelling this physically unstable scenario using only a single electron species. Treating the counter-streaming populations as separate fluid species may be applied to resolve this shortcoming, which is the subject of ongoing investigations.
Notably, however, closures found by SR such as those described in this paper hold an advantage when it comes to ensuring stability as compared with those based on e.g. neural networks. This is due to their interpretability and low complexity, which makes it possible to study the properties of the closures analytically – something which is generally not feasible for neural networks.
Acknowledgements
The authors are grateful to K. Steinvall, D. Graham and T. Fülöp for fruitful discussions. The computations used the OSIRIS particle-in-cell simulation code, and were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement No. 2022–06725.
Editor Nuno Loureiro thanks the referees for their advice in evaluating this article.
Funding
The work was supported by the Knut and Alice Wallenberg foundation (Dnr. 2022.0087) and the Swedish Research Council (Dnr. 2021-03943). This work was also supported by the National Science Foundation Grants No. PHY-2018087 and PHY-2018089.
Declaration of interests
Competing interests: The authors declare none.
Appendix A. Constraints from linear collisionless theory
In this section, we construct constraint relations between terms appearing in a local expression for the heat flux, which then can be applied to the closure terms found using SR.
Our starting point is the Vlasov–Maxwell system, i.e.

which is the most accurate self-consistent continuum description of collisionless plasmas. However, since this coupled system of partial differential equations (PDEs) is very expensive to solve over large domains while retaining high resolution, it is often necessary to simplify it. Most relevant for us is the fact that one can integrate the Vlasov equation over velocity space to instead get the fluid equations (Grad Reference Grad1949; Levermore Reference Levermore1996). Truncating these after the pressure equation yields the system sometimes referred to as the 10-moment model (Wang et al. Reference Wang, Hakim, Bhattacharjee and Germaschewski2015),

which needs to be closed by supplying an additional expression for
$\unicode{x1D666}_\sigma$
(or
$\boldsymbol {\nabla } \cdot \unicode{x1D666}_\sigma$
) in terms of the lower moments. Here,
$\left \{\cdot \right \}$
denotes symmetrisation, so that e.g.

and
$\unicode{x1D665}_\sigma \times \boldsymbol{B}$
should be interpreted as the two-tensor with elements
$\left [\unicode{x1D665}_\sigma \times \boldsymbol{B}\right ]_{ij} = \epsilon _{jkl} \, p_{\sigma ik} B_l$
.
In 1-D electron–proton set-ups like those of interest to us, where only the electron dynamics are important, the 10-moment fluid model (sans closure) together with Maxwell’s equations simplifies to

Note that the two remaining Maxwell’s equations imply the continuity equation. Here, we have taken the ions to be immobile with number density equal to the average electron density
$\bar {n}$
to ensure quasi-neutrality. Now, let us consider a small wave-like perturbation around equilibrium in the CoM frame, i.e.

Here, we assume the wavenumber
$k$
to be real, but the frequency
$\omega = \omega _r + i\gamma$
is allowed to be complex,
$\gamma$
being the growth rate. Of course, one could equally well set
$\omega = \omega _r - i\gamma$
, taking
$\gamma$
as the decay rate.
Inserting ansatz (A.5) into (A.4) and keeping only terms up to first-order in the perturbations, we get the relations making up 1-D linear collisionless theory:

or equivalently

To make the notation neater, we have introduced the (complex) phase velocity
$v_{\textrm{ph}} = \omega / k$
and the shorthand notation
$r$
for
$\tilde {n} / \bar {n}$
, quantifying the amplitude of the perturbation. We have also introduced the electron plasma frequency
$\omega _{\mathrm{pe}} = \sqrt {e^2 \bar {n} / m_e \varepsilon _0}$
. Note that when the ion dynamics is negligible, any heat flux closure would need to agree with the expression for
$\tilde {q}$
to first-order in
$r$
to be viable for modelling weak wave-like perturbations.
A.1 Evaluating the six-term closure
For a closure to be consistent with theory, the heat flux perturbation amplitude
$\tilde {q}$
given by the closure must agree with the final part of (A.7), i.e.

to first-order in
$r$
. To see whether this is indeed the case, let us calculate the contribution to
$\tilde {q}$
from each of the terms in the six-term model. For the terms in
$q_{\mathrm{even}}$
, we get

keeping only terms up to first-order in
$r$
. In other words,

As for the terms in
$q_{\mathrm{odd}}$
, the constant term
$A_4$
does not contribute to
$\tilde {q}$
, but the contribution from the other two terms is non-zero:

Thus, the contribution from
$q_{\mathrm{odd}}$
is

Defining

as well as

demanding (A.8) hold for our closure is equivalent to demanding

or equivalently

If we now define

and solve for
$A_1$
and
$kA_6$
, the result can be simplified to the form of (3.3), giving us two constraints on the coefficients which we can check.
Appendix B. Demonstration: recovery of the momentum equation
While the main use of sparse regression in this paper is to discover unknown approximate relations between the heat flux and lower-order fluid quantities, it is useful to verify that our workflow is able to identify known exact relations that the simulation data must obey. To illustrate that this is indeed the case, we here show how one can recover the density-normalised version of the 1-D momentum equation for electrons,

from our two-stream simulation data. Note that the arbitrary normalisation of
$n$
to the unperturbed total electron density
$\bar {n}$
does not affect the the logarithmic derivative. In the sparse regression, we now set
$\partial _t V$
as our target variable
$y$
and use a term library including all products of up to two terms from the set consisting of
$n$
,
$V$
,
$T$
and
$E$
, their spatial derivatives and a constant term.
We work with the
$n_b/n_e = 0.1$
two-stream instability simulation, sampling
${\sim} {2.5}\,\%$
of the data in each of the 10 cross-validation folds. This time, we take samples from almost the entire time domain, and not just e.g. the growth phase. The sparse regression algorithm yields the sequence of models for
$\partial _t V$
shown in figure 10(a).
As shown by the marker shapes – indicating whether models contain consistently the same terms (circles) or different ones (triangles) – SR correctly recovers the momentum equation and finds no further terms consistently. That the model is complete at four terms is also supported by the fact that the accuracy plateaus after this point, at an FVU of
${\sim} {3.26\times 10^{-4}}{}$
. This signifies that
${\sim} {99.97}\,\%$
of the variation in
$\partial _t V$
can be explained by the 1-D momentum equation as given in (B.1). SR also gives us the coefficients of the terms in the equation with an error of less than half a percent; these are listed in table 3.
Table 3. Optimal coefficient values for the terms in the various models found by sparse regression to approximate the momentum equation as given in (B.1). The numbering of the models is the same as in figure 10. The theoretical value of almost all coefficients multiplying the terms shown in the first row is
$1$
. The only exception is the non-physical
$\partial _{x} E$
term, which is only found consistently when the staggering of the
$E$
field data is not corrected for (model 5). This coefficient should have the value
$5\times 10^{-4}$
to provide a first-order correction for the half-cell size shift of the field data.


Figure 10. Sequence of models found by sparse regression to approximate
$\partial _t V$
, recovering the momentum equation as given in (B.1). The consistently found models are labelled by the new consistently found term which is added to the model compared with the next most simple one. The coefficients for all terms in each model are provided in table 3 (with the same numbering of models as here). Circles, unlike triangles, denote consistently found terms, and blue (orange) marker colour corresponds to testing (training) data (note that they overlap almost completely here). The two panels show the results (a) when using
$\boldsymbol{E}$
-field data which has been corrected for the half-cell grid shift with respect to the fluid quantities, which finds only the expected terms, and (b) when using uncorrected
$\boldsymbol{E}$
-field data, which also finds a non-physical
$\partial _{x} E$
term adding the correction back in.
Next, we illustrate the importance of appropriately aligning the
$\boldsymbol{E}$
-field data with the fluid data in space. Misalignment between particle and field data naturally occurs in PIC schemes using a staggered Yee grid (Yee Reference Yee1966), where the
$\boldsymbol{E}$
- and
$\boldsymbol{B}$
-field nodes appear in cell edges and cell faces, respectively, while the particle data are often cell-centred.Footnote
5
This discrepancy then propagates through to any fluid and electromagnetic field data exported from the simulation.
In figure 10(b), we show the result of performing SR on the same simulation data without correcting for the misalignment of the electric field data. The first four consistently found terms are the same as before, but now this four-term model is less accurate, with an FVU
$\gt 10^{-3}$
. A fifth term
$\propto \partial _{x} E$
is also consistently identified, however, and if retained, increases the accuracy to match that achieved with corrected
$\boldsymbol{E}$
-field data. This term corresponds to performing a first-order Taylor expansion to evaluate
$E$
half a grid cell further towards
$-x$
, since
${E}\Big |_{x - \frac {1}{2} \Delta x} = {E}\Big |_{\mathrm{x}} - \frac {1}{2} \Delta x {\partial _{x} E}\Big |_{\mathrm{x}}$
. Specifically, if we insert our spatial resolution
$\Delta x = {10^{-3}}{\delta _e}$
, we find that

corresponding exactly to the correction introduced by SR, since
$\delta _e$
is our unit of length.