Coronavirus disease 2019 (Covid-19) is a high-risk infectious type of pneumonia [Reference Zhu1] whose epicentre was Wuhan, Hubei Province, China [Reference Zhu1, Reference Huang2]; it has since spread across the world, causing a global public health emergency [Reference Sohrabi3]. The pandemic is caused by severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2), which is a member of the β-CoV genus [Reference Cui, Li and Shi4]. The genome of this virus is +ssRNA (single-stranded positive-sense RNA) [Reference Cui, Li and Shi4, Reference Fehr and Perlman5], approximately 30 kb in size [Reference Shi6], with a size of 100–160 nm in its compact form [Reference Fehr and Perlman5, Reference Corman7, Reference Andersen8]. However, there is no accepted standard theory of why SARS-CoV-2 is so highly infectious to human cells or of why it transmits so fast from human to human. One possible hypothesis is that the variability of viral mutations drives its virulence, as mutations are generally considered the basic units of species evolution [Reference Duffy9]. These mutations are sometimes harmful and sometimes beneficial [Reference Loewe and Hill10], but play an important role in genetic diversity via natural selection [Reference Baer11]. The mutation rates of RNA viruses are generally high ($10^{-4}$–$10^{-6}$), and these viruses are prone to mutations during virus−host interaction [Reference Loewe and Hill10]. A high mutation rate enhances viral genome replication kinetics [Reference Vignuzzi12, Reference Fitzsimmons13], intensifies the virus's virulence and coevolution with the host [Reference Duffy9], and helps the virus adapt to the host [Reference Vignuzzi12]. This high mutation rate of the viral genome may therefore lead to an increase in virulence and to rich diversity, which could be one of the pandemic's main causes.
In this letter, we report a study of the diversity in the SARS-CoV-2 virus genome induced by pressure (quantified by height above sea level) and by the regions' health index across the world.
We have mined publicly available complete genomic data of various SARS-CoV-2 isolates (42 isolates in total) from seven countries [14]. We then calculated the mutation rates (mutations per unit nucleotide) of all mined SARS-CoV-2 virus isolates. The mutation rate ($m_r$) of an isolate is defined by $m_r = p_r/G_r$, where $p_r$ and $G_r$ are the number of point mutations and the total number of nucleotides of the $r$th isolate, respectively. Taking the Wuhan seafood market isolate as the reference genome, we calculated the mutation rates of the mined isolates as a function of height above sea level (in metres) and of the health index of the place from which the viral isolates were extracted (Fig. 1). The height above sea level, or altitude (Λ), and the dependent atmospheric pressure ($P[\Lambda]$) can be related by using $dP[\Lambda]/d\Lambda = -d\,g$ at hydrostatic equilibrium and the ideal gas equation $P = (d/M)RT$, where $d$ is the air density, $g$ is the acceleration due to gravity, $M$ is the air molar mass, $R$ is the universal gas constant and $T$ is the standard temperature [Reference Quick15]. This gives $P[\Lambda] = P_0 e^{-\int_0^{\Lambda} M g(\Lambda)\,{\rm d}\Lambda/RT}$, where $P_0$ is the atmospheric pressure at sea level. The function $g(\Lambda)$ at any height Λ can be calculated by $g(\Lambda) = (g/r^2)(r + \Lambda)^2$, where $r$ is the radius of the earth. Considering a linear change in $T$ ($T = T_0 + L\Lambda$), where $T_0$ is the sea level standard temperature and $L$ is the temperature lapse rate of the air, we can get
$P[\Lambda] = P_0\left[1 + \frac{L}{T_0}\Lambda\right]^{-\alpha} e^{-\beta\Lambda^2 - \gamma\Lambda} \qquad (1)$
where $\alpha = (Mg/r^2LR)[r^2 - (T_0/L)(2r - (T_0/L))]$, $\beta = Mg/(2r^2LR)$ and $\gamma = (Mg/r^2LR)[2r - (T_0/L)]$. Equation (1) clearly shows that the pressure $P[\Lambda]$ decreases quite fast as Λ increases. The mutation rate of the isolates ($m_r$) is found to be relatively high ($m_r \sim 10^{-4}$–$10^{-5}$), and increases as Λ increases (decrease in pressure $P[\Lambda]$) up to a maximum value (at around $\Lambda \sim 1.7 \times 10^2$ m), and then decreases as Λ increases further (Fig. 1, upper panel). This implies that the diversity of the viral genomes depends on host cells adapted to a certain pressure $P$. Further, the dependence of $m_r$ on $P[\Lambda]$ indicates that viral mutation rates can evolve with pressure, in step with the adapted host cells, and could reach an optimum at a selective pressure. This could be because virus−host cell interaction mechanisms, which alter viral replication kinetics, proofreading, etc., might depend on the adapted pressure. The pressure-dependent richness in viral mutation diversity indicates that there is a high chance of multiple virus−host-dependent processes [Reference Sanjuan and Domingo-Calap16], which could make the SARS-CoV-2 virus more infectious.
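The closed form of equation (1) can be checked numerically against the integral it comes from. The sketch below is illustrative only: the constants are standard-atmosphere style values assumed here rather than taken from the letter, the prefactor of α, β and γ is taken as $Mg/(r^2LR)$ (what the integration yields), and the function names are hypothetical.

```python
import math

# Assumed constants (standard-atmosphere style values, not from the letter).
P0 = 101325.0      # sea-level pressure, Pa
T0 = 288.15        # sea-level temperature, K
L = -0.0065        # lapse rate, K/m (negative: temperature falls with height)
M = 0.0289644      # molar mass of air, kg/mol
G = 9.80665        # gravitational acceleration at sea level, m/s^2
R_GAS = 8.314462   # universal gas constant, J/(mol K)
R_EARTH = 6.371e6  # radius of the earth, m


def coefficients():
    """alpha, beta, gamma of equation (1), with prefactor M g / (r^2 L R)."""
    k = M * G / (R_EARTH ** 2 * L * R_GAS)
    c = T0 / L
    alpha = k * (R_EARTH ** 2 - c * (2 * R_EARTH - c))
    beta = k / 2
    gamma = k * (2 * R_EARTH - c)
    return alpha, beta, gamma


def pressure_closed_form(height):
    """P[Lambda] = P0 [1 + (L/T0) Lambda]^(-alpha) exp(-beta Lambda^2 - gamma Lambda)."""
    alpha, beta, gamma = coefficients()
    power = (1 + (L / T0) * height) ** (-alpha)
    return P0 * power * math.exp(-beta * height ** 2 - gamma * height)


def pressure_numeric(height, steps=10000):
    """Midpoint-rule integration of M g(x) / (R T(x)) from 0 to height."""
    dx = height / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        g_x = (G / R_EARTH ** 2) * (R_EARTH + x) ** 2  # g(Lambda) as written in the text
        total += M * g_x / (R_GAS * (T0 + L * x)) * dx
    return P0 * math.exp(-total)
```

Comparing `pressure_closed_form(h)` with `pressure_numeric(h)` at a few altitudes is a quick way to confirm that the three coefficients and the exponent signs are mutually consistent.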
The health index of a place characterises the overall health of the population at that particular place. The calculated mutation rate is strongly dependent on the health index (Fig. 1, lower panel). If Γ denotes the health index parameter, then the relationship of $m_r$ with Γ can be obtained by fitting the data and is found to obey $m_r \sim a e^{b\Gamma}$, where $a = 1.3713$ and $b = 0.3681$ are constants. The squared Pearson correlation coefficient of the fitted function to the data points is $r^2 = 0.8313$.
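The two calculations described so far, the per-isolate mutation rate $m_r = p_r/G_r$ and the exponential fit against the health index, can be sketched as follows. This is a minimal illustration under stated assumptions: the sequences are taken to be already aligned to the reference, the fit is done by ordinary least squares on $\ln m_r$ (the letter does not say which fitting procedure was used), and the function names are hypothetical.

```python
import math


def mutation_rate(reference, isolate):
    """m_r = p_r / G_r for two pre-aligned sequences (alignment is assumed)."""
    pairs = [(a, b) for a, b in zip(reference.upper(), isolate.upper()) if b != "-"]
    # p_r: aligned positions where the isolate differs from the reference base.
    point_mutations = sum(1 for a, b in pairs if a != "-" and a != b)
    # G_r: number of nucleotides of the isolate (gap characters excluded).
    return point_mutations / len(pairs)


def fit_exponential(health_index, rates):
    """Fit rates ~ a * exp(b * health_index) by least squares on log(rates).

    Returns (a, b, r2), where r2 is the squared Pearson correlation
    between health_index and log(rates).
    """
    ys = [math.log(m) for m in rates]
    n = len(health_index)
    mean_x = sum(health_index) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in health_index)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(health_index, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    r2 = sxy * sxy / (sxx * syy)
    return a, b, r2
```

A log-linear fit weights small rates more heavily than a direct nonlinear fit would, so the constants it returns need not match those obtained by another procedure on the same data.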
Next, we study the impact of Λ and Γ on the genome complexity of the SARS-CoV-2 isolates by calculating two complexity measures, the Hurst exponent (H) and the generalised fractal dimension (D) [Reference Mandal17, Reference Kantelhardt18]. The biological and cellular processes of macro- and micro-organisms are significantly changed by variations in atmospheric pressure and temperature, characterised here by the altitude parameter Λ [Reference Pradillon and Gaill19]. These variations could lead to a change in genetic expression [Reference Bartlett, Kato and Horikoshi20–Reference Iizuka and Murakami22], affecting mutation frequency [Reference Kaushal23]. Data analyses have also reported that atmospheric pressure can induce a change in the mutational frequency of the SARS-CoV-2 virus [Reference Kaushal23, Reference Zhu24], and that temperature can trigger the SARS-CoV-2 infection rate [Reference Conenello25]. Moreover, the variation in Λ across geographical locations provides different environmental impacts, with diversity in ecosystems causing variation in the infection rate of Covid-19 [Reference Scafetta26, Reference Coro27]. Hence, a change in Λ could perturb the complicated dynamics of human−SARS-CoV-2 virus interaction, which may alter the mutational profile of this virus and so change genome organisation and regulation. The procedure for calculating the effect of Λ on any SARS-CoV-2 genome at any P and Γ is as follows. First, we converted the symbolic genome sequence to a time-series-like DNA walk by taking a purine (A or G) as a step up (+1) and a pyrimidine (C or T) as a step down (−1) [Reference Peng28]. By construction, the DNA walk is a map of the genome's cumulative sum, which carries the complex information of the genome. The DNA walks of the 42 SARS-CoV-2 isolates collected from the patients' data are then calculated as a function of Λ and Γ. Now consider a DNA walk of length N of a particular isolate.
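The purine/pyrimidine mapping and its cumulative sum can be sketched in a few lines. The treatment of ambiguous bases (a zero step) and of U as a pyrimidine are assumptions made for this illustration, and the function names are hypothetical.

```python
from itertools import accumulate

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def dna_steps(sequence):
    """Map each base to +1 (purine), -1 (pyrimidine) or 0 (anything else)."""
    steps = []
    for base in sequence.upper():
        if base in PURINES:
            steps.append(1)
        elif base in PYRIMIDINES:
            steps.append(-1)
        else:
            steps.append(0)  # assumption: ambiguous bases such as N contribute no step
    return steps


def dna_walk(sequence):
    """Cumulative sum of the step series, i.e. the DNA walk."""
    return list(accumulate(dna_steps(sequence)))
```

For instance, `dna_walk("AGCTTA")` accumulates the steps +1, +1, −1, −1, −1, +1 into the walk 1, 2, 1, 0, −1, 0.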
From this DNA walk, we briefly explain the procedure for calculating the various multifractal parameters [Reference Kantelhardt18]. First, we calculated the profile function, $Y(i) = \sum_{j=1}^{i}[\lambda_j - \langle\lambda\rangle]$, by constructing a series of length segments $\lambda_j$ from the DNA walk of length N, with j = 1, 2, 3, …, N, such that $\lambda_j = 0$ is considered to be insignificant. Then, this function $Y(i)$ is divided into $N_x = {\rm int}(N/x)$ equal non-overlapping segments of length x. To account for the end effects of $Y(i)$, $2N_x$ segments of length x are considered by repeating the division from the opposite end. The local trend of each of the $2N_x$ segments was estimated by a least-squares fit, and the local fluctuation was taken as the variance about that fit. Averaging over the local fluctuations of all $2N_x$ segments gives the q-dependent fluctuation function $F_q(x)$ (q being the order parameter) for the considered virus isolate, which is found to follow the fractal scaling $F_q(x) \sim x^{H_q}$, where $H_q$ is the q-order Hurst exponent [Reference Peng28, Reference Halsey29]. Since $F_q(x)$ is both q-dependent (inter-event dependent [Reference Sanjuan and Domingo-Calap16]) and x-dependent (local domains), each genome exhibits multifractal properties [Reference Mandal17, Reference Kantelhardt18, Reference Calvert, Fisher and Mandelbrot30]. Then, H of each genome is obtained by $H = \langle H_q \rangle$. The calculated values of $H_q$ of all SARS-CoV-2 isolates are found to be quite sensitive to q, indicating rich heterogeneous structures in the genomes. Further, it is also found that $0.5 < H < 1$ (Fig. 2, upper two panels), characterising the genomic signal in each isolate as having long-range positive correlations in its topology [Reference Norouzzadeh and Rahmani31]. This could be evidence of strong self-organisation in each virus isolate [Reference Mandal17, Reference Heylighen32].
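The steps above (profile, segmentation from both ends, local least-squares detrending, q-th order averaging and a log-log slope) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes linear detrending, a hand-picked set of scales and a synthetic ±1 series as input, and all function names are hypothetical.

```python
import math
import random


def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def profile(steps):
    """Profile Y(i): cumulative sum of the mean-subtracted series."""
    mean = sum(steps) / len(steps)
    total, out = 0.0, []
    for s in steps:
        total += s - mean
        out.append(total)
    return out


def detrended_variance(segment):
    """Mean squared residual about a least-squares straight line."""
    n = len(segment)
    xs = list(range(n))
    b = slope(xs, segment)
    mean_x, mean_y = (n - 1) / 2, sum(segment) / n
    return sum((y - mean_y - b * (x - mean_x)) ** 2 for x, y in zip(xs, segment)) / n


def fluctuation(y, x, q):
    """q-th order fluctuation function F_q(x) over the 2 * N_x segments."""
    n_x = len(y) // x
    offset = len(y) - n_x * x  # start of the pass taken from the opposite end
    segments = [y[k * x:(k + 1) * x] for k in range(n_x)]
    segments += [y[offset + k * x:offset + (k + 1) * x] for k in range(n_x)]
    variances = [v for v in map(detrended_variance, segments) if v > 0]
    if q == 0:  # logarithmic average replaces the q -> 0 limit
        return math.exp(0.5 * sum(math.log(v) for v in variances) / len(variances))
    return (sum(v ** (q / 2) for v in variances) / len(variances)) ** (1 / q)


def hurst_q(steps, q, scales):
    """H_q: slope of log F_q(x) against log x."""
    y = profile(steps)
    log_x = [math.log(x) for x in scales]
    log_f = [math.log(fluctuation(y, x, q)) for x in scales]
    return slope(log_x, log_f)


def mean_hurst(steps, qs, scales):
    """H = <H_q>, averaged over the chosen orders q."""
    return sum(hurst_q(steps, q, scales) for q in qs) / len(qs)


# Usage on an uncorrelated +/-1 series (synthetic data, not a viral genome).
random.seed(0)
demo_steps = [random.choice((-1, 1)) for _ in range(4096)]
demo_h = mean_hurst(demo_steps, qs=[-2, 2], scales=[16, 32, 64, 128])
```

An uncorrelated series should give exponents near 0.5, which makes it a convenient sanity check before applying the same functions to a step series derived from a genome.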
It is also found that H decreases with Λ until it attains the minimum value $H_{\min} = 0.9151$ at Λ ~ 887 m, and then increases with Λ following $H \sim u\Lambda^2 + v\Lambda + w$, where the fitted parameter values are $u = 10^{-8}$, $v = -2.4 \times 10^{-6}$ and $w = 0.916$, with squared Pearson correlation coefficient $r^2 = 0.3135$. From equation (1), taking $(L/T_0) < 1$, one can approximate the factor $[1 + (L/T_0)\Lambda]^{-\alpha} \sim e^{-\alpha(L/T_0)\Lambda}$, such that after simplification we have $\Lambda = -\frac{s}{2} \pm \sqrt{\frac{s^2}{4} + \frac{1}{\beta}\ln\left[\frac{P_0}{P}\right]}$, where $s = (1/\beta)(\gamma + \alpha(L/T_0))$. Further, for positive Λ > 0, it can be shown that $P_0 > P$. Now, substituting this expression for Λ into the equation for H, we arrive at
$H(P) = u\,\Lambda(P)^2 + v\,\Lambda(P) + w, \quad \Lambda(P) = -\frac{s}{2} \pm \sqrt{\frac{s^2}{4} + \frac{1}{\beta}\ln\left[\frac{P_0}{P}\right]} \qquad (2)$
Now, Figure 2 (upper left panel) and equation (2) show that the virus isolates in hosts in pressure regions $P > P_m$ and $P < P_m$ ($P_m$ corresponding to Λ = 887 m) exhibit complicated divergence, indicating more virulence as the virus attempts to establish long-range correlation within the genome during the virus−host interaction. However, the virus is likely to cause minimal harm to hosts adapted to regions at and around $P \to P_m$. The pressure variation driven by the change in altitude Λ can perturb gene expression and cause physiological changes in organisms and micro-organisms [Reference Bartlett, Kato and Horikoshi20–Reference Iizuka and Murakami22]. It may also cause variation of the mutational frequency in the SARS-CoV-2 virus genome [Reference Kaushal23]. Our data analysis indicates that the mutation rate initially increases as the height above sea level (Λ) increases up to $\Lambda_m$, i.e. as the pressure decreases down to $P_m$ (Fig. 1, upper panel), where the mutation rate is maximum, causing an increase in viral virulence [Reference Zhu24] and hence in the infection rate as well [Reference Conenello25]. Afterward, the mutation rate decreases as Λ increases, i.e. as the pressure decreases further, as evident from [Reference Scafetta26, Reference Heylighen32]. This impact of pressure is always associated with a change in temperature, which supplements the changes in gene expression and mutation rates and triggers the Covid-19 infection rate [Reference Conenello25, Reference Smit33]. Moreover, the change in pressure also affects human lung epithelial tissue cells through the change in respiration rate needed for oxygen intake, which may cause physiological changes and even various other diseases [Reference West34, Reference Wickramasinghe and Anholm35]. Hence, even at the gene-expression level, these physiological changes may complicate human−SARS-CoV-2 virus interaction dynamics, driving changes in the mutation rate and leading to variation in the infection rate.
Now, the $q$th-order generalised fractal dimension $D_q$ for each isolate can be calculated from the corresponding DNA walk by
$D_q = \frac{1}{q-1}\lim_{z\to 0}\frac{\ln\sum_k T_k^q(z)}{\ln z}$
where $T_k^q(z) = \lim_{N\to\infty}(N_k/N)$ is the probability that the $k$th DNA walk segment of length scale z will have $N_k$ observations, and obeys $T_k^q(z) \sim z^{\nu}$ [Reference Halsey29], where ν is the Hölder exponent [Reference Mallat and Hwang36]. The fractal dimension D can be obtained by $D = \langle D_q \rangle$. Similar to $H_q$, $D_q$ is also quite sensitive to q, indicating rich heterogeneous structure in each virus isolate [Reference Mandal17], where local topologies might have significant functions. D is found to be maximum at Λ = 887 m (Fig. 2, lower left panel) and decreases for Λ < 887 m and Λ > 887 m. This indicates that virus virulence is quite significant for hosts adapted to high- and low-pressure regions, and that the virus may cause the least harm to hosts adapted to regions at and around $\Lambda \sim \Lambda_0$ (= 887 m). The degree of viral complexity measured by H and D depends on the health index $I_H$ (Fig. 2, panels of right-hand column). H is found to depend linearly on $I_H$, $H = \varepsilon I_H + \delta$, where the fitted parameter values are ε = 0.9139 and δ = 0.0016, with squared Pearson correlation coefficient $r^2 = 0.3512$. Similarly, D is found to depend linearly on $I_H$, $D = \eta I_H + \sigma$, where η = −0.001369, σ = 1.0109 and $r^2 = 0.4831$. Since the virus complexity increases as $I_H$ increases, it may be that viral diversity has increased in healthy hosts in order for the virus to survive in the host. We also found multiple isolates in some countries; the USA and Wuhan have many isolates with multiple values of fractal dimension and Hurst exponent corresponding to the single health index assigned to a country. In general, H and D are multiple-valued functions (dependent on local trended fluctuations, which depend on the order parameter q) because of the multifractality of the SARS-CoV-2 viral genomes. However, here, H and D are calculated as average values for each genome.
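One common way to estimate a generalised dimension is a box-counting partition sum: split the sequence into boxes of length z, take the fraction $N_k/N$ of counted events in each box, and read $D_q$ from the slope of $\ln\sum_k (N_k/N)^q/(q-1)$ against $\ln z$. The sketch below follows that route as an illustration only. Counting purines as the "observations", measuring z relative to the sequence length, and the function names are all assumptions, not details given in the letter.

```python
import math


def box_probabilities(sequence, z, symbols=("A", "G")):
    """Fraction N_k / N of the counted symbols in each box of length z.

    Empty boxes are dropped so that negative q and logarithms stay defined.
    Assumes the sequence contains at least one counted symbol.
    """
    seq = sequence.upper()
    counts = []
    for start in range(0, len(seq) - z + 1, z):
        window = seq[start:start + z]
        counts.append(sum(1 for base in window if base in symbols))
    total = sum(counts)
    return [c / total for c in counts if c > 0]


def _slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def generalised_dimension(sequence, q, scales):
    """Estimate D_q from the scaling of the partition sum over box sizes."""
    log_z, log_chi = [], []
    for z in scales:
        probs = box_probabilities(sequence, z)
        log_z.append(math.log(z / len(sequence)))  # box size relative to N
        if q == 1:
            # Information dimension: the q -> 1 limit uses sum p ln p.
            log_chi.append(sum(p * math.log(p) for p in probs))
        else:
            log_chi.append(math.log(sum(p ** q for p in probs)) / (q - 1))
    return _slope(log_z, log_chi)


def mean_dimension(sequence, qs, scales):
    """D = <D_q>, averaged over the chosen orders q."""
    return sum(generalised_dimension(sequence, q, scales) for q in qs) / len(qs)
```

Two limiting cases make useful checks: a sequence whose purines are spread evenly gives $D_q$ close to 1 for every q, while a sequence with a single purine gives $D_q$ close to 0.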
In conclusion, it is quite evident that the mutation rate and the rich diversity in SARS-CoV-2 genome complexity are strongly driven by the pressure levels of the hosts' inhabited regions and by the health index. Our results show a clear exponential dependence of the mutation rate on the health index, which may trigger the SARS-CoV-2 virus's virulence and in turn increase the infection rate. Since, in our analysis, populations with a higher health index show higher mutation rates and hence potentially higher infection rates, proper precautions such as WHO guidelines should be strictly followed, and the immune system has to be kept strong to fight the virus infection. Our analysis of the viral isolate data shows a critical pressure at which the virus causes minimal harm to the host and beyond which the viral evolution preserves rich diversity relating to virulence. The virus's complexity increases with the hosts' health index, probably for its survival, and the virus then attacks the hosts. The variation in atmospheric pressure triggers significant changes in gene expression, leading to indicative biological and physiological changes in macro- and micro-organisms [Reference Bartlett, Kato and Horikoshi20–Reference Iizuka and Murakami22]. In the case of the Covid-19 pandemic, there is a complicated human−SARS-CoV-2 virus interaction dynamics driven by pressure, which is associated with temperature [Reference Scafetta26]. Our data analysis showed that with a decrease in pressure (increase in height above sea level), the mutation rate of the SARS-CoV-2 virus increases until it reaches a maximum value, at which the virus shows maximum virulence and may have a maximum tendency to spread in the population [Reference Zhu24, Reference Conenello25, Reference Peng28]. Then, as the pressure decreases further, the mutation rate starts decreasing, indicating lower virus virulence, which may cause a decrease in the infection rate in the population.
Hence, we propose that these two parameters could be of high concern for analysis to intervene in the fast progressing Covid-19 pandemic.
Acknowledgement
MZM is financially supported by the Department of Health and Research, Ministry of Health and Family Welfare, Government of India under young scientist FTS No. 3146887. RKBS acknowledges UPE-II, sanction no. 101, India, for providing financial support.
Author contributions
RKBS conceptualised and designed the model. RKSS, MZM and RKBS did the computational experiment and prepared the figures. RKBS, RKSS and MZM wrote the paper. All authors read, checked and approved the paper.
Conflict of interest
The authors declare that they have no competing interests.
Data availability statement
All data generated and/or analysed during the current study are available from the corresponding author on reasonable request.