Introduction
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in the late 2019s, causing coronavirus disease 2019 (COVID-19). The SARS-CoV-2 virus spread rapidly throughout the world with an unsettling effect on human livelihood and economy. Globally, since 31 December 2019 and as of week 2021-5, 106 472 660 cases of COVID-19 have been reported, including 2 323 103 deaths. Since 31 December 2019 and as of the fifth week, 2021, >106 472 660 confirmed cases, including 2 323 103 deaths, have been reported worldwide, affecting >343 countries as indicated by the European Centre for Disease Prevention and Control [1]. The World Health Organization (WHO) has declared a global health emergency at the end of January 2020 [Reference Mahase2]. In Sri Lanka, the first local case of COVID-19 was recorded on 11 March 2020 in a 58-year-old male. As of 23 February 2021, there were 80 517 confirmed cases of COVID-19, including 450 deaths reported to the WHO [3].
A robust surveillance system, understanding the molecular epidemiology of the virus, and mapping can help to understand the evolution of the epidemics and apply quick control measures [Reference Cheng4, Reference Wang5]. Currently, based on the genomic epidemiology mapping of the COVID-19 virus, it has been demonstrated that the virus is undergoing mutations [Reference Pachetti6]. Therefore, in addition to confirmation of the presence of the virus, the WHO recommends regular sequencing of a percentage of specimens from clinical cases to monitor viral genome mutations that might affect the medical countermeasures, including diagnostic tests [Reference Wang5]. The whole genome of four SARS-CoV-2 virus strains obtained from COVID-19 positive local patients was sequenced and deposited in the Global Initiative on Sharing All Influenza Data (GISAID) EpiCoV™ database. This study was conducted to investigate the evolution and genetic relatedness of SARS-CoV-2 strains in Sri Lanka, with other reported SARS-CoV-2 strains.
Materials and methods
Sri Lankan SARS-CoV-2 sequences
The whole-genome sequence of four SARS-CoV-2 virus strains obtained from COVID-19 positive patients, including the first local positive case reported on 11 March 2020, was deposited in the GISAID EpiCoV™ database5 were used for this study (Table 1).
Selection of SARS-CoV-2 isolates
For further understanding of the molecular epidemiology of the COVID-19 outbreak in Sri Lanka, 46 isolates were selected from GenBank, National Center for Biotechnology Information (NCBI) using Basic Local Alignment Search Tool nucleotide (BLASTn) tool based on the highest identity and lowest expected value (E-Value) with Sri Lankan isolates. The sequence datasets of 46 selected SARS-Cov-2 complete genomes from different countries in Asia, Africa, Australia, Europe, North America and four Sri Lankan isolates retrieved from GISAID5 by 28 April 2020 were used for this analysis. The strain isolated from Wuhan in December 2019 with the NCBI accession number NC_045512.2 was used as the reference genome.
Whole-genome sequence alignment and phylogenetic analysis
Sequence alignment was performed using Multiple Sequence Comparison by Log- Expectation (MUSCLE) software [Reference Edgar7]. Following alignment, single nucleotide polymorphisms (SNPs) and amino acid variations analysis were conducted using Molecular Evolutionary Genetics Analysis version ten (MEGA X) [Reference Kumar8], taking the first SARS-CoV-2 reference sequence (GenBank Accession number NC_045512) deposited December 2019 in GenBank from Wuhan, China. The evolutionary history was inferred using the neighbour-joining method with the maximum composite likelihood method and the Hasegawa–Kishino–Yano model (HKY) as the best fitting model [Reference Hasegawa, Kishino and Yano9] after 1000 bootstrap replication using MEGA X [Reference Kumar8].
Results
Phylogenetic tree analysis
The maximum likelihood phylogenetic tree in Figure 1 shows that two of the SARS-CoV-2 isolates from Sri Lanka (GISAID accession IDs: EPI_ISL_428671 and EPI_ISL_428672) collected on 10 March 2020 and 19 March 2020, respectively, are clustered in the group with the isolates from Italy, Germany, France and Mexico that were collected before 10 March 2020. The EPI_ISL_428673 Sri Lankan isolates collected on 31 March 2020 was clustered with isolates obtained on 9 February 2020 from England while EPI-ISL_428670 Sri Lankan isolates collected on 16 March 2020 showed the highest evolutionary distance to the SARS-CoV-2 sequence originated in Wuhan, China (GenBank Accession number NC_405512).
SNPs analysis
Fifteen SARS-CoV-2 genome sequences that are mainly clustered with the four Sri Lankan strains were compared with the Wuhan reference to observe the viral genome mutations and amino acid variations. The SNPs presented along with the whole genome are indicated in Table 2 (positions referred with respect to the reference sequence; GenBank accession number: NC_045512). The genome sequence of EPI_ISL_428671 from the first local patient has differed in six nt positions compared to the reference genome, while the rest of the three Sri Lankan sequences EPI_ISL_428670, EPI_ISL_428672 and EPI_ISL_428673 showed variations in six nt positions, five nt positions and four nt positions, respectively (Table 2). Both EPI_ISL_428671 and EPI_ISL_428672 strains, which clustered with the main group in the phylogenetic tree with European isolates, have shown three similar SNPs at the positions of bps3037, bps14408 and bps23403 (Table 2).
N: nucleocapsid protein; ORF: open reading frame; ORF1ab: ORF encoding polyprotein; S: surface glycoprotein; SARS-CoV-2: severe acute respiratory syndrome coronavirus; SNP: single nucleotide polymorphism; UTR: untranslated region.
SNPs are indicated in red colour.
a SNPs are shown according to nucleotide positions in the genome sequence and gene location.
Amino acid variations
Table 3 indicates the respective changes in the amino acid positions of the derived proteins (positions referred with respect to the reference sequence; GenBank accession number: NC_045512). SNPs occurred only in the open reading frame (ORF) 1ab gene, S gene, ORF 3a gene, M gene and N gene of four Sri Lankan whole-genome strains have resulted in amino acid changes at the corresponding positions of the translated proteins, while the rest of SNPs in the genes did not result in any changes in amino acid sequence (Table 3).
N: nucleocapsid protein; ORF: open reading frame; ORF1ab: ORF encoding polyprotein; S: surface glycoprotein; SARS-CoV-2: severe acute respiratory syndrome coronavirus; SNP: single nucleotide polymorphism; UTR: untranslated region.
*Stop codon.
? Possible sequencing error Amino acid variations are indicated in red colour.
a SNPs are shown according to nucleotide positions in the genome sequence and gene location. The amino acid positions refer to those in each respective protein sequence of the Wuhan reference (GenBank accession number: NC_405512), starting from the first methionine.
Except for the first Sri Lankan isolate collected on 10 March (EPI_ISL_428671), the other three Sri Lankan isolates presented a total of six mutations in the ORF 1ab protein with respect to the reference (Table 3). Mutations can be observed in the S protein at the same position AA614 (bps23403) in both Sri Lankan strains EPI_ISL_428671 and EPI_ISL_428672. A single mutation was observed in ORF 3a protein in strain EPI_ISL_428673 at the positions AA251 and bps26144. In the EPI_ISL_428670 strain, the amino acid sequence of N protein shows one mutation at the position AA398 at bps29465, while EPI_ISL_428671 strain had mutations at the positions AA203 (bps28882) and AA204 (bps28883) compared to the reference strain.
Discussion
In this study, the virus strain of the first local patient (EPI_ISL_428671) collected on 10 March 2020, who was a tour guide and had direct contact with Italian tourists [Reference Cheng4], is clustered together with isolates from Italy, Germany and Mexico. This evolutionary evidence revealed the first sequence of SARS-CoV-2 showed the highest genomic similarity to Italy, and European isolates confirming the history of exposure of the first patient who has been exposed to tourists who came from Italy. Even though the history and origin of the infection of the remaining isolates were not reported, the clustering of other isolates from Sri Lanka with isolates from the database has provided a clue about the possible source of the infection. Furthermore, genomic relatedness of the SARS-CoV-2 virus genome sequences of Sri Lankan isolates further confirmed the exposure history of the patients presented in the Epidemiology Unit, Ministry of Health, Sri Lanka [Reference Edgar7]. More importantly, this study has indicated the importance of tracking the history of the infection to trace the contacts of the infected person, particularly for the asymptomatic patients in Sri Lanka. The mutations found in the virus identified in Sri Lanka, compared with the reference Wuhan strain and the recognise amino acid changes, should be further monitored to understand whether those changes affect the virulence of the virus or clinical manifestations of the disease. Though this study had limitations mainly due to the lack of epidemiological information on the available genome in the database and a limited number of sequence genome available in Sri Lanka at the time of analysis, the information obtained from this study might assist in understanding the evolutionary dynamics and local transmission of circulating SARS-CoV-2 in Sri Lanka.
Conclusion
In this section, the results of this study indicated that the SARS-CoV-2 sequences from Sri Lanka have the highest genomic similarity to isolates from Italy, Germany and England. This study was conducted as a preliminary study in Sri Lanka; further studies are necessary to be performed to increase our knowledge regarding SARS-CoV-2 isolates. Since the mutational variants can alter the presentation of COVID-19 infection, the robust molecular epidemiological tools indicated in this study can be used to trace possible exposure, epidemiological analysis and develop an effective treatment, including vaccines.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0950268821000583.
Acknowledgements
We gratefully acknowledge the authors, the Originating and Submitting Laboratories, for their sequence and metadata shared through GISAID and NCBI, on which this research is based. All submitters of data may be contacted directly via www.gisaid.org. The acknowledgments table for GISAID and NCBI is reported as Supplementary material.
Author contributions
Dilan Amila Satharasinghe: Conceptualisation, design of the work, analysis, interpretation of data, resources, supervision, writing – original draft, writing-review and editing. Parakatawella Mudiyanselage Shalini Daupadi Kumari Parakatawella: Analysis, writing – original draft. Jayasekara Mudiyanselage Krishanthi Jayarukshi Kumari Premarathne: Design of the work, writing – review and editing. L. J. P. Anura P. Jayasooriya, Gamika A. Prathapasinghe: Writing – review and editing. Swee Keong Yeap: Supervision, writing – review and editing.
Financial support
No funding was received for this work
Conflict of interest
The authors declare there are no financial or non-financial competing interests.
Ethical standards
Not applicable
Consent for publication
The authors declare consent for publication
Data
Data and material are provided in the supplementary material and available at www.gisaid.org.