Introduction
Vitex trifolia L., Vitex bicolor Willd. and Vitex rotundifolia L.f. belong to the V. trifolia group, along with V. negundo, V. agnus-castus, V. benthamiana and V. pseudo-negundo (de Kok, 2007). The three species are native to parts of Africa, Asia and the Pacific (POWO, 2022). Historically, these taxa have often been confused with one another, resulting in numerous nomenclatural revisions. Lam (1919) classified V. bicolor under V. negundo and grouped V. rotundifolia under V. trifolia; Moldenke (1957) later grouped all three species under V. trifolia while still recognizing their infraspecific differences as separate botanical varieties. de Kok (2008) raised this infraspecific classification to subspecies level, combining the V. trifolia and V. bicolor types into V. trifolia subsp. trifolia and treating V. rotundifolia separately as V. trifolia subsp. littoralis on the basis of differences in growth habit and distribution. Most recently, de Kok and Sengun (2020) reclassified these taxa as three separate species: V. trifolia, with usually 3-foliate leaves; V. bicolor, with usually 3–5-foliate leaves; and V. rotundifolia, with always 1-foliate leaves.
Chloroplast DNA sequences and genes, such as matK, rbcL and trnL, have long been used to identify species and resolve phylogenetic relationships (Palmer et al., 1988; Santos and Pereira, 2018; Linh et al., 2022). More recently, the ability to assemble complete chloroplast genomes has led to their use as super-barcodes that improve species discrimination in plant genera such as Dracaena (Zhang et al., 2019), Achyranthes (Xu et al., 2020) and Styrax (Song et al., 2022). In the Lamiaceae, a large-scale phylogeny inferred from chloroplast DNA sequences strongly supported 12 primary clades within the family (Li et al., 2016). An updated tribal classification of Lamiaceae based on chloroplast genome data has since become available, although its authors noted that increased taxon sampling could provide deeper insight into relationships within the family (Zhao et al., 2021).
In re-examining the generic limits of Vitex in Southeast Asia, Bramley et al. (2009) suggested that molecular data could provide a systematic means of identifying species among these ‘troublesome’ mints with confusing morphological traits, and further identified the genus Vitex as the most problematic group. Species of this genus, specifically those in the V. trifolia group, have been found to produce medicinal compounds and have long been used in traditional medicine as antioxidant, anti-inflammatory, antimicrobial, hepatoprotective, analgesic, antihistamine and antiasthmatic agents (Meena et al., 2011). In the Philippines, these taxa, particularly V. bicolor, are often collectively called ‘lagundi’ or ‘lagunding dagat’ and are frequently misidentified as V. negundo, a closely related taxon with an approved therapeutic claim that has been recommended by the Department of Health-Philippines (Philippine Pharmacopeia 1, 2004; Zarsuelo et al., 2018).
Hence, in this study, we sought to characterize differences among the chloroplast genomes of this important species complex. We assembled the V. bicolor chloroplast genome de novo and compared it with previously published chloroplast genomes of V. trifolia s. str. and V. rotundifolia to provide additional evidence for genotypic delineation, which may help the Philippine herbal industry standardize the source of raw materials used to produce these herbal medicines. We also characterized the morphology of the reference germplasm used to generate the V. bicolor chloroplast genome data, so that the research outputs remain usable in case of future taxonomic revisions.
Materials and methods
Plant materials, herbarium preparation and morphological characterization
We used a single accession of V. bicolor (PBN 2018-674), registered at the National Plant Genetic Resources Laboratory and conserved in the field genebank of the Institute of Crop Science, University of the Philippines Los Baños (UPLB), for the chloroplast genome assembly. To link the genotype to a reference specimen conserved in the genebank and preserved in the herbarium, and in case of future taxonomic revisions, we characterized the morphology of the accession using the morphological markers identified as delineating species within the V. trifolia group (de Kok and Sengun, 2020). Leaves, flowers and fruits of the accession were photographed using an Olympus SZX7 stereomicroscope. A voucher specimen (ICROPS 1399) of this accession, originating from Siquijor, Philippines, had previously been prepared and deposited at the Philippine Herbarium of Cultivated Plants, UPLB (https://cafs.uplb.edu.ph/icrops/, Renerio P. Gentallan Jr.). The plastome of a separate accession (PBN 2019-138) from a different locality, Los Baños, Laguna, Philippines, was also assembled and characterized to validate the chloroplast genome assembly.
DNA extraction, sequencing, assembly, annotation and visualization
DNA was extracted from fresh leaves of V. bicolor (PBN 2018-674) using a slightly modified CTAB protocol (Doyle and Doyle, 1987). The DNA sample was sent to NovogeneAIT Genomics Singapore PTE LTD, Singapore, for sequencing on the HiSeq-PE150 platform (Illumina Inc., San Diego, CA, USA), which generated 25,653,706 cleaned reads. The chloroplast genome was assembled with GetOrganelle v1.7.5+ (Jin et al., 2020), yielding a circular genome. The circularized genome was then annotated and mapped using GeSeq (Tillich et al., 2017) and CPGAVAS2 (Shi et al., 2019), and the annotated plastome was visualized using OGDRAW (Greiner et al., 2019). Using the same protocol, a separate V. bicolor accession (PBN 2019-138) was sequenced, assembled and annotated for validation. Because the previously published V. rotundifolia plastome had been generated with a different assembly pipeline, which could introduce differences arising from the assembly method rather than from the genome itself, we downloaded the raw V. rotundifolia reads (SRX14066491), reassembled its plastome and annotated it using the protocol above. The assembled V. bicolor chloroplast genome sequence is openly available in GenBank of NCBI (https://www.ncbi.nlm.nih.gov) under accession no. ON526805; the associated BioProject and BioSample numbers are PRJNA824823 and SAMN30201904, respectively.
Genome comparison
The chloroplast genomes of V. trifolia L., V. bicolor Willd. and V. rotundifolia L.f. were compared on the basis of their genotypic characteristics: total length, base content, gene content and the sizes of the four regions of their quadripartite plastomes. Nucleotide statistics, annotations and other descriptive genome parameters were checked using Geneious Prime version 2022.0.2 (https://www.geneious.com). Simple sequence repeat (SSR) variation among the three Vitex species was determined using MISA (http://misaweb.ipk-gatersleben.de/), a web server for microsatellite prediction, with minimum repeat numbers of 10, 5, 4, 3, 3 and 3 for mono-, di-, tri-, tetra-, penta- and hexanucleotide repeats, respectively (Beier et al., 2017). In the literature, the minimum threshold applied when searching DNA sequences for cpSSRs is typically either eight or ten uninterrupted mononucleotide repeats (Ebert and Peakall, 2009). Differences across species were summarized in a bar graph. Forward, reverse, complementary and palindromic repeats were identified using REPuter with a minimum repeat size of 30 bp and a Hamming distance of 3 (Kurtz et al., 2001).
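The MISA thresholds stated above (10, 5, 4, 3, 3 and 3 repeat units for motifs of one to six bases) can be illustrated with a minimal Python sketch of perfect-SSR detection. This is not MISA's implementation: the names `find_ssrs`, `is_compound_unit` and `MIN_REPEATS` are hypothetical, only uninterrupted repeats are reported, and MISA's merging of compound SSRs is omitted.

```python
# Minimum number of repeat units per motif length (1 = mono- ... 6 = hexanucleotide),
# taken from the thresholds stated in the methods.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_compound_unit(motif):
    """True if the motif is itself a repeat of a shorter unit, e.g. 'AA' or 'ATAT'."""
    k = len(motif)
    return any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for perfect SSRs, scanning left to right.

    At each position the shortest motif length that reaches its threshold wins,
    and the scan resumes after the repeat. Imperfect and compound SSRs are not
    handled.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        step = 1
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            if len(motif) < k:
                break  # too close to the end for this and longer motifs
            if is_compound_unit(motif):
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_repeats[k]:
                hits.append((i, motif, copies))
                step = copies * k
                break
        i += step
    return hits


print(find_ssrs("GC" + "A" * 12 + "GC" + "AT" * 6 + "G"))
# [(2, 'A', 12), (16, 'AT', 6)]
```

A run of nine A's returns nothing, because it falls one unit short of the mononucleotide threshold; this is the kind of boundary at which the mononucleotide counts of closely related plastomes can differ.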
The plastomes of the three Vitex species and their genotypic differences were visualized along the whole genome using the LAGAN mode of mVISTA (Frazer et al., 2004). To estimate nucleotide diversity (Pi) among the three Vitex species, we performed a sliding window analysis with a window length of 500 bp and a step size of 500 bp in DnaSP v6.0 (Rozas et al., 2017). IRscope was then used to visualize whether the inverted repeat (IR) boundaries differ among the three genomes (Amiryousefi et al., 2018).
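The sliding window calculation can be sketched in a few lines of Python, assuming equal-length aligned sequences and taking Pi as the mean number of pairwise differences per site. This is not DnaSP's implementation: dropping every column that contains a gap or ambiguous base is a simplifying assumption, and `sliding_window_pi` and `nucleotide_diversity` are hypothetical names.

```python
from itertools import combinations


def nucleotide_diversity(columns):
    """Pi for a list of alignment columns (one base per sequence in each column).

    Pi is the mean number of pairwise differences per site. Columns containing
    anything other than A, C, G or T (gaps, Ns) are dropped before counting.
    """
    valid = [col for col in columns if all(base in "ACGT" for base in col)]
    if not valid:
        return float("nan")
    pairs = list(combinations(range(len(valid[0])), 2))
    diffs = sum(col[i] != col[j] for col in valid for i, j in pairs)
    return diffs / (len(pairs) * len(valid))


def sliding_window_pi(alignment, window=500, step=500):
    """Yield (0-based window midpoint, Pi) along equal-length aligned sequences.

    A final window shorter than `window` is ignored.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    columns = list(zip(*(seq.upper() for seq in alignment)))
    for start in range(0, length - window + 1, step):
        yield start + window // 2, nucleotide_diversity(columns[start:start + window])
```

With the default window and step of 500 bp, windows exceeding a cutoff such as the Pi > 0.0025 used in the Results can be selected with `[w for w in sliding_window_pi(aln) if w[1] > 0.0025]`.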
Phylogenetic analysis
The ingroup, sister group and outgroup were identified beforehand using the updated tribal classification of Lamiaceae based on plastome phylogenomics (Zhao et al., 2021). Based on sequence availability, we downloaded from the NCBI database the chloroplast genome sequences of seven other Vitex species (subfamily Viticoideae) for the ingroup, one Congea species from the closely related subfamily Symphorematoideae Briq. as the sister group, and two Salvia species (subfamily Nepetoideae) as the outgroup. These sequences, together with the assembled chloroplast genomes of the three Vitex species, were aligned using MAFFT (Katoh and Standley, 2013). In MEGA-X (Kumar et al., 2018), we selected the best-fitting substitution model using the Bayesian information criterion (BIC) and then inferred a maximum likelihood (ML) tree under the General Time Reversible model with gamma-distributed rates and invariant sites (GTR + G + I) (Nei and Kumar, 2000), with 1,000 bootstrap replicates.
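The 1,000 bootstrap replicates refer to nonparametric bootstrap resampling of alignment sites. A minimal Python sketch of that idea follows; it only generates pseudo-alignments and scores clade support, while the ML tree search under GTR + G + I (performed in MEGA-X in this study) is not reproduced. The function names and the set-of-clades tree representation are hypothetical simplifications.

```python
import random


def bootstrap_replicates(alignment, n_replicates=1000, seed=None):
    """Yield nonparametric bootstrap pseudo-alignments.

    Each replicate draws alignment columns (sites) with replacement until it has
    the original length, so every sequence in a replicate is resampled at the
    same positions.
    """
    rng = random.Random(seed)
    length = len(alignment[0])
    for _ in range(n_replicates):
        sites = [rng.randrange(length) for _ in range(length)]
        yield ["".join(seq[s] for s in sites) for seq in alignment]


def clade_support(replicate_trees, clade):
    """Percentage of replicate trees that contain `clade`.

    Each tree is represented, for simplicity, as a set of clades and each clade
    as a frozenset of taxon names.
    """
    target = frozenset(clade)
    hits = sum(target in tree for tree in replicate_trees)
    return 100.0 * hits / len(replicate_trees)
```

A tree would be inferred from every pseudo-alignment; the bootstrap value reported at a node is then the percentage of those replicate trees that recover the same clade.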
Results
Morphological characteristics of reference materials
PBN 2018-674 (V. bicolor) is a small tree, 2–3 m high. Leaves (Fig. 1(a)) with entire leaf margin, adaxially strong olive green, abaxially weak yellow-green, plane transverse posture, straight longitudinal posture, composed of 3–5 leaflets but rarely unifoliate, with highly pubescent velutinous lamina under surface (Fig. 1(b) and (c)), with 50.561 ± 7.69 mm long petioles; central leaflet elliptic, acuminate apex, cuneate base, 123.338 ± 2.57 × 35.264 ± 0.72 mm. Inflorescence botryoid, with a single indeterminate terminal floral meristem and pedunculate flowering unit; flowers (Fig. 1(d)) blue-violet, erect tube habit, with five-lobed calyx, with non-persistent bracteoles, 8.803 ± 0.18 mm corolla length, 6.825 ± 0.19 mm corolla diameter; 4.645 ± 0.17 mm × 3.314 ± 0.12 mm lower lip petals, 2.167 ± 0.06 mm × 2.029 ± 0.06 mm lower fused lateral lobes, 1.975 ± 0.07 mm × 1.747 ± 0.03 mm upper lip petal, with 8.33 ± 0.22 mm long style; stamens exserted, didynamous, with 5.423 ± 0.22 mm longer filaments and 4.433 ± 0.19 mm shorter filaments. Fruits (Fig. 1(e)) spherical, with rounded apex, brown to black, about 50% or less covered by the calyx.
Genome assembly and annotation
The assembled chloroplast genome of V. bicolor (PBN 2018-674) was 154,460 bp long and showed the characteristic quadripartite circular structure of chloroplast genomes, with a pair of IR regions of 25,687 bp each, a short single-copy (SSC) region of 17,928 bp and a long single-copy (LSC) region of 85,158 bp (Fig. 2). The base composition of the genome is 30.5% A, 19.4% C, 18.8% G and 31.3% T, yielding a GC content of 38.4%. It encodes 131 genes, comprising 87 protein-coding genes, 36 tRNA genes and 8 rRNA genes. Among these, 45 genes are involved in photosynthesis, 28 in self-replication and 6 code for other proteins. The plastome assembled from the validation accession (PBN 2019-138) was identical to that of PBN 2018-674. We also re-assembled the V. rotundifolia plastome, obtaining a circular genome of 154,447 bp.
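The reported region sizes can be checked against the total length, since a quadripartite plastome consists of the LSC, the SSC and two copies of the IR. The short Python check below (the helper name is hypothetical) confirms that 85,158 + 17,928 + 2 × 25,687 = 154,460 bp and that the three gene classes sum to 131.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total plastome length: LSC + SSC + two copies of the inverted repeat (IR)."""
    return lsc + ssc + 2 * ir


# Region sizes reported above for V. bicolor PBN 2018-674 (bp).
total = quadripartite_length(lsc=85_158, ssc=17_928, ir=25_687)
print(total)  # 154460

# Gene classes reported above: protein-coding + tRNA + rRNA.
print(87 + 36 + 8)  # 131
```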
Genome comparison
The plastomes of the three Vitex species differed in length, while their base composition differed by no more than 9 bp, resulting in a relatively conserved GC content of 38.3% across the three species. Gene number and gene composition did not differ among the taxa. Size differences were confined to the short and long single-copy regions: V. bicolor had the longest LSC region (85,158 bp) and V. rotundifolia the shortest (85,142 bp), whereas V. rotundifolia had the longest SSC region (17,931 bp) and V. trifolia the shortest (17,922 bp). Differences in total plastome length did not mirror the size differences of the individual regions. These structural differences did not extend to the IR boundaries, which were the same across the three genomes (Fig. 3).
The repeat analysis identified 45 long repeats in the V. bicolor chloroplast genome, comprising 22 forward, 1 reverse and 22 palindromic repeats; the same repeat profile was found in V. trifolia and V. rotundifolia. For SSRs, no differences in dinucleotide, trinucleotide, tetranucleotide or pentanucleotide repeats were observed across the three species, whereas mononucleotide repeats (minimum of 10 units) differed slightly among the three taxa (Figure S1).
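The four repeat types reported by REPuter, and the settings used here (minimum length 30 bp, Hamming distance 3), can be illustrated with a brute-force Python sketch. It is not REPuter's algorithm, which uses suffix-tree methods and extends repeats to maximal length; this version compares fixed-length windows in quadratic time, so it suits only short test sequences, and `find_repeat_pairs` is a hypothetical name.

```python
# Complement table for the four nucleotides.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_repeat_pairs(seq, length=30, max_mismatch=3):
    """Return (i, j, kind) for window pairs i < j that form a repeat.

    The window at j is transformed before comparison with the window at i:
      F (forward): unchanged
      R (reverse): reversed
      C (complement): complemented
      P (palindromic): reverse-complemented
    A pair is kept when the Hamming distance is at most `max_mismatch`.
    Repeats are not extended to maximal length, so one long repeat produces
    many overlapping window hits.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for i in range(n - length + 1):
        a = seq[i:i + length]
        for j in range(i + 1, n - length + 1):
            b = seq[j:j + length]
            comp = b.translate(COMPLEMENT)
            variants = {"F": b, "R": b[::-1], "C": comp, "P": comp[::-1]}
            for kind, other in variants.items():
                if hamming(a, other) <= max_mismatch:
                    hits.append((i, j, kind))
    return hits
```

For example, in `"AACCG" + "TTTTT" + "CGGTT"` with `length=5` and `max_mismatch=0`, the pair `(0, 10, "P")` is reported because `CGGTT` is the reverse complement of `AACCG`.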
The sliding window analysis showed that average nucleotide diversity (Pi) was higher in the LSC and SSC regions than in the IR regions (Fig. 4(a)). We also identified three intergenic regions and five protein-coding regions with Pi values greater than 0.0025. The mVISTA plot further visualized the differences in genic and intergenic regions among the species of interest (Fig. 4(b)). In particular, we observed single nucleotide polymorphisms (SNPs) in 13 protein-coding genes, including rbcL, matK, atpF, rps11 and rps16, which could be used as markers to delineate the three genotypes (Table 1).
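Whether a set of SNPs can delineate the three taxa can be checked by extracting the polymorphic columns of an aligned gene and testing whether each taxon carries a unique combination of alleles. The minimal Python sketch below assumes the gene sequences are already aligned; the function names and the toy sequences are hypothetical and are not the SNPs listed in Table 1.

```python
def polymorphic_sites(aligned, skip="-N"):
    """Return {position: {taxon: base}} for alignment columns where taxa differ.

    `aligned` maps taxon names to equal-length aligned sequences of one gene.
    Columns with a gap or N in any taxon are skipped (a simplifying assumption).
    """
    names = list(aligned)
    seqs = [aligned[name].upper() for name in names]
    sites = {}
    for pos, column in enumerate(zip(*seqs)):
        if any(base in skip for base in column):
            continue
        if len(set(column)) > 1:
            sites[pos] = dict(zip(names, column))
    return sites


def haplotypes(sites):
    """Concatenate each taxon's alleles at the polymorphic sites, in position order."""
    if not sites:
        return {}
    names = list(next(iter(sites.values())))
    return {name: "".join(sites[pos][name] for pos in sorted(sites)) for name in names}


# Hypothetical toy alignment, not data from Table 1.
toy = {
    "V. bicolor": "ATGCCA",
    "V. trifolia": "ATGCTA",
    "V. rotundifolia": "ACGCTA",
}
sites = polymorphic_sites(toy)
print(sorted(sites))  # [1, 4]
haps = haplotypes(sites)
print(len(set(haps.values())) == len(toy))  # True: the sites jointly separate all three taxa
```

Neither toy site separates all three taxa on its own, but together they do; the same reasoning applies when combining SNPs from several genes into a marker panel.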
Phylogenetic analysis
The phylogram is consistent with the most recent tribal classification of Lamiaceae based on plastome phylogenomics (Zhao et al., 2021): subfamily Viticoideae, which now includes the newly assembled and re-assembled Vitex plastomes, formed a distinct clade closely related to subfamily Symphorematoideae. The phylogram showed that V. bicolor is closely related to V. trifolia and V. rotundifolia, and the plastome sequences place V. bicolor closer to V. trifolia than to V. rotundifolia (Fig. 5). The three species are in turn closely related to V. negundo, which is also part of the V. trifolia group (de Kok, 2007). Together with V. tripinnata, these species form a distinct clade of closely related taxa within Vitex. Nevertheless, relationships within the V. trifolia group cannot yet be fully resolved, because the plastomes of other members, such as V. benthamiana and V. agnus-castus, have yet to be assembled.
Discussion
The morphological traits of the reference germplasm used for the assembly (PBN 2018-674) fall within the range of characters established for V. bicolor (Willdenow, 1809). Morphological differences between V. trifolia s. str., V. rotundifolia and V. bicolor were recently delineated by de Kok and Sengun (2020), who separated V. bicolor from the earlier circumscription of V. trifolia s. lat. (de Kok, 2008) as a distinct species characterized by a tree or shrub habit that does not root at the nodes, usually 3–5-foliate leaves, and a distinctly petiolulate terminal leaflet with an acuminate to acute apex, cuneate base and ovoid-elliptic to narrowly elliptic shape. The specific epithet bicolor may derive from the distinct difference between the abaxial and adaxial leaf colour when dried.
This is the first reported chloroplast genome of V. bicolor. The chloroplast genome was completely conserved between the two independently assembled accessions, suggesting that it could be a distinct character of this species. At the plastome level, we identified small but informative differences among the three species. Although their gene composition, IR junctions, GC content and overall genome features are similar, they differ in genic regions, including genes that are usually conserved and have been used as barcode markers to delineate species. In particular, of the five chloroplast markers used by Li et al. (2016) to build a large-scale phylogeny of the Lamiaceae, three genes, matK, rps16 and rbcL, varied among the three Vitex species. These SNPs across 13 protein-coding regions, although few, could have evolutionary implications. For example, differences in rbcL could reflect environmental changes over time, as this gene has been hypothesized to have experienced bursts of adaptation in response to changing atmospheric CO2 concentrations (Sen et al., 2011). Such subtle differences could also affect the photosynthetic productivity of the species (Christin et al., 2008). Furthermore, we identified three polymorphic genes, atpF, rpoC2 and ycf3, which could be used to differentiate the three species or genotypes (Table 1). These sequences could be prioritized when developing barcode markers as supporting evidence for species- and genotype-level identification.
SNPs in the chloroplast genome have been used to develop species-level authentication systems, for example to distinguish the medicinal Panax ginseng from counterfeits (Kim et al., 2015; Giang et al., 2020; Linh et al., 2022), to identify species within the Saccharum complex (Li et al., 2022) and to identify edible Rubus species (Park et al., 2021).
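The SNP-based authentication systems cited above share a simple core step: comparing a sample's alleles at diagnostic positions with reference profiles. The Python sketch below shows only that step, under stated assumptions; the positions, alleles and function name are hypothetical, and it does not reproduce the laboratory assays used in the cited studies.

```python
def assign_species(query, profiles, max_mismatch=0):
    """Return the reference taxa whose diagnostic alleles match a query sample.

    `profiles` maps each taxon to {alignment position: expected base};
    `query` maps positions to the bases observed in the sample. A position
    missing from the query counts as a mismatch (a conservative rule).
    """
    matches = []
    for taxon, profile in profiles.items():
        mismatches = sum(query.get(pos) != base for pos, base in profile.items())
        if mismatches <= max_mismatch:
            matches.append(taxon)
    return matches


# Hypothetical diagnostic positions and alleles, for illustration only.
profiles = {
    "V. bicolor": {1: "T", 4: "C"},
    "V. trifolia": {1: "T", 4: "T"},
    "V. rotundifolia": {1: "C", 4: "T"},
}
print(assign_species({1: "T", 4: "C"}, profiles))  # ['V. bicolor']
```

Returning a list rather than a single name makes ambiguous results visible: an incomplete query or a relaxed `max_mismatch` can match more than one taxon, which signals that more markers are needed.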
Because chloroplast genomes evolve slowly within species, faster-evolving noncoding regions, such as the trnH-psbA and psbK-psbI intergenic spacers, have been used for species identification in genera such as Anthurium, Ficus and Prunus (Roy et al., 2010; Pang et al., 2012; Suzuki et al., 2014; Amar, 2020). Similar intergenic variation was observed among the three genomes compared here (Fig. 4). Chloroplast genomes are so highly conserved that even slight variations in their structure, size and sequence may be phylogenetically significant.
The prevalence of mononucleotide repeats in the chloroplast genomes of the three species is expected. However, the differences in mononucleotide repeats across the species are not definitive evidence of interspecific variation, because chloroplast SSRs often vary even within species and are considered good markers for tracking infraspecific differences, as population markers and as barcodes for identifying genotypes within species (Ebert and Peakall, 2009; Wheeler et al., 2014). Nonetheless, the presence of this type of variation in the examined plastomes further indicates genotypic differences and potential taxonomic divergence.
In the phylogenetic analysis, the plastome differences between species translated into high bootstrap support (>88%) in the phylogram, even within the V. trifolia group clade. This further supports splitting the previously synonymized taxa into three distinct species: V. bicolor, V. rotundifolia and V. trifolia s. str. Thus, through the de novo assembly of the complete chloroplast genome of a V. bicolor germplasm accession and its comparison with V. trifolia s. str. and V. rotundifolia, this study provides additional evidence for the recent reclassification of these taxa.
The genotypic variations highlighted in this study, together with the whole chloroplast genome sequence as a super-barcode, are a potential source of markers for delineating the three closely related species. Our results provide clear evidence of genetic variation among the individuals examined. These genetic differences matter for V. trifolia because this taxon, in its broad sense, is considered an important medicinal plant, particularly in Southeast Asia (Capareda, 2016). The observed variation further implies a need to standardize the plant genetic resources used to produce herbal drugs, since the effect of an herbal medicine (phenotype) is a function of its genotype, the environment and their genotype-by-environment interaction. The Philippine herbal industry has raised this problem, as quality control issues arise in the source of raw materials, which refers to ‘the correct variety, species, chemotype, ecotype, and part and stage development of the plant to be processed’ (Hipolito, 2012). Hence, the recorded genotypic variations, which could serve as markers for delineating genotypes within this species complex, may help the herbal industry provide a more stable supply of quality herbal medicines by identifying and standardizing the genotypes used in the production and research of this species complex.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1479262123000370.
Acknowledgements
The authors thank the Department of Science and Technology-Philippine Council for Agriculture, Aquatic and Natural Resources Research and Development (DOST-PCAARRD) and the Department of Science and Technology-Philippine Council for Health Research and Development (DOST-PCHRD) for their support. The authors also thank Ronil Beliber and Arvin Medrano for cultivating the germplasm, Eddelaine Joyce Bautista and Edna Mercado for processing the paperwork for the research project, and Josel Mansueto, Irish Alysa Herlao and June Kristine Ando for their help in collecting the specimen.