Introduction
Environmental DNA (eDNA) metabarcoding is an increasingly popular and effective tool for investigating biodiversity within an area of interest (Shokralla et al. Reference Shokralla, Spall, Gibson and Hajibabaei2012, Poisot et al. Reference Poisot, Péquin and Gravel2013, Porter & Hajibabaei Reference Porter and Hajibabaei2018, Pawlowski et al. Reference Pawlowski, Apothéloz-Perret-Gentil and Altermatt2020). eDNA metabarcoding is a method that uses high-throughput sequencing technologies (i.e. massive parallel sequencing technologies) to identify organisms through traces of DNA left in the environment such as in soil, water, faeces or adhered to actual organisms (Pietramellara et al. Reference Pietramellara, Ascher, Borgogni, Ceccherini, Guerri and Nannipieri2009, Kircher & Kelso Reference Kircher and Kelso2010, Bohmann et al. Reference Bohmann, Evans, Gilbert, Carvalho, Creer and Knapp2014, Tanaka et al. Reference Tanaka, Hino, Tsai, Palomares-Rius, Yoshida and Ogura2014, Boyer et al. Reference Boyer, Cruickshank and Wratten2015, Lynggaard et al. Reference Lynggaard, Nielsen, Santos-Bay, Gastauer, Oliveira and Bohmann2019, Ruppert et al. Reference Ruppert, Kline and Rahman2019, Thomsen & Sigsgaard Reference Thomsen and Sigsgaard2019). eDNA metabarcoding can be used to detect single or multiple taxa (Bakker et al. Reference Bakker, Wangensteen, Chapman, Boussarie, Buddo and Guttridge2017, Alexander et al. Reference Alexander, Bunce, White, Wilkinson, Adam and Berry2020, Rota et al. Reference Rota, Canedoli, Ferrè, Ficetola, Guerrieri and Padoa-Schioppa2020, Schütz et al. Reference Schütz, Tollrian and Schweinsberg2020, Zhang et al. Reference Zhang, Pavlovska, Stoica, Prekrasna, Yang and Slobodnik2020, Topstad et al. Reference Topstad, Guidetti, Majaneva and Ekrem2021), for environmental samples originating from varied ecosystems (Edwards et al. Reference Edwards, Alsos, Yoccoz, Coissac, Goslar and Gielly2018, Mariani et al. Reference Mariani, Baillie, Colosimo and Riesgo2019, Clark et al. Reference Clark, Pilditch, Pearman, Ellis and Zaiko2020, Fraija-Fernández et al. Reference Fraija-Fernández, Bouquieaux, Rey, Mendibil, Cotano and Irigoien2020, Webster et al. Reference Webster, Emami-Khoyi, van Dyk, Teske and van Vuuren2020, Carrasco-Puga et al. Reference Carrasco-Puga, Díaz, Soto, Hernández-Castro, Contreras-López and Maldonado2021), using fresh or old eDNA samples (Barnes & Turner Reference Barnes and Turner2016, Williams et al. Reference Williams, Huyvaert and Piaggio2016, Collins et al. Reference Collins, Wangensteen, O'Gorman, Mariani, Sims and Genner2018, Foucher et al. Reference Foucher, Evrard, Ficetola, Gielly, Poulain and Giguet-Covex2020) and to study ancient ecosystems (Jørgensen et al. Reference Jørgensen, Kjaer, Haile, Rasmussen, Boessenkool and Andersen2012, Gugerli et al. Reference Gugerli, Alvarez and Tinner2013, Alsos et al. Reference Alsos, Sjögren, Edwards, Landvik, Gielly and Forwick2016, Ruppert et al. Reference Ruppert, Kline and Rahman2019).
Unlike morphological identification methods, eDNA metabarcoding (henceforth referred to in this paper as ‘eDNA’ solely) often does not require taxonomic expertise for the identification of organisms (Deiner et al. Reference Deiner, Bik, Mächler, Seymour, Lacoursière-Roussel and Altermatt2017). Instead, eDNA aims to achieve taxonomic assignment via comparison of sequences obtained from an environmental sample with those available in reputable databases housing sequences from known organisms (Johnson et al. Reference Johnson, Zaretskaya, Raytselis, Merezhuk, McGinnis and Madden2008, Coissac et al. Reference Coissac, Riaz and Puillandre2012). Furthermore, and unlike other methods for monitoring biodiversity, eDNA is considered a ‘non-invasive’ monitoring tool as it does not require capturing or detecting whole specimens in the field (Fernandes et al. Reference Fernandes, van der Heyde, Bunce, Dixon, Harris, Wardell-Johnson and Nevill2018). Thus, eDNA has been shown to be a sensitive tool for detecting low concentrations of eDNA of rare or elusive species - such as endangered and invasive species or those visually confounded with morphologically similar species (Bickford et al. Reference Bickford, Lohman, Sodhi, Ng, Meier and Winker2007, Dejean et al. Reference Dejean, Valentini, Miquel, Taberlet, Bellemain and Miaud2012, Ji et al. Reference Ji, Ashton, Pedley, Edwards, Tang and Nakamura2013, Sigsgaard et al. Reference Sigsgaard, Carl, Møller and Thomsen2015, Pfleger et al. Reference Pfleger, Rider, Johnston and Janosik2016, Blackman et al. Reference Blackman, Constable, Hahn, Sheard, Durkota, Hänfling and Lawson Handley2017, Xia et al. Reference Xia, Zhan, Gao, Zhang, Haffner and MacIsaac2018, Harper et al. Reference Harper, Lawson Handley, Carpenter, Ghazali, Di Muri and Macgregor2019, Holman et al. Reference Holman, de Bruyn, Creer, Carvalho, Robidart and Rius2019, Thomsen & Sigsgaard Reference Thomsen and Sigsgaard2019, Nester et al. Reference Nester, De Brauwer, Koziol, West, DiBattista and White2020).
An area of research that could greatly benefit from eDNA methods is biodiversity monitoring in remote, difficult-to-access areas (Lacoursière-Roussel et al. Reference Lacoursière-Roussel, Howland, Normandeau, Grey, Archambault and Deiner2018, Howell et al. Reference Howell, LaRue and Flanagan2021). Sampling such areas can be logistically difficult and financially constrained (Schiermeier Reference Schiermeier2008, Ghosh & Rubly Reference Ghosh and Rubly2015, Mallory et al. Reference Mallory, Gilchrist, Janssen, Major, Merkel, Provencher and Strøm2018), whereas collecting samples for eDNA research generally takes only minutes once on site. Isolated areas are distinctive by being distant from densely inhabited areas, but also, in many cases, by experiencing harsh environmental conditions (Orsi et al. Reference Orsi, Whitworth and Nowlin1995, Prospero et al. Reference Prospero, Ginoux, Torres, Nicholson and Gill2002, McKay et al. Reference McKay, Friedmann, Gómez-Silva, Cáceres-Villanueva, Andersen and Landheim2003), making biodiversity research challenging (Brandt et al. Reference Brandt, Griffiths, Gutt, Linse, Schiaparelli and Ballerini2014, Kennicutt et al. Reference Kennicutt, Kim, Rogan-Finnemore, Anandakrishnan, Chown and Colwell2016). The available literature suggests that sampling limitations in areas such as Antarctica, under ice and the deep sea make it difficult to find and document the organisms that are present (Grant & Linse Reference Grant and Linse2009, Griffiths Reference Griffiths2010, Grant et al. Reference Grant, Griffiths, Steinke, Wadley and Linse2011). Furthermore, isolated areas are often relatively pristine, with high natural values, and sampling collection plans often need to conform to strict management protocols and permitting requirements (Hanessian Reference Hanessian1960, James et al. Reference James, Balmford and Gaston1999).
The Antarctic terrestrial macrobiota is dominated by invertebrates, such as springtails and mites, and plants such as liverworts, lichens and mosses (Bednarek-Ochyra et al. Reference Bednarek-Ochyra, Vana, Ochyra and Smith2000, Øvstedal & Smith Reference Øvstedal and Smith2001, Ochyra et al. Reference Ochyra, Lewis Smith and Bednarek-Ochyra2008, Hogg et al. Reference Hogg, Stevens, Wall and Cowan2014). Much of Antarctica's continental life evolved in isolation for millions of years, leading to highly localized genetic signatures (Lawver & Gahagan Reference Lawver and Gahagan2003, Rogers Reference Rogers2007, Convey et al. Reference Convey, Gibson, Hillenbrand, Hodgson, Pugh, Smellie and Stevens2008, Boger Reference Boger2011), and the Antarctic biota - like that of other remote areas - remains relatively poorly understood (Convey Reference Convey2010, Costello et al. Reference Costello, Coll, Danovaro, Halpin, Ojaveer and Miloslavich2010, Griffiths Reference Griffiths2010, Luypaert et al. Reference Luypaert, Hagan, McCarthy, Poti, Jungblut, Liebich and Bode-Dalby2020). As eDNA approaches do not require the identification of organisms on site and are a fast and non-invasive, they represent an excellent option for enabling biodiversity research in isolated areas. Optimizing research outcomes from such high-value (and often irreplaceable) samples is of paramount importance.
Multiple protocols have been reported for assessing biodiversity through eDNA (Thomsen & Willerslev Reference Thomsen and Willerslev2015, Lear et al. Reference Lear, Dickie, Banks, Boyer, Buckley and Buckley2018). The choice of protocol for collecting and processing samples (in both the laboratory and the computer) is important and can have substantial impacts on the biological results obtained (Dineen et al. Reference Dineen, Aranda, Anders and Robertson2010, Smith & Peay Reference Smith and Peay2014, Deiner et al. Reference Deiner, Walser, Mächler and Altermatt2015, Clarke et al. Reference Clarke, Beard, Swadling and Deagle2017, Alberdi et al. Reference Alberdi, Aizpurua, Gilbert, Bohmann and Mahon2018, Jeunen et al. Reference Jeunen, Knapp, Spencer, Taylor, Lamare and Stat2019, Schenekar et al. Reference Schenekar, Schletterer, Lecaudey and Weiss2020, Castro et al. Reference Castro, Meyer, Shapiro, Shirazi, Cutler, Lagos and Quiroga2021, Coutant et al. Reference Coutant, Cantera, Cilleros, Dejean, Valentini, Murienne and Brosse2021, Swenson & Gemeinholzer Reference Swenson and Gemeinholzer2021). Although eDNA research has increased dramatically over the last decade (Lear et al. Reference Lear, Dickie, Banks, Boyer, Buckley and Buckley2018), the effects of protocol choice on assessments of diversity using eDNA from soil samples are not well characterized in the published literature. Usually, studies have focused on interpreting eDNA sequencing results, and less attention has been given to investigating the effects of eDNA method selection (Table S1; Drummond et al. Reference Drummond, Newcomb, Buckley, Xie, Dopheide and Potter2015, Horton et al. Reference Horton, Kershner and Blackwood2017, Lanzén et al. Reference Lanzén, Lekang, Jonassen, Thompson and Troedsson2017, Walker et al. Reference Walker, Leys, Dunham, Oliver, Schiller and Stephenson2017, Sikder et al. Reference Sikder, Vestergård, Sapkota, Kyndt and Nicolaisen2020, Calderón-Sanou et al. Reference Calderón-Sanou, Münkemüller, Zinger, Schimann, Yoccoz and Gielly2021, Kirse et al. Reference Kirse, Bourlat, Langen and Fonseca2021, Lopes et al. Reference Lopes, Baêta, Sasso, Vanzetti, Raquel Zamudio, Taberlet and Haddad2021, Pansu et al. 2021). Currently, there are a few typical methods for analysing eDNA (Table S1; Lear et al. Reference Lear, Dickie, Banks, Boyer, Buckley and Buckley2018). In the laboratory, commercial extraction kits, such as PowerSoil® DNA Isolation Kit (Qiagen GmbH) or NucleoSpin® Soil (MACHERY-NAGEL GmbH & Co. KG) are often used. Some soil eDNA studies have indicated that the use of different protocols might influence the results obtained (Young et al. Reference Young, Weyrich, Clarke and Cooper2015). For example, sample preservation, sample volume, sample replicability, primer choice, DNA extraction and library preparation are key steps in processing eDNA soil samples and thus can affect biodiversity estimates (Taberlet et al. Reference Taberlet, Prud'Homme, Campione, Roy, Miquel and Shehzad2012, Dopheide et al. Reference Dopheide, Xie, Buckley, Drummond and Newcomb2019, Guerrieri et al. Reference Guerrieri, Bonin, Münkemüller, Gielly, Thuiller and Francesco Ficetola2021, Kirse et al. Reference Kirse, Bourlat, Langen and Fonseca2021).
In this study, we aimed to optimize and standardize the workflow for low-biomass soil eDNA studies by assessing the effectiveness, cost-efficiency and time-efficiency of different sampling strategies, DNA extraction protocols and library preparation methods, focusing on the Antarctica environment. Firstly, we evaluated the heterogeneity of soil diversity signals within a small area and determined the feasibility of sample pooling to reduce the number of extractions needed to capture small-scale variations. Secondly, we compared the DNA yield and diversity obtained from soil samples treated with four different DNA extraction protocols. Finally, we compared the diversity between the two library preparation methods.
Materials and methods
Different sampling, extraction and quantitative polymerase chain reaction (qPCR) methods were tested to establish a cost- and time-effective standardized method for obtaining metabarcoding data from soil samples from Antarctica. We evaluated the biological heterogeneity and representativeness of soil sampling by comparing four samples from a quadrat individually and as a combined sample that effectively reduced the number of DNA extractions needed per site. We also compared the effectiveness of four different DNA extraction protocols. Finally, we compared the performance of two library preparation methods - a one-step qPCR (in which indexed primers are directly added to the sample before PCR) and a two-step qPCR (in which indices are added only after amplification of the target marker, potentially eliminating the need for indexed marker-specific primers) - on all of the extractions made (Fig. 1).
Soil sampling
Soil samples were collected in December 2018 from East Antarctica at Stornes Peninsula (69°24.011'S, 76°05.382′E), the largest and westernmost peninsula within the Larsemann Hills (Fig. 2), a coastal ice-free area on the Ingrid Christensen Coast.
To assess variation in diversity detected across small spatial scales, ~50 g soil from 0–10 cm depth was collected from each corner of a 1 m2 quadrat (Fig. 1; Lee et al. Reference Lee, Barbier, Bottos, McDonald and Cary2012). The soil was collected using an autoclaved stainless-steel spoon (a different spoon was used for each sample). Spoons were stored in a sealed and sterile wrapping following autoclaving. Nitrile gloves and surgical facemasks were worn during the sampling to avoid contamination of the samples. Immediately after their collection, samples were transported in an insulated container filled with ice to Davis Station, where they were stored at -20°C. Subsequently, they were relocated to the University of Otago, New Zealand, where they were stored in a dedicated PCR-free laboratory facility at -80°C.
Sample processing
To determine the heterogeneity of the DNA signal within the 1 m2 sampling area and the feasibility of sample pooling to reduce the number of DNA extractions required, the four soil samples were processed both individually and as a combined sample. Combined samples comprised a soil mixture made of ~10 cm3 from each of the four original samples.
DNA extraction
To determine which extraction method could yield the most information (greatest number of reads and taxonomic diversity relative to other methods trialled here when analysed in the same way), each sample was extracted using four approaches involving two commonly used commercial extraction kits: 1) PowerSoil® DNA Isolation Kit and 2) NucleoSpin® Soil, and each one of the approaches mentioned above was also used in combination with saturated phosphate buffer: 3) PowerSoil® + saturated phosphate buffer (PS® + buffer) and 4) NucleoSpin® Soil + saturated phosphate buffer (NS® + buffer). The saturated phosphate buffer was used to increase the amount of starting material targeting only extracellular DNA (Pietramellara et al. Reference Pietramellara, Ascher, Borgogni, Ceccherini, Guerri and Nannipieri2009, Taberlet et al. Reference Taberlet, Prud'Homme, Campione, Roy, Miquel and Shehzad2012; see also the DNA extraction protocols in the Supplemental Material).
Individual samples were extracted in duplicate, while combined samples were extracted in triplicate (i.e. eight individual extractions and three combined extractions for each extraction protocol). The DNA concentration of each sample was quantified via the Qubit® dsDNA HS Assay Kit on a Qubit® 2.0 Fluorometer (ThermoFisher Scientific).
To decrease the risk of PCR contamination, samples were processed in dedicated PCR-free rooms that were physically separate from PCR laboratories. Furthermore, all bench surfaces and laboratory equipment were treated with bleach, irradiated with ultraviolet light and rinsed with ultrapure water before and after use (Prince & Andrus Reference Prince and Andrus1992). Extraction-negative controls were also included for each DNA extraction protocol that was tested.
Library preparation and sequencing
Two library preparation methods - a one-step qPCR protocol (Berry et al. Reference Berry, Osterrieder, Murray, Coghlan, Richardson and Grealy2017) and a two-step qPCR protocol (Miya et al. Reference Miya, Sato, Fukunaga, Sado, Poulsen and Sato2015) - were tested to determine whether they influenced the biodiversity results. In a one-step qPCR, DNA is synthesized and amplified in a single reaction tube. In a two-step qPCR protocol, DNA is synthesized in a primary reaction, and subsequently a portion of the synthesized DNA is used for amplification in a second reaction tube (Wacker & Godard Reference Wacker and Godard2005). For both methods, the 1391f/EukBr assay targeting the V9 region of the 18S rRNA gene, amplicon size of ~130 bp (range between 87 and 186 bp), was used (Amaral-Zettler et al. Reference Amaral-Zettler, McCliment, Ducklow and Huse2009, Stoeck et al. Reference Stoeck, Bass, Nebel, Christen, Jones, Breiner and Richards2010).
Before library preparation, DNA for each sample was optimized using a dilution series (neat, 1/10, 1/100) to reduce the impact of inhibitors and low amounts of template (Murray et al. Reference Murray, Coghlan and Bunce2015). Amplification was carried out in 25 μl reactions, prepared with 1× SensiFAST™ SYBR® Lo-ROX Kit (Bioline, Meridian Bioscience), 0.2 μmol/l of each primer (Integrated DNA Technologies) and 2 μl of DNA. The qPCR conditions included an initial denaturing step at 95°C for 10 min, followed by 50 cycles of 45 s at 95°C, 1 min at 60°C, 1.5 min at 72°C and a melt curve of 15 s at 95°C, 1 min at 60°C and 1 s at 95°C.
The one-step library preparation followed the protocol described in Berry et al. (Reference Berry, Osterrieder, Murray, Coghlan, Richardson and Grealy2017) and Stat et al. (Reference Stat, Huggett, Bernasconi, DiBattista, Berry and Newman2017) using fusion primers containing a modified Illumina sequencing adapter, a barcode tag (6–8 bp in length) and the template-specific primer (Fig. S5). Each sample was amplified in duplicate and assigned a unique barcode combination to allow pooling of samples post-qPCR. One qPCR control was included in each column and row, which contained ultrapure water (UltraPure™ DNase/RNase-Free Distilled Water, Invitrogen) instead of the sample template.
The two-step library preparation followed the protocol described in Miya et al. (Reference Miya, Sato, Fukunaga, Sado, Poulsen and Sato2015). The primers of the first qPCR round were modified to contain Illumina sequencing primer tails. Amplification was carried out as described above but with the cycle number reduced (i.e. 20 instead of 50 cycles). Negative controls were added in each column and row. Prior to the second-round qPCR, qPCR products were diluted tenfold and used as templates in the second-round qPCR. The second-round primers consisted of an Illumina adapter, index and Illumina sequencing primer (Fig. S6). The second-round qPCR was carried out using the same reaction and thermal profile as in the first round.
qPCR duplicates of each sample showed consistency in their threshold cycle (Ct) value and end-point qPCR fluorescence, so they were pooled to reduce stochastic effects due to PCR amplification. Where there was no consistency (n = 2 out of 200 reactions), the qPCR duplicate with the lowest Ct value was used. Next, and to avoid over- or under-representation of the samples within the libraries, samples within each qPCR reaction were pooled into mini-pools to approximately equal molarity based on Ct value and end-point qPCR fluorescence. Mini-pools were normalized based on DNA concentration measurements using a Qubit® 2.0 Fluorometer (Qubit® dsDNA HS Assay Kit) and molarity measurements obtained using a QIAxcel Advance System device (© QIAGEN 2013–2020) to produce a single library for each library preparation method. Both libraries were then size-selected and purified using AMPure XP Beads (BioLabs, Inc.). Finally, both libraries were quantified using a Qubit® 2.0 Fluorometer (Qubit® dsDNA HS Assay Kit) and a QIAxcel Advance System device (© QIAGEN 2013–2020).
Sequencing was performed on an Illumina MiSeq® (one-step: 200-cycle V2, single-end; two-step: 300-cycle V2, paired-end) at the Otago Genomics Sequencing Facility, following the manufacturer's protocols and with 7.5% PhiX to minimize issues associated with low-complexity libraries.
Bioinformatic analysis
Both libraries were quality-checked using FastQC (v 0.11.9; Andrews Reference Andrews2010). Due to the use of inline indices in the one-step library, demultiplexing was performed using cutadapt (v 1.18; Martin Reference Martin2011) and Geneious Prime ® (v 2020.1.2; Kearse et al. Reference Kearse, Moir, Wilson, Stones-Havas, Cheung and Sturrock2012). First, the P7 adapter was removed using cutadapt. Next, reads were demultiplexed and assigned to samples using the ‘separate reads by barcode’ function in Geneious Prime®, without allowing mismatches.
For the two-step library, sequences were demultiplexed and assigned to samples via the MiSeq® Reporter software (Illumina). Sequences were merged using the default settings in PEAR (v 0.9.10; Zhang et al. Reference Zhang, Kobert, Flouri and Stamatakis2014). Individual fastq files per sample were imported into Geneious Prime®.
Both libraries were further processed similarly, with forward and reverse primer sequences being identified and removed from each sequence using the ‘annotate & predict’ function in Geneious Prime®, without allowing mismatches. At this stage, the libraries’ sequences were relabelled and further analysed in combination. Sequences shorter than 80 base pairs and longer than 160 base pairs were discarded using the ‘--fastq_filter’ function in vsearch (v 2.15.0; Rognes et al. Reference Rognes, Flouri, Nichols, Quince and Mahé2016). Sequences were further quality filtered based on total expected errors (--fastq_maxee 1.0) and the presence of ambiguous bases (--fastq_maxns 0). Sequences passing quality filtering were pooled into a single file, quality checked again in FastQC and dereplicated into unique sequences using the ‘--derep_fulllength’ function in vsearch. At this stage, singleton sequences were removed to minimize inflation of diversity estimates caused by errors during PCR amplification or sequencing (Brown et al. Reference Brown, Veach, Rigdon-Huss, Grond, Lickteig and Lothamer2015). Sequences were clustered at 97% to generate an operational taxonomic unit (OTU) list using the ‘--cluster_size’ function in vsearch with default settings. Chimera sequences were identified and removed using the ‘--uchime3_denovo’ function in vsearch with default settings. Finally, an OTU table was constructed using the ‘--usearch_global’ function in vsearch.
Taxonomic assignment of OTUs passing the quality filtering was queried using BLAST against the NCBI database (NCBI 2021). BLAST hits with 100% for both identity and query cover were retained per OTU. The lowest taxonomic level across the remaining BLAST results was found for further statistical analysis.
Before statistical analyses, the OTU table underwent a second filtering process. For each library, any OTU that was represented in the negative control with more than nine sequencing reads was deleted to decrease false-positive signals originating from fieldwork and/or laboratory work. Next, and to account for barcode hopping, OTUs represented by fewer than ten sequencing reads were considered unreliable and set to zero in each sample for both libraries (Guenay-Greunke et al. Reference Guenay-Greunke, Bohan, Traugott and Wallinger2021). Subsequently, OTUs not adequately represented in any sample for each library (i.e. zero reads across all of the samples for each library) were discarded. Finally, the OTU tables of each library were combined for the subsequent statistical analyses.
Statistical analyses
Rarefaction curves were used to evaluate the relative diversity among the samples of both libraries. The function ‘rarecurve’ from the package ‘vegan’ was used in R (Oksanen et al. Reference Oksanen, Blanchet, Friendly, Kindt, Legendre and McGlinn2020, R Core Team 2020). Each OTU table was transformed to binary data (i.e. presence and absence data) using the ‘vegan’ function ‘decostand', method ‘pa'.
The following statistical analyses were carried out using the presence/absence table of OTUs and both total and eukaryotic taxonomic assignment. Principal coordinate analysis (PCoA) was used to investigate the diversity of eukaryotes and OTUs across the different treatments (sample processing, DNA extraction protocols and library preparation approaches). The Jaccard index was used to visualize patterns of similarity in the presence and absence of eukaryotes and OTUs across the treatments. To analyse dissimilarities among treatments, Jaccard (binary) pairwise distances within the data matrix were calculated using the ‘vegdist’ function from the ‘vegan’ package. The ‘vegan’ package was used for all following analyses unless otherwise stated. To visually analyse the dissimilarity results, the distance matrix results were plotted in a two-dimensional space. PCoAs were conducted using the function ‘wcmdscale'.
Significant differences among groups were investigated using a permutational multivariate analysis of variance (PERMANOVA) and analysis of similarities (ANOSIM) using ‘adonis’ and ‘anosim’ (Clarke Reference Clarke1993, Anderson Reference Anderson2017). To analyse homogeneity of variances within groups (i.e. dispersion effects), the function ‘betadisper’ was used with Tukey honestly significant difference (HSD) post-hoc tests to identify differences between groups.
Taxonomic and OTU richness values within treatments were compared using the non-parametric Kruskal-Wallis rank sum test using the function ‘kruskal.test'. Significant differences (P < 0.05) within treatments were identified using the ‘pairwise.wilcox.test’ function. The average OTU richness was plotted with 95% confidence intervals (CIs). Indicator species analyses were run to identify characteristic species within levels of treatments. Indicator values and frequency (P < 0.05) were calculated using the ‘indval’ function of the ‘LABDSV’ package for Eukaryota and OTU data. Venn diagrams and pie charts were also used to visually compare taxonomic and OTU coverage in both libraries.
Results
Sequencing reads
After bioinformatic quality filtering, 3 856 576 and 6 015 089 sequencing reads were returned for the one-step and two-step approaches, respectively (Table I). After the second filtering process, 1 667 347 and 2 793 916 sequencing reads were retained for the one-step and two-step libraries, respectively. Rarefaction curves (Fig. S1) showed that few new OTUs were observed after 50 000 reads. Sequencing was performed to adequate depth on both libraries, as shown by the flattening of the rarefaction curves.
Diversity across treatments
Significant differences were found between the qPCR approaches in the PERMANOVA analysis (R 2 = 0.034, F 1,78 = 3.07, P < 0.001); however, these differences were not visible in the PCoA plot (Fig. 3a). Thus, permutational multivariate analysis of dispersion (PERMDISP) analyses were performed to test for homogeneity of variance, producing non-significant results (F1,82 = 0.37, P < 0.6). The extraction protocol factor was significant when tested via PERMANOVA analysis (R 2 = 0.08, F 3,78 = 2.22, P < 0.001). The NucleoSpin® Soil extraction protocol showed a cluster and the other three extraction protocol samples showed no patterns of similarity and overlapping distribution in the PCoA plot. No statistically significant difference was found in PERMANOVA analysis among sampling processing methods. Individual and combined samples did not show dissimilar distributions in the PCoA plot. Similar PERMANOVA outcomes were provided for taxonomically agnostic analyses (Supplemental Material).
OTU diversity
The Kruskal-Wallis test provided no evidence for significant differences in the average of eukaryote OTU diversity between the one- and two-step qPCR approaches (χ21, n = 88 = 0.48). On average (±95% CI), 7.0 ± 1.2 and 7.5 ± 1.5 eukaryote OTUs were found in samples from the one- and two-step approaches, respectively (Fig. 4). Similarly, no significant differences were found between the diversity of eukaryote OTUs in the individual and combined samples (χ21, n = 88 = 0.0004, P = 0.99; 7.2 ± 1.6 and 7.3 ± 3.1 OTUs, respectively). Strong evidence for significant differences were indicated by the Kruskal-Wallis tests for diversity estimates among extraction kits samples (χ23, n = 88 = 39.288, P < 0.001). The post-hoc Wilcox test for pairwise comparisons showed that the NS® + buffer protocol presented the greatest differences compared with the other three extraction kits, with a mean of 12.13 ± 1.37 Eukaryota OTUs found using the NS® + buffer kits, compared to 4.72 ± 1.65, 6.84 ± 2.10 and 5.30 ± 3.02 using the PS® + buffer, PowerSoil® and NucleoSpin® Soil protocols, respectively.
Indicator species analysis
Indicator species analysis of eukaryotic OTUs found three organisms to be significantly related (P < 0.05) in the one-step qPCR approach (one fungus and two chlorophytes) and four such organisms in the two-step approach (a rotifer (Bdelloidea), a protist (Basidiomycota) and two unknown eukaryotes; Table II). The protist ‘Cochliopodium’ and the fungus ‘Microbotryomycetes’ were found to be significantly related (P < 0.05) in the PowerSoil® and PS® + buffer extraction protocols, respectively. Eleven organisms were classified as significantly related in the NucleoSpin® Soil extraction kit, including two plants, four fungi, four protists and one unknown eukaryote. No indicator species were detected for both the individual and combined sampling methods.
OTU and taxonomic coverage
Taxonomic coverage in both libraries was similar. For both libraries, bacteria formed the dominant group, followed by eukaryotes and Archaea (Fig. 5). There was a 38% overlap in OTUs detected by the one-step (2973 OTUs) and two-step (2642 OTUs) approaches (Fig. 6), and all 4050 OTUs were assessed via BLAST against the GenBank database. After quality filtering, 309 OTUs were assigned to taxa: 227 and 258 for the one- and two-step approaches, respectively, with 57% (187 OTUs) found in both libraries. The two-step approach had a higher number of taxa assigned than the one-step approach. The Eukaryota were represented by a total of 98 organisms overall. In both libraries, protists were the most abundant group, followed by fungi and, finally, animals and plants. Some 29% of the total eukaryote taxa could not be assigned below this domain. Acutuncus antarcticus (a tardigrade) and Geotria australis (wide-mouthed lamprey) were among the species found in both libraries (Table S5). OTUs from Arachnida, Collembola, Nematoda, Ulvophyceae and Trebouxiophyceae were present in both libraries, and the two-step approach also yielded OTUs from Diptera, Rotifera and Bivalvia.
Discussion
Our results show that the choice of eDNA protocol can influence biodiversity estimates. Markedly higher OTUs assigned to eukaryotic taxa were recovered with NS® + buffer compared to the other protocols tested (i.e. PowerSoil®, NucleoSpin® Soil and PS® + buffer), reinforcing the notion that choice of extraction method is critical in experimental design. The amount of soil used with the NS® + buffer method (as well as the PS® + buffer method) was 5 g, compared to ~0.25 g used for the PowerSoil® and NucleoSpin® Soil methods. The volume of soil used by the kits could have influenced the diversity estimates, as more material could have increased the chance of picking up rarer/more patchily distributed macro-organisms. However, no evidence for this is provided by the results for PS® + buffer, where similar and lower values in the one- and two-step approaches, respectively, were found compared to the other extraction protocols (Fig. S4). Other studies have anticipated advantages of using the NucleoSpin® Soil kit combined with the phosphate buffer to increase diversity findings (Taberlet et al. Reference Taberlet, Prud'Homme, Campione, Roy, Miquel and Shehzad2012, Pansu et al. Reference Pansu, De Danieli, Puissant, Gonzalez, Gielly and Cordonnier2015, Calderón-Sanou et al. Reference Calderón-Sanou, Münkemüller, Zinger, Schimann, Yoccoz and Gielly2021, Kirse et al. Reference Kirse, Bourlat, Langen and Fonseca2021), but to our knowledge the buffer approach has not previously been tested with the PowerSoil® kit.
Overall, taxonomic assignments were similar across libraries, indicating no influence of the qPCR protocol used (Table S5). Importantly, the finding of potential Antarctic organisms, such as A. antarcticus (Antarctic tardigrade), Antarctic Bdelloidea (Rotifera), Collembola, Nematoda, Tyrophagus (mites) and Trebouxiophyceae (green algae; Convey & Stevens Reference Convey and Stevens2007) validates the use of our protocols to identify Antarctic organisms. The detection of edible plants of the genera Brassica (e.g. cauliflower) and Daucus (e.g. carrot) with the one-step library was a somewhat surprising result. This could reflect common metabarcoding error/bias in the amplification process, genetic databases or bioinformatics or also indicate contamination (van der Loos & Nijland Reference van der Loos and Nijland2021). Currently, only two vascular plants have been reported to be living in Antarctica (i.e. Colobanthus quitensis and Deschampsia antarctica, for both of which 18S gene sequences can be found in GenBank), and these are found only in the Antarctic Peninsula (Holtom & Greene Reference Holtom and Greene1967). Therefore, contamination by Brassica and Daucus occurred either in the laboratory or the field. In the laboratory, strict procedures were performed to avoid contamination, so this source seems unlikely (but not impossible). In the field, personnel also followed strict protocols, including bleaching and rinsing footwear immediately before access by foot to the sampling site, and while on-site, field personnel were forbidden from walking over the soil collection area. Perhaps most plausibly, contamination of the site itself could have occurred due to the people/facilities around Stornes in the Larsemann Hills. In the Larsemann Hills, along a ~15 km transect, there are four research stations, including the Bharati, Law-Racovita, Zhongshan and Progress 2 (COMNAP 2019). Regular human activities occur around these stations, including the use of plant equipment and aircraft. Furthermore, anthropogenic pollution is often found around research stations in Antarctica (Fig. S7; Campbell et al. Reference Campbell, Claridge and Balks1994, Bruni et al. Reference Bruni, Maugeri and Monticelli1997, Sheppard et al. Reference Sheppard, Claridge and Campbell2000, Tin et al. Reference Tin, Fleming, Hughes, Ainley, Convey and Moreno2009, Aronson et al. Reference Aronson, Thatje, McClintock and Hughes2011, Chu et al. Reference Chu, Dang, Kok, Ivan Yap, Phang and Convey2019), and birds could potentially carry foodstuffs some distance from bases.
The animal and plant taxa found (~0.5% out of the total OTUs BLASTed) represent relatively low numbers in the analysis compared to other eDNA studies that have also used soil to study plants and animals (Jørgensen et al. Reference Jørgensen, Kjaer, Haile, Rasmussen, Boessenkool and Andersen2012, Kisand et al. Reference Kisand, Talas, Kisand, Stivrins, Reitalu and Alliksaar2018), including a previous study in Antarctica (Czechowski et al. 2016). However, in these studies, higher amounts of starting material and more replication were used. To some extent, the low eukaryotic diversity found in this study might be an artefact of there being relatively few 18S Antarctic sequences available in public genetic databases (Clarke et al. Reference Clarke, Suter, Deagle, Polanowski, Terauds, Johnstone and Stark2021). Although several Antarctic species can be found in genetic databases, some authors have also speculated that most of the Antarctic organisms taxonomically classified have not been genetically identified, and that many more are yet to be discovered (Stevens & Hogg Reference Stevens, Hogg, Bergstrom, Convey and Huiskes2006, Grant et al. Reference Grant, Griffiths, Steinke, Wadley and Linse2011, Schiaparelli et al. Reference Schiaparelli, Danis, Wadley, Stoddart, Verde and di Prisco2013, Velasco-Castrillón et al. Reference Velasco-Castrillón, Gibson and Stevens2014, Chown et al. Reference Chown, Clarke, Fraser, Cary, Moon and McGeoch2015, Brasier et al. Reference Brasier, Wiklund, Neal, Jeffreys, Linse, Ruhl and Glover2016, Gutt et al. Reference Gutt, Isla, Bertler, Bodeker, Bracegirdle and Cavanagh2018). Lear et al. (Reference Lear, Dickie, Banks, Boyer, Buckley and Buckley2018) showed that the 18S gene is the most popular marker for targeting eukaryotes in metabarcoding research. 18S primers have been shown to outperform primers from other genes (von Ammon et al. Reference von Ammon, Wood, Laroche, Zaiko, Tait and Lavery2018, Sawaya et al. Reference Sawaya, Djurhuus, Closek, Hepner, Olesin and Visser2019, Tytgat et al. Reference Tytgat, Nguyen, Nguyen, Pham, Long, Vanreusel and Derycke2019, Hestetun et al. Reference Hestetun, Lanzén and Dahlgren2021). The V9 subregion of the eukaryotic SSU (18S) rRNA gene - used in this research - has both highly conserved and highly variable sites, thus showing great taxonomic resolution potential (Amaral-Zettler et al. Reference Amaral-Zettler, McCliment, Ducklow and Huse2009, Wu et al. Reference Wu, Xiong and Yu2015). An improved 18S reference database for Antarctic taxa would enhance the capacity of eDNA studies to resolve taxonomic assignments.
This research both answers and raises important questions regarding the use of eDNA to study Antarctic biodiversity. We have shown that small amounts of sample material (soil) can yield good results and that subsamples can be combined to improve efficiency (reduce the numbers of extractions needed) in the laboratory. However, choice of markers was not an aspect directly tested here, and understanding what genetic information (markers and taxa) is already available for Antarctic terrestrial biodiversity will be an important part of optimizing results for future Antarctic eDNA work. A potential alternative for the further improvement of eDNA methods seeking to target valuable and limited samples is to use artificially constructed communities to help refine and optimize metabarcoding approaches (Gonzalez et al. Reference Gonzalez, Portillo, Belda-Ferre and Mira2012). Previous research has already shown the utility of metabarcoding in artificial communities of different taxonomical groups (Reva et al. Reference Reva, Zaets, Ovcharenko, Kukharenko, Shpylova and Podolich2015, Creedy et al. Reference Creedy, Ng and Vogler2019, Thomas et al. Reference Thomas, Dittami, Brunet, Le Duff, Tanguy, Leblanc and Gobet2020), but as yet the functionality of this approach for the study of Antarctic terrestrial taxa is unknown. Using multiple-marker approaches, building comprehensive databases, decreasing genetic information loss during sample analyses and increasing collaborative research, among others, will underpin the successful application of eDNA to Antarctic research (von Ammon et al. Reference von Ammon, Wood, Laroche, Zaiko, Tait and Lavery2018, Adamowicz et al. Reference Adamowicz, Boatwright, Chain, Fisher, Hogg and Leese2019, Ruppert et al. Reference Ruppert, Kline and Rahman2019).
Antarctic ice-free areas are home to unique biota. Comprising < 0.5% of Antarctica, ice-free areas are highly threatened by a changing environment and human activities on the continent (Tin et al. Reference Tin, Lamers, Liggett, Maher, Hughes, Tin, Liggett, Maher and Lamers2014, Fretwell Reference Fretwell2016, Pertierra et al. Reference Pertierra, Hughes, Vega and Olalla-Tárraga2017, Brooks et al. Reference Brooks, Jabour, van den Hoff and Bergstrom2019, Lindsay & Yoon Reference Lindsay and Yoon2021, Shan et al. Reference Shan, Xiang, Feng, Wu, Yang and Zhu2021). Understanding the past and current diversity and distribution of Antarctic organisms is key to predicting the future of biodiversity for the continent (Chown et al. Reference Chown, Clarke, Fraser, Cary, Moon and McGeoch2015). Filling gaps in knowledge is vital for policymaking and consequently the protection of fragile Antarctic environments (Convey Reference Convey2011, Chown et al. Reference Chown, Clarke, Fraser, Cary, Moon and McGeoch2015). The use of automated, non-invasive and remote techniques may increasingly be necessary for Antarctic research. As in the present study, other investigations have already shown the effectiveness of eDNA metabarcoding in aquatic and terrestrial Antarctic environments, such as the use of sponges as natural samplers in the Southern Ocean and many studies on the endemic biota of the continent (Lee et al. Reference Lee, Barbier, Bottos, McDonald and Cary2012, Czechowski et al. 2016, Cowart et al. Reference Cowart, Murphy and Cheng2018, Fraser et al. Reference Fraser, Connell, Lee and Cary2018, Mariani et al. Reference Mariani, Baillie, Colosimo and Riesgo2019, Câmara et al. Reference Câmara, Carvalho-Silva, Pinto, Amorim, Henriques and da Silva2021, Carvalho-Silva et al. Reference Carvalho-Silva, Rosa, Pinto, Silva, Henriques, Convey and Câmara2021, Howell et al. Reference Howell, LaRue and Flanagan2021). The capacity of eDNA metabarcoding to provide valuable information for relatively little effort in Antarctica and remote areas is changing the nature and speed of our acquisition of biodiversity knowledge. Our research on the efficiency of different eDNA metabarcoding protocols for studying Antarctic terrestrial organisms provides important insights into the development of tools for Antarctic conservation.
Here, we showed that both one- or two-step qPCR approaches can yield good results, but that efficiencies can be increased and optimizations made in the way samples are extracted. Our findings provide useful methodological insights for future Antarctic research, opening the door to important questions to solve and bringing new methods to bear for the study of Antarctic biodiversity.
Disclaimer
Stornes is protected under the management plan of Antarctic Specially Protected Area (ASPA) number 174 (Antarctic Treaty Secretariat 2014), and sampling was carried out with appropriate permits (permit number 18-19-4370, Australian Antarctic Division), in line with the management plan (Antarctic Treaty Secretariat 2014).
Acknowledgements
For field support, we thank Marty Benavente and Andrew Harrison. For laboratory advice and assistance, we thank Sara Ferreira, Olga Kardailsky, Eden Zhang, Belinda Ferrari and Masaki Miya. The authors would also like to thank two anonymous reviewers for their feedback.
Financial support
Field logistics were supported by the Australian Antarctic Division, AAS Project 4370. PO-R was supported by a University of Otago Divisional Strategic Doctoral Scholarship. Genomic work was funded by a Royal Society of New Zealand Rutherford Discovery Fellowship to CIF (RDF-UOO1803) and by PhD student research support from the Departments of Marine Science and Anatomy at the University of Otago. PO-R, CIF and ML are also supported by the Antarctic Science Platform, MBIE New Zealand (ANTA1801).
Author contributions
PO-R, G-JJ and CIF designed the study. PO-R and JT collected the soil samples. PO-R performed the laboratory, bioinformatic and statistical analyses. G-JJ, CIF and NG provided significant input to the laboratory, bioinformatic and statistical analyses. AT provided advice on statistical analyses. JT provided advice on sample collection. ML provided advice on planning, concepts and writing. PO-R wrote most of the first draft of the manuscript, and all other authors contributed to subsequent drafts.
Supplemental material
A supplemental results section containing seven supplemental figures and six supplemental tables, a supplemental protocols section and a supplemental statistical analyses section will be found at https://doi.org/10.1017/S0954102022000384.