Summary
Adenovirus type 41 (Ad41) is a non-enveloped, double-stranded DNA virus and an aetiological agent of diarrhoea in children under 2 years of age. It often causes fatal disseminated disease in immunocompromised individuals and has more recently been associated (possibly as a helper virus of adeno-associated virus type 2) with hepatitis of unknown origin in children [2, 3, 5–7]. Human adenoviruses belong to the genus Mastadenovirus, family Adenoviridae, and are classified into seven species (Human mastadenovirus A–G) comprising ~100 types. Types 40 and 41 are the only members of the species Human mastadenovirus F; they are transmitted via the faecal-oral route, are shed in the faeces of infected persons, and remain stable in wastewater (WW) for weeks [1, 5].
Using long-range polymerase chain reaction (PCR), WW-based epidemiology (WBE), and pathogen sequencing, we explored Ad41 diversity in WW by analysing 58 samples collected (over 24 h using time- or flow-weighted automated samplers) from 10 different sites in two municipalities (population: ~700,000) in Maricopa County, Arizona (USA) between October 2019 and March 2020. For each month except March 2020, 10 archived samples were recovered from the freezer and thawed overnight. Only eight samples were available for March 2020 because two of the sites were not sampled for logistical reasons attributed to the onset of the COVID-19 pandemic. Samples were size fractionated, and both filtrate and filter-trapped solids (FTS) were concentrated (~1,000×) using 10,000 Da molecular weight cut-off centrifugal filters (see Supplementary Materials for detailed methods). Thus, for each of the 6 months, we studied two concentrates, one of filtrate and one of FTS.
We subjected all 12 concentrates to nucleic acid extraction using the QIAamp viral RNA Mini Kit following the manufacturer’s instructions and amplified the complete Ad41 genome in eight overlapping ~5 kb fragments via two multiplex PCR assays (Supplementary Tables S1 and S2). We confirmed Ad41 presence using a second-round (nested) PCR assay targeting the fibre gene, with the pooled first-round amplicons as template (Tables S1 and S2). Amplicons from the nested assay were Sanger sequenced, and the sequences confirmed the presence of Ad41 in all 12 concentrates. The corresponding first-round amplicons were then sequenced on an Illumina sequencer (see Supplementary Materials).
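The tiling logic described above can be illustrated with a minimal sketch. It assumes a round genome length of 34,000 bp, a 500 bp overlap, and alternating assignment of tiles to the two multiplex pools; these values and the function name `tile_genome` are illustrative assumptions, not the primer coordinates in Tables S1 and S2.

```python
def tile_genome(genome_length, n_tiles, overlap):
    """Split a genome into n_tiles overlapping amplicons and alternate them between two pools."""
    # Tile length L chosen so that n tiles with the given overlap span the genome:
    # n*L - (n-1)*overlap >= genome_length  (ceiling division)
    tile_len = -(-(genome_length + (n_tiles - 1) * overlap) // n_tiles)
    step = tile_len - overlap
    tiles = []
    for i in range(n_tiles):
        start = i * step
        end = min(start + tile_len, genome_length)
        # Assumption: odd-numbered tiles go to pool 1, even-numbered tiles to pool 2,
        # so that overlapping neighbours are never amplified in the same tube.
        pool = 1 if i % 2 == 0 else 2
        tiles.append({"tile": i + 1, "start": start, "end": end, "pool": pool})
    return tiles


# Hypothetical parameters (not taken from the study).
tiles = tile_genome(genome_length=34_000, n_tiles=8, overlap=500)
for t in tiles:
    print(t)
```

Under this layout each tile overlaps only its immediate neighbours, and those neighbours sit in the other pool, which is why contiguity between adjacent tiles cannot be taken for granted in a mixed sample.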
Illumina sequencing of the first-round amplicons from the 12 concentrates yielded 26,191,460 raw reads. Of these, 13,561,483 (52%) mapped to a reference Ad41 complete genome (MW567966, Table S3) identified in a patient in France in 2018 [4]. Reference-guided assembly showed that one of the eight ~5 kb first-round PCR assays (which covers the genomic region containing the penton protein complete coding sequence) failed. Thus, we recovered about 87.5% of the Ad41 genome.
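The two percentages above follow from simple ratios, worked through in the short sketch below. The recovery figure treats the eight tiles as equal, non-overlapping shares of the genome, which is a simplifying assumption; the variable names are illustrative.

```python
raw_reads = 26_191_460
mapped_reads = 13_561_483

# Fraction of raw reads that mapped to the MW567966 reference.
mapped_fraction = mapped_reads / raw_reads

# Approximate genome recovery, assuming eight tiles of equal size
# and ignoring the overlap between neighbouring tiles.
tiles_total = 8
tiles_failed = 1
recovered = (tiles_total - tiles_failed) / tiles_total

print(f"mapped: {mapped_fraction:.1%}")   # mapped: 51.8%
print(f"recovered: {recovered:.1%}")      # recovered: 87.5%
```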
Subsequently, we investigated the variant profiles of the hexon coding sequence and the two (small and long) fibre protein genes. Variant analysis showed seven, one, and three unique profiles for the hexon, small fibre, and long fibre genes, respectively (Table 1, Tables S4 and S5). The variability detected in the hexon gene was primarily within the hypervariable region (HVR) (Table 1).
To understand how the variant profiles of Ad41 found in WW in this study compare with those of variants publicly available in GenBank, we collected the complete hexon protein coding sequences of the top 100 variants returned by a GenBank search using MW567966 as the query. Specifically, we extracted 59 of the sequences from complete genomes, whereas the remaining 41 were not part of complete genomes but had the complete hexon protein coding sequence publicly available in GenBank. Phylogenetic and pairwise similarity analyses of these Ad41 hexon gene sequences showed that they clustered into six phylogenetic lineages (L1–L6) (Figure S1). Intra-lineage divergence was less than 0.4% (Figure S1), and each lineage had a unique combination of amino acid substitutions in the HVR (Table 1 and Figure S2).
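One simple way to express intra-lineage divergence is the uncorrected pairwise distance (p-distance) between aligned sequences. The sketch below is an illustration under that assumption; the text does not state which distance measure was used, and the toy sequences and function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping gap/N columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue  # ignore columns with a gap or ambiguous base
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def max_intra_lineage_divergence(seqs_by_lineage):
    """Largest pairwise p-distance within each lineage (0.0 for single-sequence lineages)."""
    result = {}
    for lineage, seqs in seqs_by_lineage.items():
        dists = [p_distance(a, b) for a, b in combinations(seqs, 2)]
        result[lineage] = max(dists) if dists else 0.0
    return result


# Toy aligned sequences for illustration only (not Ad41 hexon data).
toy = {"L1": ["ACGTACGTAC", "ACGTACGTAT"], "L2": ["TTGTACGAAC"]}
divergence = max_intra_lineage_divergence(toy)
print(divergence)
```

With real hexon alignments, a lineage would satisfy the criterion above if its maximum pairwise distance stayed below 0.004.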
Comparison of the amino acid variation profiles of the hexon genes we found with those in GenBank showed that hexon variant 1 (H1) = L1, H5 = L2, H3 is a subset of L3, and H7 = L5 + L6 (Table 1). We also found that the amino acid variation profiles H2, H4 (possibly H2 + H3; see Table 1), and H6 were not represented in lineages L1–L6 (Table 1). The seven variant profiles (H1–H7) detected in the concentrates analysed in this study contained between 1 and 22 amino acid substitutions (Table 1). Only in the January 2020 sample did the hexon gene variant profile of the FTS match that of the filtrate. In all other months, they were different (Figure 1).
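The comparison of variant profiles with lineages can be thought of as a set comparison of amino acid substitutions. The following is a minimal sketch of that idea; the substitution labels are hypothetical placeholders and do not reproduce Table 1, and `relate` and `best_matches` are illustrative names.

```python
def relate(profile, lineage):
    """Classify how a profile's substitution set relates to a lineage's substitution set."""
    if profile == lineage:
        return "equal"
    if profile < lineage:
        return "subset"
    if profile > lineage:
        return "superset"
    if profile & lineage:
        return "overlap"
    return "disjoint"


# Hypothetical substitution sets for illustration only; NOT the values in Table 1.
lineages = {
    "L1": frozenset({"A100T", "S150N"}),
    "L3": frozenset({"G200D", "T210A", "N220K"}),
}
profiles = {
    "H1": frozenset({"A100T", "S150N"}),  # identical to L1
    "H3": frozenset({"G200D", "T210A"}),  # proper subset of L3
    "H2": frozenset({"Q300R"}),           # shares nothing with either lineage
}


def best_matches(profiles, lineages):
    """For each profile, list every lineage it is not disjoint from, with the relationship."""
    out = {}
    for name, prof in profiles.items():
        relations = {lin: relate(prof, subs) for lin, subs in lineages.items()}
        out[name] = {lin: rel for lin, rel in relations.items() if rel != "disjoint"}
    return out


matches = best_matches(profiles, lineages)
print(matches)
```

A profile with an empty match dictionary, like the toy H2 here, corresponds to a profile that is not represented among the reference lineages.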
To understand how the Ad41 variants recently identified in children with hepatitis of unknown origin fit into this schema, we aligned the published [2] hexon genes (ON565007–ON565011) with representatives of the six lineages. Our data show that they belong to lineages L1, L2, and L6, with both genomes identified in children with acute liver failure belonging to lineage L6 (Figure S2).
For the long-fibre protein, F251V was the only amino acid substitution detected in the variant analysis (Table S4 and Figure S3) and, where present, it was supported by 38%–55% of mapped reads in all samples. This suggests that F251 and V251 were present at roughly comparable frequencies in the population. Variant analysis also showed the presence of a variant with a 45 nt (15 aa) deletion in the fibre gene in January 2020 (Table S4 and Figure S4). However, it was found only in the FTS and not in the filtrate (Table S5). For the small fibre gene, L362F was the only amino acid substitution detected by variant analysis and was present in more than 99% of the raw reads in all samples.
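The read-level reasoning above can be sketched as follows. The read counts are hypothetical and chosen only to fall inside the reported 38%–55% range; the function names are illustrative. The in-frame check shows why a 45 nt deletion corresponds to exactly 15 amino acids.

```python
def variant_frequency(alt_reads, total_reads):
    """Fraction of mapped reads at a site that carry the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def deletion_in_frame(deleted_nt):
    """A deletion preserves the reading frame when its length is a multiple of 3."""
    return deleted_nt % 3 == 0


def codons_removed(deleted_nt):
    """Number of whole codons (amino acids) removed by an in-frame deletion."""
    if not deletion_in_frame(deleted_nt):
        raise ValueError("deletion is not in frame")
    return deleted_nt // 3


# Hypothetical read counts at the F251V site (not from Table S4).
freq_v251 = variant_frequency(alt_reads=4_500, total_reads=10_000)
freq_f251 = 1 - freq_v251
print(f"V251: {freq_v251:.0%}, F251: {freq_f251:.0%}")

# The 45 nt deletion reported above is in frame and removes 15 codons.
print(deletion_in_frame(45), codons_removed(45))
```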
In conclusion, by coupling long-range PCR detection with WW-based genomic epidemiology (WBGE), we show that all three Ad41 hexon protein lineages recently associated with hepatitis of unknown origin [2] were present in the United States in 2019 (Table 1 and Figure S2). Our data also suggest that there may be circulating Ad41 variants (H2 and H4, Table 1) whose complete genomes have either not been sequenced or have been sequenced but are not publicly available in GenBank. Here, we demonstrate the use of WBGE to monitor Ad41 diversity on a population scale. Specifically, our data show that by targeting a protein-coding region under evolutionary pressure from the immune system (like the hexon protein gene), we can capture diversity (both amino acid substitutions (Table 1) and gross deletions like the 15 aa deletion, Figure S4), and we can also use variant profiling coupled with case-based surveillance data to determine which lineages are likely circulating in the population at any point in time (Table 1 and Figure S2).
Our results highlight a limitation of the tiled-amplicon approach for WBGE. Analysis of Ad41 complete genome sequence data publicly available in GenBank (all of which have similarity >98%) showed that recombination is ongoing between the hexon gene and the fibre genes (Figure S3), which are >8 kb apart. However, since the two amplification pools in the tiled-amplicon approach are run in different tubes, there is no guarantee that the template genome(s) in both pools are from the same variant; that is, that overlapping tiles are from the same variant and consequently contiguous. Furthermore, short-read sequencing makes it difficult to unambiguously ascertain the co-evolution of distant Ad41 genomic regions (and consequently amino acid substitutions). It might therefore be necessary to implement long-range PCR assays that amplify penton-to-long-fibre or at least hexon-to-long-fibre and to couple them with long-read sequencing strategies. Such an approach could enable scientists to study the co-evolution of distant Ad41 genomic regions using variants recovered from WW. This approach may also be worth considering for the WBGE of other DNA viruses of clinical significance.
Our data (Figure 1) also showed that commonly performed size fractionation of WW samples prior to ultrafiltration does impact our perception of virus presence and diversity. Hence, prioritizing one sample fraction over the other might result in an incorrect representation of viral diversity. We therefore recommend virus recovery from both partitions (filtrate and FTS) for a more accurate representation of virus presence and diversity in WW samples.
Funding statement
The research reported in this publication was supported by the National Library of Medicine of the National Institutes of Health under Award Number U01LM013129 to RUH, MS, and AV. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Competing interest
R.U.H. is the founder of OneWaterOneHealth, a non-profit project of the Arizona State University Foundation. R.U.H. and E.M.D. are the co-founders of AquaVitas, LLC, an ASU startup company operating in the field of WBE.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/S095026882400133X.
Data availability statement
The sequences described in this study have been deposited in SRA under accession numbers SRR21987059 – SRR21987066 and SRR21987070 – SRR21987072.
Acknowledgements
The authors would like to thank the City of Tempe for sample collection. The authors would also like to thank the Genomics Core at Biodesign Institute, Arizona State University for help with library preparation and with Illumina and Sanger sequencing.
Author contribution
Investigation: A.S., A.E., A.Y., E.M.D., N.K., P.S., S.A., T.O.C.F., T.P.; Writing – review & editing: T.O.C.F., A.S., A.E., A.V., A.Y., E.M.D., N.K., P.S., R.U.H., S.A., T.P., M.S.; Funding acquisition: A.V., R.U.H., M.S.; Validation: A.V., T.O.C.F., M.S.; Resources: R.U.H., M.S.; Formal analysis: T.O.C.F.; Methodology: T.O.C.F.; Visualization: T.O.C.F., A.V.; Writing – original draft: T.O.C.F.; Supervision: M.S.