Introduction
Most species are poorly studied, with the sum of knowledge often measured by the number of specimens in museums and other biological collections (Ponder et al., 2001; Roberts et al., 2016). This creates a challenge for conservation decision-makers in terms of how to allocate limited resources (Chadès et al., 2008). The greatest dilemma arises for those species whose persistence is uncertain, which could be either extinct, and thus conservation resources devoted to them would be potentially wasted, or extant, and most likely highly threatened, and therefore potentially in need of urgent conservation action (Elphick et al., 2010).
Probably the most enigmatic and challenging of the poorly known species are those whose existence has been established based on a single specimen, potentially collected many years ago. This gives rise to two questions. Firstly, does the species still exist? Secondly, as often occurs with such species, is it really a so-called good species at all? Although we do not consider the second question further here, it raises a dilemma regarding the precautionary principle in conservation, and whether this also applies to the taxonomic status of a species: should an uncertain species be considered as a good species in case it turns out that it is, including those taxa which some consider to be merely synonyms?
Species known from a single record appear to be frequent in biological collections. For example, within the Global Biodiversity Information Facility database (GBIF, 2019), as many as 21% of all species in the database have only a single recorded occurrence (212,911 species; Troudet et al., 2017). Several statistical methods have been developed to aid either the inference or dating of extinctions based on biological collections such as museum specimens, and sightings such as visual observations or oral accounts (see Solow, 2005, for a review of methods). The term sighting refers to an observation of a taxon, whether it is a museum specimen or visual observation. However, as we focus here specifically on herbarium specimens, we use the term collection. These statistical methods have often been applied to species for which there are ≥ 5 collections (Solow, 2005), although Solow & Roberts (2003) proposed a non-parametric test that used only the last two collections or sightings. Here we examine the case of species that are only known from a single specimen and thus represent a challenge for the inference of extinction. We use the orchids (Orchidaceae) of the island of Madagascar as a case study. The orchid flora of Madagascar is relatively well-studied, with good taxonomic resolution, and has been the subject of intense surveys to document specimens held within herbaria (Hermans et al., 2007). The high level of species (nearly 90%) and generic endemism also extends to other taxa on Madagascar (Hermans et al., 2007; Cribb & Hermans, 2010) and, with extensive habitat loss, the country is considered a threatened global biodiversity hotspot (Myers et al., 2000).
Methods
Inference of extinction
Solow (1993) proposed one of the first statistical methods for the inference of extinction. Specifically, the method tests for the likelihood of species persistence ($p$) in relation to the number of times a species has been recorded ($n$) within the whole observation period ($T$), where $t_n$ is the time of the last collection within the observation period:
$$p = \left(\frac{t_n}{T}\right)^{n} \qquad (1)$$
In line with the discrete nature of most collection records (Solow, 2005; Rivadeneira et al., 2009) and the method discussed below (McCarthy, 1998), we treat time as discrete, with multiple collections allowed within each time unit. A discrete-time form of the Solow equation was established by Burgman et al. (1995):
$$p = \left(\frac{C_e}{C_T}\right)^{n} \qquad (2)$$
where the time period $(0, T)$ is partitioned into $C_T$ equally sized time units and $C_e$ is the number of time intervals between the start of the observation period and the last collection (Burgman et al., 1995, 2000).
This raises the question of when the observation period starts ($t_0$). Often the first collection of the species is used as $t_0$, and therefore $n$ reduces by 1. However, other events may be used to denote the start of the collection period, such as the first year of an annual survey or the year the first specimen was collected (Roberts & Solow, 2003). Accordingly, species known from a single specimen require some other event, rather than their first collection, to denote the start of the observation period. Here we take the first year with collections, 1830, as the start of the collection period. An alternative to selecting a dataset-specific starting date would be to use a global standard starting date; e.g. 1753 or 1758 as the start of the Linnaean binomial nomenclature.
In the case of species known from a single collection, $n = 1$ and therefore $p$ reduces to $t_n/T$. In other words, for an extinction to be inferred at some significance level (e.g. P < 0.05), the time since the specimen was collected has to be more than 19 times the time that elapsed before the specimen was collected. This is not dissimilar to the non-parametric method described by Solow & Roberts (2003).
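The factor of 19 follows from the single-specimen form of the test. The short derivation below is a sketch using the symbols defined above ($t_n$, $T$) and writing the significance level as $\alpha$:

```latex
\begin{align*}
p = \frac{t_n}{T} < \alpha
  &\iff T > \frac{t_n}{\alpha} \\
  &\iff T - t_n > \frac{1 - \alpha}{\alpha}\, t_n \\
  &\iff T - t_n > 19\, t_n \quad \text{for } \alpha = 0.05 .
\end{align*}
```

Here $T - t_n$ is the time elapsed since the specimen was collected and $t_n$ is the time elapsed before it.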
McCarthy (1998) modified equation (1) to take into account annual collection effort ($e_i$), which we consider here to be the annual number of collected specimens of all species. The so-called partial Solow equation is based on the relationship between the collection effort prior to the last collection and the total collection effort (McCarthy, 1998):
$$p = \left(\frac{\sum_{i=1}^{t_n} e_i}{\sum_{i=1}^{T} e_i}\right)^{n} \qquad (3)$$
As in Solow (1993), when $n = 1$ McCarthy's (1998) equation becomes simply the proportion of the effort prior to the specimen being collected to the total effort over the whole collection period. As a result, extinction may be inferred once $p$ decreases below a certain statistical threshold (e.g. P < 0.05). In cases where the threshold is set at P < 0.05, the amount of effort that was expended (in terms of the number of specimens collected) after the specimen was collected has to be more than 19 times the effort expended before the collection.
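The effort-based calculation can be illustrated with a minimal Python sketch. It is not the authors' R script (Supplementary Material 2); the function name `partial_solow_p`, the 0-based indexing and the choice to count the effort of the collection year itself as prior effort are assumptions made here for illustration.

```python
from typing import Sequence


def partial_solow_p(effort: Sequence[float], last_index: int, n: int = 1) -> float:
    """Effort-weighted persistence statistic (illustrative sketch).

    effort[i]  : specimens of all species collected in time unit i (0-based)
    last_index : time unit of the last collection of the target species
    n          : number of collections of the target species
    Assumption: effort in the collection time unit counts as prior effort.
    """
    if not 0 <= last_index < len(effort):
        raise ValueError("last_index must fall within the observation period")
    if n < 1:
        raise ValueError("n must be at least 1")
    total = sum(effort)
    if total <= 0:
        raise ValueError("total collection effort must be positive")
    prior = sum(effort[: last_index + 1])
    return (prior / total) ** n


# Single specimen collected in the first of 21 equally sampled years:
# prior effort is 10 of 210 specimens, so p falls just below 0.05.
p = partial_solow_p([10] * 21, last_index=0)
print(p < 0.05)
```

With only 20 equally sampled years the later effort would be exactly 19 times the earlier effort, so the statistic would equal 0.05 rather than fall below it.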
This method assumes that there is no systematic trend in the likelihood of detection over time, caused for example by a gradually declining population prior to extinction (Roberts et al., 2010). We address the effect of a declining population abundance and detectability prior to extinction in the sensitivity analysis, described below.
Specimen data
Using the Orchids of Madagascar Checklist (Hermans et al., 2007) as a baseline taxonomy, we used data compiled from 21 herbaria, using all available herbarium specimens for the region (Rivers et al., 2011), including non-endemics, totalling 4,342 specimens. The dataset comprises species that were considered to represent distinct species based on current knowledge (Hermans et al., 2007). As the analysis was made at the species level, specimens recorded at the subspecies and lower taxonomic levels were categorized as species for this purpose. We excluded all records identified as erroneous (i.e. erroneous observation year or collection locations outside Madagascar, n = 8), specimens identified as hybrids (n = 4) and duplicates that represented the same species collected at the same site on the same date (n = 22). In addition, three specimens collected before 1830 (1682–1695) were excluded as they were temporally remote from the rest of the collections. A further 63 specimens collected after 2000 were also removed, to avoid the bias of the time taken for specimens to be incorporated into collections and the period taken to compile the dataset. The first collection (a specimen from 1830) was also excluded and used as the beginning of the collection period. The final dataset comprised 4,241 specimens, representing 762 species from 57 genera, collected over a 170-year period (1831–2000).
Estimating collection effort over time using museum specimens
One of the assumptions when using specimen collections as a measure of effort is that the species composition should remain constant over the collection period (i.e. there should be no significant changes in the number of species or their relative abundances over time). To account for a lack of information on the status, decline and potential extinction of species in the dataset, we used three subsets of collections as measures of collection effort (Fig. 1b): (1) all collections of all species (n = 4,241), (2) all collections of only the 10% most collected species, assuming that such species were less likely to be in decline or extinct and also better represent general effort (n = 2,018), and (3) all collections of species identified by the optimal linear estimator (OLE) method for the inference of extinction (Roberts & Solow, 2003; Solow, 2005) as being extant in 2000 (n = 2,828). Each species was assessed separately, and the standard significance threshold was used to infer extinction (P < 0.05). The third approach was applied only to species with ≥ 5 collections (Solow, 2005), with all other species (i.e. with < 5 collections) excluded from the dataset (513 species, 963 records). This method was applied using no more than the 10 most recent collections for each species (Rivadeneira et al., 2009), which excluded a further 37 species (450 records) from the dataset.
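The first two effort measures can be sketched in Python as follows. The record layout (a species name paired with a collection year), the function names and the tie-breaking rule are hypothetical choices made here for illustration; the OLE-based subset is not reproduced.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical record layout: (species name, collection year).
Record = Tuple[str, int]


def annual_effort(records: List[Record], first_year: int, last_year: int) -> List[int]:
    """Number of specimens collected in each year from first_year to last_year."""
    counts = Counter(year for _, year in records if first_year <= year <= last_year)
    return [counts.get(year, 0) for year in range(first_year, last_year + 1)]


def most_collected_subset(records: List[Record], fraction: float = 0.10) -> List[Record]:
    """Records of the most frequently collected species (top `fraction` of species).

    Assumption: ties at the cut-off are broken by order of first appearance,
    which is how Counter.most_common orders equal counts.
    """
    per_species = Counter(species for species, _ in records)
    k = max(1, round(len(per_species) * fraction))
    top = {species for species, _ in per_species.most_common(k)}
    return [record for record in records if record[0] in top]


# Toy data: ten species, one of which ("A") accounts for five records.
toy = [("A", 1990)] * 3 + [("A", 1991)] * 2 + [(s, 1992) for s in "BCDEFGHIJ"]
print(annual_effort(toy, 1990, 1992))                         # effort measure (1)
print(annual_effort(most_collected_subset(toy), 1990, 1992))  # effort measure (2)
```

Either list can then serve as the annual effort series in the effort-based test.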
Using the dataset of Malagasy orchid specimens as a measure of effort, we assessed the extinction likelihood of the 236 species known from a single specimen (Supplementary Material 1). Extinction was inferred by the McCarthy (1998) method (equation 3), with α = 0.05. For species inferred by at least one of the three measures of effort as likely to be extant in 2000, we also tested whether the critical threshold (P < 0.05) would be reached by 2018 based on the expected annual collection effort (i.e. the annual number of collected specimens of all species) during 2001–2018. Collection effort for this period was estimated by using the effort observed over the last decade of the collection period (1991–2000), and the lower (mean – SD), mean and upper (mean + SD) collection effort estimate. A t test of the slope of the regression line over 1991–2000 indicated a lack of any trend over this period (P > 0.05); i.e. collection effort did not significantly differ from the previous decade. Although the method assumes uniform collection effort over the studied area, we did not conduct spatial analysis of effort. Supplementary Material 2 provides the R 3.4.3 (R Core Team, 2017) script for testing extinction of species known from a single specimen and for estimating the collection effort required to confirm extinction.
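The projection step can be sketched in Python as below. This is an illustration rather than the R script in Supplementary Material 2: the function names, the use of the sample standard deviation and the convention that effort in the collection year counts as prior effort are all assumptions made here.

```python
from statistics import mean, stdev
from typing import Dict, Optional, Sequence


def effort_scenarios(recent_effort: Sequence[float]) -> Dict[str, float]:
    """Lower (mean - SD), mean and upper (mean + SD) annual effort estimates.

    Assumption: SD is the sample standard deviation of the recent years.
    """
    m, s = mean(recent_effort), stdev(recent_effort)
    return {"lower": m - s, "mean": m, "upper": m + s}


def year_threshold_reached(
    effort: Sequence[float],
    first_year: int,
    collection_year: int,
    projected_effort: float,
    alpha: float = 0.05,
    max_extra_years: int = 100_000,
) -> Optional[int]:
    """First year in which prior effort / cumulative effort drops below alpha.

    Observed effort is used up to the end of the record; after that a constant
    projected annual effort is assumed. Returns None if the threshold is not
    reached within max_extra_years.
    """
    idx = collection_year - first_year
    if not 0 <= idx < len(effort):
        raise ValueError("collection_year must fall within the observed record")
    prior = sum(effort[: idx + 1])
    if prior <= 0:
        raise ValueError("effort up to the collection year must be positive")
    total = prior
    for offset, e in enumerate(effort[idx + 1:], start=1):
        total += e
        if prior / total < alpha:
            return collection_year + offset
    if projected_effort <= 0:
        return None
    last_year = first_year + len(effort) - 1
    for extra in range(1, max_extra_years + 1):
        total += projected_effort
        if prior / total < alpha:
            return last_year + extra
    return None


# Toy record: 10 specimens per year during 1991-2000, target collected in 1991.
observed = [10] * 10
rate = effort_scenarios(observed)["mean"]
print(year_threshold_reached(observed, 1991, 1991, rate))
```

Running the function once for each of the lower, mean and upper effort estimates gives a range of years analogous to the range reported for each species.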
Sensitivity analysis
To test and compare the performance of the applied method (equation 3) under different extinction and sampling scenarios, we applied the method described by Rivadeneira et al. (2009). A series of artificial collection records were generated and simulated in R. A total of 54 scenarios were simulated that differed based on the collection effort intensity and trend, collection period duration before the time of extinction, and the type of extinction. Simulated time of extinction was set to occur either 20, 50 or 100 years after the beginning of the collection period, and the collection effort was allowed to continue for up to 10,000 years, to provide a sufficient amount of post-extinction effort for extinction to be inferred. Scenarios were developed with two processes of extinction (instant vs gradual), three levels of collection effort (10, 20 or 40 specimens/year in the initial year) and three collection effort trends (stable, increasing or decreasing effort over time). In scenarios with instant extinction, species collection probability remained the same over time until the time of extinction, whereas in gradual extinction scenarios it declined linearly over time. Stable collection effort was represented by a constant number of specimens collected over time, and the increasing and decreasing collection effort trends were represented by a 1% annual rate of change in the collected number of specimens. Decreasing collection effort was allowed to drop to the minimum value of one collected specimen per year, after which it remained constant in all subsequent years. Target species collection probability was approximated in such a way as to obtain on average only a single specimen over the whole collection period within each simulation, and the simulations were repeated until 10,000 simulations with a single species collection were obtained for each scenario.
The upper bounds of a 95% confidence interval (CI) of the method were evaluated based on their statistical coverage (i.e. on the appropriateness of the 95% confidence intervals). An accurate and reliable method should have exactly 95% of simulated extinctions falling within the estimated CI (Rivadeneira et al., 2009). Lower coverage indicates a tendency of the method to underestimate the true time of extinction (i.e. Type I error), and coverage above 95% indicates overly conservative predictions and a tendency to overestimate the time of extinction (Type II error). For more information on this approach, see Rivadeneira et al. (2009). The difference between the time of the last collection and the estimated CI was also assessed for each scenario.
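A simplified Python sketch of a coverage check for the instant-extinction case is given below. It is not the simulation code used in the study. Instead of repeating simulations until a single specimen is obtained, it uses a shortcut: with a constant per-specimen collection probability, the year of a lone specimen is distributed in proportion to annual effort. Function names, 0-based year indexing and the treatment of bounds beyond the simulated horizon are assumptions made here; gradual extinction is not modelled.

```python
import random
from itertools import accumulate
from typing import List, Optional


def effort_series(initial: float, rate: float, years: int) -> List[float]:
    """Annual effort changing by a constant proportional rate, floored at 1 specimen/year."""
    return [max(1.0, initial * (1.0 + rate) ** i) for i in range(years)]


def inference_year(cum: List[float], t_n: int, alpha: float = 0.05) -> Optional[int]:
    """First year u >= t_n with cum[t_n] / cum[u] < alpha, or None if never reached."""
    prior = cum[t_n]
    for u in range(t_n, len(cum)):
        if prior / cum[u] < alpha:
            return u
    return None


def coverage(effort: List[float], extinction_year: int, n_sims: int = 10_000,
             alpha: float = 0.05, seed: int = 1) -> float:
    """Share of simulated single-specimen records whose inferred extinction year
    does not precede the true (instant) extinction year.

    The species is collectible in years 0 .. extinction_year - 1. A bound lying
    beyond the simulated horizon is counted as covering the true extinction.
    """
    rng = random.Random(seed)
    cum = list(accumulate(effort))
    years = list(range(extinction_year))
    bounds = [inference_year(cum, t, alpha) for t in years]
    draws = rng.choices(years, weights=effort[:extinction_year], k=n_sims)
    covered = sum(1 for t in draws
                  if bounds[t] is None or bounds[t] >= extinction_year)
    return covered / n_sims


# Stable effort of 20 specimens/year, extinction 100 years into the record.
stable = effort_series(20, 0.0, 2100)
print(coverage(stable, extinction_year=100))
```

Under stable effort and instant extinction the estimate should land close to the nominal 0.95. Passing a series built with a rate of 0.01 or -0.01 gives a rough analogue of the increasing and decreasing effort trends.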
Results
The final dataset of Malagasy orchids comprised 4,241 specimens with a median value of 13 specimens collected annually (Fig. 1b), 1–70 specimens per species, with a median value of 3, and 236 species that were represented by a single collection (Fig. 1a). Results based on equation 3 indicated nine of the 236 Malagasy orchid species could have been extinct by 2000 (Table 1). Six of those species were inferred as extinct (P < 0.05) by all three measures of collection effort, and three were inferred by the approach using OLE as potentially still extant by 2000, but likely extinct by 2002–2005 if no further observations were made by that time. Two additional species were inferred as potentially extinct by 2018 in the scenario of no further collections by that time and higher estimates of the annual collection effort over that period.
1 The year in which p was reduced below the 0.05 threshold, estimated by assuming that the period after 2000 was characterized by similar collection effort as that during 1991–2000 (range estimated as a mean ± SD).
2 Records of all orchid species used as a measure of collection effort.
3 Records of top 10% most collected species used as a measure of collection effort.
4 Records of species estimated as extant by 2000 (OLE, optimal linear estimator; Roberts & Solow, 2003) used as a measure of collection effort.
For more recently observed species known from a single collection, substantial collection effort would be needed to reach a decision on persistence, with the number of required collections being 583–78,900 specimens. We also tested the effect of an earlier starting year of collection record. However, since the period preceding 1830 was characterized by negligible collection effort, with only three specimens collected, during 1682–1695, choosing an earlier starting year did not change the results.
Results of the sensitivity analysis (Figs 2 & 3) suggest that the McCarthy method performs well when applied to species known from a single collection. It produced near-perfect overall coverage (i.e. very close to 0.95), with a mean value of 0.953 across all scenarios (0.899–1.000). Among the scenarios assessed, it provided the most accurate predictions for those with an instant extinction, longer collection periods prior to extinction (i.e. 50 or 100 years) and either increasing or stable collection effort over time (Fig. 2). It was more prone to underestimate the true time of extinction in scenarios with a gradual extinction, whereas shorter collection periods prior to extinction and decreasing collection effort increased its tendency to overestimate the time of extinction. Results of the sensitivity analysis were sensitive to overall changes in collection effort over time but not to different levels of the overall collection effort, because collection effort was used to standardize annual collection probabilities in simulations. Consequently, Figs 2 and 3 present results only for the mid-level collection effort used in simulations (i.e. 20 specimens/year).
Discussion
Of the 236 species of Malagasy orchids known from a single specimen, the McCarthy method inferred the extinction of nine of these species by 2000, and a further two by 2018 assuming collection effort remained constant and the absence of new collections of these single-specimen species (5% of the assessed species in total). Sensitivity analysis indicated good accuracy and reliability of the McCarthy method, and that it was not prone to Type I errors (Rivadeneira et al., 2009; Fig. 2). On the other hand, it tended to be more conservative, and required continued collection efforts that seem to be unreasonably long from a management perspective (i.e. spanning centuries with the current levels of collection effort; Fig. 3). When species are known from only a single specimen, the start of the collection period has to be used, as opposed to the first collection of a species, commonly used for species characterized by more collections. In this case, the start of the collection period was 1830, the year of the first collection of any orchid species in the region, except for the three specimens collected during 1682–1695.
Increasing accessibility of museum specimens through digitization of collections will aid the assessment of threat status, particularly for data-poor species. However, a substantial number of specimens commonly exist within herbaria that await incorporation into main collections and digitization, in particular recently collected specimens. Specimens can also be inaccessible if they are held by researchers prior to digitization. As a result, although the dataset used here is one of the most comprehensive for Malagasy orchids, specimens are likely to exist that were not available to us. As such, we aimed to illustrate the issues around the inference of extinction, rather than make specific declarations about individual species.
Species are still being discovered and, at the point of discovery, they are generally known from a single specimen (Roberts et al., 2016). If they were not to be recollected, substantial effort would be necessary to reach a decision on continued species presence. Given that specimen collections in the database we used spanned 170 years (1831–2000), it would take 3,230 years before extinction could be inferred on the basis of time alone or, using specimen collections as the measure of effort and projections based on the mean collection effort during 1991–2000, up to 1,612 years. On the other hand, if the aim is to establish whether or not these species are likely to be extinct within a more reasonable time frame of 100 years, the method would require an annual collection effort of 124–789 specimens. Based on the average collection effort observed during the last decade with collections (1991–2000), it would be necessary to increase current collection efforts by 3–21 times, depending on the species. Deciding a course of action for species known only from a single specimen is therefore a significant challenge to conservation practitioners, as on the one hand such species may be amongst the most threatened species, and on the other hand they may be already extinct. Of the 236 species of orchids we examined, nine could be inferred as extinct by 2000, depending on the method used to estimate collection effort (Table 1). If we project the effort to the present time (2018 for this purpose) and no new collections were made, an additional two species would be inferred as extinct.
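The 3,230-year figure can be reproduced with a short worked calculation. This is a sketch that assumes the time-based form of the test for a single specimen ($p = t_n/T$), a specimen collected at the end of the 170-year collection period and a threshold of 0.05:

```latex
\begin{align*}
\frac{t_n}{T} < 0.05,\quad t_n = 170
  &\implies T > \frac{170}{0.05} = 3400 \\
  &\implies T - t_n > 3400 - 170 = 3230 \ \text{years} \;(= 19 \times 170).
\end{align*}
```

The effort-based figures depend on the observed annual numbers of specimens and are not reproduced here.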
The McCarthy method was prone to underestimate the true time of extinction in scenarios with gradual extinction, whereas shorter collection periods prior to extinction and decreasing collection effort increased its tendency to overestimate extinction. As a result, when using the methods presented here an understanding of underlying trends can help supplement the findings. In the case of species known from a single specimen, these methods should only be considered as one line of evidence. Declarations of extinction should not be made solely on the basis of these findings. All species inferred as extinct would require, at least, an assessment of the habitat in which their specimens were collected, to determine whether there is a possibility they still exist.
Acknowledgements
We thank Phillip Cribb, Clare Hermans, Johan Hermans and David Houghton for compiling the data and providing access, and two anonymous reviewers for their helpful comments. IJ's research was supported by the J.E. Purkyně Fellowship of the Czech Academy of Sciences, Alexander von Humboldt Foundation, Federal German Ministry for Education and Research, and Ministry of Education, Science and Technological Development, Republic of Serbia (grant number 173045).
Author contributions
Study concept and dataset preparation: DLR; data analysis and sensitivity analysis: IJ; writing: DLR, IJ.
Conflicts of interest
None.
Ethical standards
This research abided by the Oryx guidelines on ethical standards, and did not involve human subjects, experimentation with animals or collection of specimens.