Introduction
Monitoring techniques are crucial for the conservation of species. Many field-based monitoring techniques, such as point counts, mist-netting and transect surveys (Ralph et al., 1995; Dunn & Ralph, 2004), are labour- and expertise-intensive, but passive acoustic monitoring with automated recording units is gaining momentum for monitoring a variety of taxa (Teixeira et al., 2019). Recorders can be programmed and deployed for long periods without human intervention, ensuring high research productivity (Shonfield & Bayne, 2017). Because acoustic signals have evolved for use in breeding, foraging and alarm calling, those of many species can be detected at greater distances than visual cues (Laiolo, 2010), and where species have distinctive acoustic signatures, passive acoustic monitoring can be ideal for monitoring them (Brandes, 2008).
Automated recording units have been used for monitoring several threatened bird species (Williams et al., 2018; Dema et al., 2020; Leseberg et al., 2020; Vu & Van Tran, 2020), in particular cryptic birds (Frommolt & Tauchert, 2014; Zwart et al., 2014; Bobay et al., 2018; Pérez-Granados et al., 2018) and those that call infrequently (Goyette et al., 2011). The success of passive acoustic monitoring depends primarily on a robust combination of hardware (recorders) and detection software, often specific to a particular species and habitat. In this study we used automated recording units to create a replicable detection framework for the Critically Endangered Jerdon's courser Rhinoptilus bitorquatus, a nocturnal and cryptic bird endemic to the Eastern Ghats of India (BirdLife International, 2017).
Hardware considerations are critical for passive acoustic monitoring. These include the recording radius of automated recording units, microphone sensitivity (Darras et al., 2019) and the nature of the habitat (Pérez-Granados et al., 2019). It is also essential to have custom-built acoustic algorithms that detect the calls of the target species (Knight et al., 2017) and reject vocalizations of co-occurring species (Priyadarshani et al., 2018). Such algorithms are particularly useful when there are many hours of recordings and it is impractical to examine them manually. Thus a species- and habitat-specific monitoring framework is required, especially for conservation purposes.
Jerdon's courser is one of the rarest birds (BirdLife International, 2017). Long considered extinct, with no sightings after 1900, it was rediscovered in 1986 (Bhushan, 1986) but has not been detected since 2008. Jerdon's courser prefers a type of scrub forest that is under great anthropogenic pressure (Jeganathan et al., 2008). The species was previously detected using labour-intensive methods such as night searches (Bhushan, 1994), tracking strips and camera traps (Jeganathan et al., 2002). The detection methods used included listening for spontaneous calls in places where Jerdon's courser was known to have occurred, and walking transects while listening for calls in response to playback of the species’ calls. However, both the rate of spontaneous calling and the response rate to call playback were low (Jeganathan, 2006). So far, Jerdon's courser has been recorded only in a small area of scrubland in the Sri Lankamalleswara Wildlife Sanctuary. Because of its Critically Endangered status and highly restricted range, we conducted this study to develop a site-specific detection framework using automated recording units and sound analysis software. We did this first by testing potentially suitable commercially available recorders, and then by creating and testing a call detection analysis pipeline using two commonly available sound analysis programmes. For the latter, we provide a flowchart to validate detections.
Study area
We conducted this study in the 464 km² Sri Lankamalleswara Wildlife Sanctuary in Andhra Pradesh, India, which comprises scrubland habitat considered suitable for Jerdon's courser (Jeganathan et al., 2004; Fig. 1).
Methods
Call characteristics
Jerdon's courser has only one known call type, a di-syllabic kwick-koo repeated in a series (Jeganathan & Wotton, 2004). The only known recordings of Jerdon's courser were obtained by PJ on 11 occasions during June 2001–November 2002, at dusk, in the Wildlife Sanctuary (Fig. 1). The calls were recorded at three locations, most likely from multiple individuals, at distances of 10–100 m from the birds. These calls were recorded on analogue tape and converted to digital format by the Macaulay Library, Cornell University (accession numbers 274533–43).
To check for variability in Jerdon's courser calls, we analysed 145 clear di-syllabic calls from the 312 calls available in the recordings (faint calls were excluded). The fundamental ranges from a mean lowest frequency of 654.55 ± SD 54.68 Hz to a mean highest frequency of 1,238.05 ± SD 56.48 Hz, with two visible harmonics above it (Fig. 2). The second harmonic appears to have the highest energy and can be heard at the greatest distance (see attenuation experiment below). The mean call duration is 0.4667 ± SD 0.042 s, with a mean inter-syllable gap of 0.090 ± SD 0.027 s (Fig. 2). We created a spectral cross-correlation matrix for each of the two syllables using the batch correlator function in Raven Pro 2.0 beta (Center for Conservation Bioacoustics, Cornell University, Ithaca, USA). Normalized spectral cross-correlation values range from 0, denoting the least similarity, to 1, denoting the maximum possible similarity between two notes (Charif et al., 2010).
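The idea behind a normalized spectral cross-correlation score can be shown with a short sketch. The Python below is a minimal illustration under stated assumptions, not Raven Pro's implementation: it assumes two non-negative magnitude spectrograms that share the same frequency bins, slides one along the time axis of the other, and divides each lag by the total energy of both spectrograms so that the peak score falls between 0 and 1. The function name `spectrogram_xcorr` is hypothetical.

```python
import numpy as np

def spectrogram_xcorr(a: np.ndarray, b: np.ndarray) -> float:
    """Peak normalized cross-correlation between two magnitude spectrograms.

    a and b are (frequency bins x time frames) arrays with non-negative values
    and the same frequency bins; b is slid along the time axis of a. Each lag is
    divided by the total energy of both spectrograms, so the result lies in
    [0, 1] and reaches 1 when the two are scaled copies of each other.
    """
    if a.shape[0] != b.shape[0]:
        raise ValueError("spectrograms must share the same frequency bins")
    norm = np.sqrt(np.sum(a * a) * np.sum(b * b))
    if norm == 0:
        return 0.0
    n_a, n_b = a.shape[1], b.shape[1]
    best = 0.0
    for lag in range(-(n_b - 1), n_a):  # lag = position of b's first frame within a
        a_lo, a_hi = max(lag, 0), min(n_a, lag + n_b)
        b_lo = a_lo - lag
        overlap = a[:, a_lo:a_hi] * b[:, b_lo:b_lo + (a_hi - a_lo)]
        best = max(best, float(overlap.sum()) / norm)
    return best

# A random stand-in "note" and the same note delayed by 5 silent frames:
rng = np.random.default_rng(0)
note = rng.random((12, 20))
shifted = np.hstack([np.zeros((12, 5)), note])
print(spectrogram_xcorr(note, shifted))  # close to 1: the delay is absorbed by the lag
```

Because the score is taken at the best time lag, a note that is merely offset in time still scores near 1, while differences in spectral shape lower it.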
Field protocol
Attenuation experiment
We conducted an attenuation experiment using Jerdon's courser playback calls to determine the grid size for the deployment of the recorders (Supplementary Material 1, Supplementary Fig. 1). We used four types of commercial recorders: SongMeter4 (Wildlife Acoustics, Maynard, USA), Swift (Cornell Center for Conservation Bioacoustics, Cornell University, Ithaca, USA), Rugged Swift (modified Swift in a Pelican casing, Torrance, USA), and AudioMoth (Open Acoustics, Eastleigh, UK). We detected the call at up to 700 m, using all recorders, and conservatively fixed the field sampling grid cell size at 1 × 1 km.
Deployment
We created a grid of 1 × 1 km cells along the eastern boundary of the Sanctuary using a preliminary habitat suitability map (Jeganathan, 2006). We deployed 17 recorders (Swift, Rugged Swift and SongMeter4), one at the centre of each selected grid cell, and included all locations where Jerdon's courser had previously been recorded. We conducted four recording cycles (c. 30 days each, after which batteries in all recorders were replaced) during November 2019–March 2020, as the bird is thought to be most vocal during this period (Jeganathan & Wotton, 2004). All recorders were randomly distributed across the grid. We deployed two AudioMoth recorders on the same trees as other recorders for further testing (Supplementary Table 1); data from these were not used in the analysis (but see below). As Jerdon's courser is known to vocalize within an hour after sunset (Jeganathan, 2006), we recorded continuously during 17.00–06.00 (at a 48,000 Hz sampling rate, in wav format) on 17 recorders × 4 cycles, giving 68 recording instances (one recording instance is c. 290 h of data collected from one recorder per battery cycle). We also analysed an additional dataset of 83 h from an AudioMoth recorder that malfunctioned and recorded only during the day.
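Because the nightly schedule crosses midnight, it is an easy place for scheduling or timestamp errors to creep in. The sketch below uses hypothetical helpers (`window_hours`, `in_window`) that are not part of any recorder's firmware: it computes the length of the 17.00–06.00 window and checks whether a clock time falls inside it, so that files stamped outside the intended window could be flagged for review.

```python
from datetime import datetime, time, timedelta

WINDOW_START = time(17, 0)  # recording starts at 17.00
WINDOW_END = time(6, 0)     # and stops at 06.00 the next morning

def window_hours(start: time, end: time) -> float:
    """Length of a daily recording window in hours; wraps past midnight if end <= start."""
    day = datetime(2000, 1, 1)  # arbitrary reference date
    s = datetime.combine(day, start)
    e = datetime.combine(day, end)
    if e <= s:
        e += timedelta(days=1)  # window ends on the following day
    return (e - s).total_seconds() / 3600.0

def in_window(t: time, start: time = WINDOW_START, end: time = WINDOW_END) -> bool:
    """True if clock time t lies in [start, end), handling windows that span midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end

print(window_hours(WINDOW_START, WINDOW_END))           # 13.0
print(in_window(time(9, 30)), in_window(time(18, 45)))  # False True
```

The 13-h result matches the nocturnal schedule, and a 09.30 timestamp falls outside it, which is the kind of mismatch an incorrectly synchronized recorder clock would produce.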
Call detection analysis pipeline
Creating a recognizer
A recognizer is an algorithm to detect a single specific category of sounds. Recognizers can operate either in two steps (first detecting a broad range of sounds, then classifying them) or in a single step (detecting only sounds that match a template) (Knight et al., 2017). To build a detector (an algorithm to detect sounds of potential interest in a continuous recording), we considered only the frequency band of the second harmonic of the call, which ranges from a mean minimum of 1,307.25 ± SD 76.68 Hz to a mean maximum of 2,375.65 ± SD 89.89 Hz (Fig. 2). We chose the second harmonic because the first and third harmonics attenuated faster with increasing distance from the source (Supplementary Fig. 1). For call detection we created an analysis pipeline with Raven Pro and Kaleidoscope (Wildlife Acoustics, 2019).
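Restricting a detector to the second harmonic amounts to computing a spectrogram and keeping only the rows inside that band. The NumPy sketch below assumes a 512-sample Hann window and 256-sample hop (the Raven Pro settings reported in this section) at 48,000 Hz, which gives a bin spacing of 48,000/512 = 93.75 Hz; the function name `band_spectrogram` and the synthetic tone are illustrative, not part of either software package.

```python
import numpy as np

def band_spectrogram(signal, sample_rate, f_lo, f_hi, window=512, hop=256):
    """Hann-windowed magnitude spectrogram keeping only bins within [f_lo, f_hi] Hz.

    Returns (freqs, spec): freqs holds the retained bin centres in Hz and
    spec has shape (len(freqs), n_frames).
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < window:
        raise ValueError("signal is shorter than one analysis window")
    n_frames = 1 + (len(signal) - window) // hop
    taper = np.hanning(window)
    # One column per frame: frames[:, i] = signal[i*hop : i*hop + window] * taper
    frames = np.stack([signal[i * hop:i * hop + window] * taper
                       for i in range(n_frames)], axis=1)
    spec = np.abs(np.fft.rfft(frames, axis=0))
    freqs = np.fft.rfftfreq(window, d=1.0 / sample_rate)
    keep = (freqs >= f_lo) & (freqs <= f_hi)
    return freqs[keep], spec[keep]

# Example: a 0.2 s, 1,800 Hz tone, limited to the mean second-harmonic band
fs = 48_000
t = np.arange(int(0.2 * fs)) / fs
freqs, spec = band_spectrogram(np.sin(2 * np.pi * 1800 * t), fs, 1307.25, 2375.65)
print(len(freqs), freqs[spec.mean(axis=1).argmax()])
```

At this resolution the band spans only about a dozen frequency bins, so energy from the louder, faster-attenuating harmonics and from sounds outside the band is excluded before any matching is done.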
In Raven Pro the template detector function works on the principle of spectrogram cross-correlation (Ulloa et al., 2016): sounds whose correlation with the template exceeds a given threshold are detected (Knight et al., 2017). We used the following spectrogram parameters: Hann window = 512 samples, discrete Fourier transform size = 512, hop size = 256 samples, grid spacing = 93.8 Hz, overlap = 50%. As templates for Raven Pro we chose two clear and loud di-syllabic Jerdon's courser calls from our attenuation experiment, played back through a FoxPro Predator speaker (FOXPRO, Lewistown, USA), and used a selection around each syllable as an independent template, resulting in four templates. We chose the playback version because the playback call had a better signal-to-noise ratio and had been recorded on an automated recording unit. The templates were used in the template detector function of Raven Pro on a training dataset of 160 Jerdon's courser calls. The template detector settings (frequency range = ± 5 Hz, threshold value = 0.55) were set to maximize the number of true positives.
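The thresholding step of template detection can be sketched as follows. This is a simplified illustration, not Raven Pro's detector: it assumes non-negative spectrograms with matching frequency bins, normalizes each spectrogram window and the template by their energies, reports every frame offset whose score reaches 0.55, and converts offsets to seconds using the 256-sample hop at 48,000 Hz (about 5.33 ms per frame). It does not model Raven Pro's frequency-range setting, and the name `detect_template` and the toy spectrogram are hypothetical.

```python
import numpy as np

def detect_template(spec, template, threshold=0.55, hop_s=256 / 48_000):
    """Return (time_s, score) for every frame offset where the normalized correlation
    between `template` and the equally sized window of `spec` reaches `threshold`.

    Both inputs are non-negative magnitude spectrograms with the same frequency bins.
    """
    n_t = template.shape[1]
    t_norm = np.linalg.norm(template)
    hits = []
    for start in range(spec.shape[1] - n_t + 1):
        window = spec[:, start:start + n_t]
        w_norm = np.linalg.norm(window)
        if w_norm == 0 or t_norm == 0:
            continue  # silent window (or empty template): nothing to correlate
        score = float(np.sum(window * template) / (w_norm * t_norm))
        if score >= threshold:
            hits.append((start * hop_s, score))
    return hits

# Toy example: a rising "sweep" template hidden at frame 40 of a 100-frame spectrogram,
# plus a flat tone at frames 70-89 that should not be detected.
template = np.eye(12)
spec = np.zeros((12, 100))
spec[:, 40:52] = template
spec[3, 70:90] = 1.0
print(detect_template(spec, template))
```

Raising the threshold trades recall for precision; a lower cut-off such as 0.55 keeps more candidate calls at the cost of more false positives to screen manually.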
In Kaleidoscope, detection of the target species works on K-means clustering using a hidden Markov model, in which similar sounds are clustered together (Pérez-Granados & Schuchmann, 2020). In the first step, we conducted a basic cluster analysis on a training dataset containing 119 Jerdon's courser di-syllabic calls and calls of co-occurring species (common hawk-cuckoo Hierococcyx varius and Jerdon's nightjar Caprimulgus atripennis, identified using the co-occurring species-elimination flowchart explained below). We used particular settings (fast Fourier transform size = 512, maximum distance from cluster centre = 0.1, maximum states = 12, maximum distance to cluster centre = 0.5 and maximum clusters = 10) and fixed signal detection parameters (frequency band = 1,500–2,300 Hz, detection length = 0.05–0.1 s, inter-syllable gap = 0.35 s) that resulted in clusters of the target Jerdon's courser calls and of non-target calls of the co-occurring species. Calls were then manually annotated and the resulting file was used as a template to rescan the training data in a second round of cluster analysis. The second round of clusters was then manually re-validated and re-annotated to create the recognizer.
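To illustrate the general clustering idea only, the sketch below runs plain k-means (Lloyd's algorithm) on hypothetical per-detection feature vectors. This is a swapped-in, simpler technique: Kaleidoscope's cluster analysis is based on hidden Markov models, and its settings are not reproduced here. The `max_dist` argument is loosely analogous to a maximum distance to the cluster centre: detections farther than this from every centre are left unassigned (label -1) for manual review. The function names and the deterministic farthest-point seeding are assumptions chosen so the example gives repeatable results.

```python
import numpy as np

def farthest_point_init(x, k):
    """Deterministic seeding: start at x[0], then add the point farthest from all chosen centres."""
    centres = [x[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(x - c, axis=1) for c in centres], axis=0)
        centres.append(x[d.argmax()])
    return np.array(centres, dtype=float)

def kmeans(x, k, max_dist, n_iter=100):
    """Lloyd's k-means; points farther than max_dist from their nearest centre get label -1."""
    centres = farthest_point_init(x, k)
    for _ in range(n_iter):
        d = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    # Final assignment against the converged centres
    d = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    nearest = d[np.arange(len(x)), labels]
    return np.where(nearest <= max_dist, labels, -1), centres

# Two synthetic groups of 2-D feature vectors (stand-ins for two call types)
rng = np.random.default_rng(0)
x = np.vstack([rng.normal([0.0, 0.0], 0.05, size=(20, 2)),
               rng.normal([1.0, 1.0], 0.05, size=(20, 2))])
labels, centres = kmeans(x, k=2, max_dist=0.5)
print(labels)
```

In a real pipeline each row of `x` would describe one detected sound (for example, a band-limited mean spectrum); the clusters would then be inspected and annotated by hand, as in the two-round procedure described above.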
Assessing the performance of the recognizer
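The efficacy of the recognizers created with Raven Pro and Kaleidoscope was evaluated on a test dataset by calculating three performance metrics (Knight et al., 2017; Priyadarshani et al., 2018): precision, the proportion of detections that are true Jerdon's courser calls, TP/(TP + FP); recall, the proportion of Jerdon's courser calls in the test dataset that are detected, TP/(TP + FN); and the F beta score, (1 + β²) × precision × recall/(β² × precision + recall), where TP, FP and FN are the numbers of true positives, false positives and false negatives, respectively.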
The F beta score balances precision and recall. As the target species is a rare bird with an infrequent calling pattern, we wanted to minimize false negatives and therefore set β = 2 to prioritize recall (Knight et al., 2017). We created a synthetic test dataset using 10 h of recordings overlaid with 415 di-syllabic Jerdon's courser calls (from the Macaulay Library recordings, accession numbers 274533–43, and the FoxPro playback calls from our attenuation experiments, as described in Supplementary Material 1).
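The three metrics can be computed directly from counts of true positives (TP), false positives (FP) and false negatives (FN). The Python sketch below assumes the standard definitions of precision, recall and the F beta score; the function names and the example counts are hypothetical.

```python
def fbeta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0

def precision_recall_fbeta(tp: int, fp: int, fn: int, beta: float = 2.0):
    """Precision, recall and F beta from detection counts (0.0 where undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, fbeta(precision, recall, beta)

# Hypothetical counts: 40 true calls detected, 60 false alarms, 10 calls missed
print(precision_recall_fbeta(40, 60, 10))

# Precision and recall reported in the Results for Raven Pro and Kaleidoscope
print(round(fbeta(0.341, 0.490), 3), round(fbeta(0.116, 0.340), 3))

# Swapping precision and recall lowers the score when beta = 2
print(fbeta(0.4, 0.8) > fbeta(0.8, 0.4))
```

Applied to the reported precision and recall values, β = 2 gives 0.451 and 0.245 after rounding, matching the F beta scores in the Results. The last line shows why β = 2 suits a rare species: the same two values score higher when the larger one is recall.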
Screening automated recording unit data
To separate calls of Jerdon's courser from those of co-occurring species, we used a species elimination flowchart (Fig. 3) that drew on eBird (2021) checklists for the region and our software recognizers. In the first step, we obtained from eBird an up-to-date list of all bird species present in Sri Lankamalleswara Wildlife Sanctuary and shortlisted species that vocalize within the call frequency band of Jerdon's courser. We then screened the collected recorder data with the Raven Pro and Kaleidoscope recognizers. All detections were manually cross-checked both visually and aurally (by VJ and CA) against the calls of species on the eBird shortlist, using recordings available on eBird and Xeno-canto (Xeno-canto, 2021). For unclear detections, spectrogram audio segments of 60 s before and after the detection were examined (by PJ and VJ) to identify the species from the context in which the call occurred. Detections were categorized as known or unknown bird species. The unknown calls were sent, in the form of a questionnaire for species identification, to 11 experts in bird acoustics who are familiar with the calls of birds of this region or South India. The questionnaire consisted of videos of a moving spectrogram of the unknown calls (Supplementary Material 2). The experts were asked to rate each call on a five-point Likert scale ranging from least likely to most likely to belong to Jerdon's courser. For the calls identified as most likely to belong to Jerdon's courser, we performed a spectral cross-correlation in Raven Pro to quantify the similarity between the detected call and the known Jerdon's courser di-syllabic call.
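Two mechanical steps of the flowchart lend themselves to a short sketch: shortlisting species whose vocal frequency range overlaps the target band, and computing the 60-s context window around a detection. In the Python below, the species names and frequency ranges are hypothetical placeholders (real ranges would come from reference recordings), the target band is assumed to be the mean second-harmonic band given earlier (1,307.25–2,375.65 Hz), and the helper names are illustrative.

```python
# Hypothetical frequency ranges (Hz) for illustration only; not measured values.
SPECIES_BANDS = {
    "species_A": (500.0, 1200.0),
    "species_B": (1000.0, 1600.0),
    "species_C": (2500.0, 6000.0),
    "species_D": (1800.0, 2200.0),
}
TARGET_BAND = (1307.25, 2375.65)  # assumed: mean second-harmonic band of the call

def overlaps(lo1, hi1, lo2, hi2):
    """True if the closed intervals [lo1, hi1] and [lo2, hi2] share any frequency."""
    return lo1 <= hi2 and lo2 <= hi1

def shortlist(species_bands, band=TARGET_BAND):
    """Sorted names of species whose vocal range overlaps the target band."""
    lo, hi = band
    return sorted(name for name, (s_lo, s_hi) in species_bands.items()
                  if overlaps(s_lo, s_hi, lo, hi))

def context_window(t_s, file_duration_s, pad_s=60.0):
    """Start and end (s) of the segment pad_s before and after a detection, clipped to the file."""
    return max(0.0, t_s - pad_s), min(file_duration_s, t_s + pad_s)

print(shortlist(SPECIES_BANDS))      # ['species_B', 'species_D']
print(context_window(30.0, 3600.0))  # (0.0, 90.0)
```

Clipping keeps the context window valid for detections near the start or end of a file, where a full 60 s on either side is not available.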
Results
Performance metrics of the recognizers
The Raven Pro recognizer performed better on the test dataset, with precision = 0.341, recall = 0.490 and F beta score = 0.451. The recognizer created with Kaleidoscope had precision = 0.116, recall = 0.340 and F beta score = 0.245. Higher recall values indicate fewer false negatives. Because β was set to weight recall more heavily than precision, higher F beta scores indicate better performance with greater emphasis on avoiding false negatives.
Detection of Jerdon's courser call
We obtained 24,432 h of recordings. Data from four recording instances, three from the first recording cycle and one from the second cycle, were not available as a result of hardware technical issues.
In these recordings we detected 2,384,404 putative Jerdon's courser calls with Raven Pro, and 5,167,561 with Kaleidoscope. The majority of false positives were of co-occurring species that vocalize in the same bandwidth as Jerdon's courser (Supplementary Table 2), including the common hawk-cuckoo, Jerdon's nightjar and the Indian nightjar Caprimulgus asiaticus, which were segregated using our species elimination analysis pipeline (Fig. 3). We did not detect any Jerdon's courser di-syllabic vocalizations that unambiguously matched the known vocalization.
However, we shortlisted four calls that were similar to that of Jerdon's courser (Supplementary Fig. 2) and presented these to the 11 experts. One of these calls was identified as most likely to belong to Jerdon's courser by eight of the 11 experts. To check for similarity in spectral features of this call with that of Jerdon's courser we calculated spectral cross-correlation coefficients (Supplementary Table 3). We found that the first note of this call and that of Jerdon's courser had the highest similarity (cross-correlation score = 0.591), and note three of this call was most similar to note two of Jerdon's courser (cross-correlation score = 0.534). Both cross-correlation scores were within the range of the cross-correlation scores among PJ's field recordings of Jerdon's courser.
The context of this call was further examined manually by screening 1 h of recordings before and after the call. This call was recorded accidentally (as a result of incorrectly synchronized recorder clock settings), which resulted in recordings during the day (at 09.30) rather than at night. We found calls of other species (bay-backed shrike Lanius vittatus, Indian thick-knee Burhinus indicus and red-vented bulbul Pycnonotus cafer) in close succession to this putative Jerdon's courser call (Supplementary Fig. 3). The full call of Jerdon's courser was not detected in this recording, and we believe the call was potentially a series of mimicked calls by the bay-backed shrike (discussed below).
Discussion
We used bioacoustics to create a detection framework to monitor the Critically Endangered, cryptic Jerdon's courser, the last confirmed occurrence of which was in 2008. The framework comprised a field protocol using automated recording units and an analysis protocol using two brands of sound analysis software.
Recommended field protocol
We found the Rugged Swift recorders to be the most suitable for long deployments (> 2 months on a 13-h nocturnal recording schedule) because they can hold 12 D-cell batteries, whereas the other tested recorders can hold only up to four (Supplementary Table 1). Although we did not detect Jerdon's courser, we recommend retaining the 1 × 1 km grid for recorder deployment, based on our attenuation study, and expanding sampling to surrounding habitats that may be suitable for this species (Supplementary Fig. 4). Differences in vegetation cover and wind direction can cause differences in attenuation (Yip et al., 2017; Thomas et al., 2020) and, through variability in the amplitude of natural calls, may contribute to the limitations of our study. Attenuation may also differ between calls from a live bird and playback calls; however, recording on a grid should detect the species, given that it is likely to range across multiple grid cells.
Recommended software protocol
Both Raven Pro and Kaleidoscope have been used previously to create recognizers, with variable success rates (Knight et al., 2017). Their user-friendly interfaces do not require any prior programming knowledge, making them accessible to conservation management agencies and non-technical personnel. Our comparison on synthetic test data indicated that the recognizer created with Raven Pro performed better, with a higher F beta score and a higher recall, similar to comparisons for other species (Knight et al., 2017). The stereotyped short calls of Jerdon's courser appear to be detected more efficiently with the spectrogram cross-correlation method used by Raven Pro (Ulloa et al., 2016) than with the K-means clustering method used by Kaleidoscope (Joshi et al., 2017).
Previous studies have also observed variable performance between manual and automated recognizers (Swiston & Mennill, 2009; Digby et al., 2013; Sidie-Slettedahl et al., 2015). The choice of method appears to depend on the target species (Joshi et al., 2017). In an automated approach, the most common challenges are a high number of false positives resulting from weak signals (Joshi et al., 2017), the absence of a good initial template, and similar-sounding sympatric species (Schroeder & McRae, 2020). In our study, although the number of false positives was high, following a systematic analysis pipeline (Fig. 3) greatly reduced the manual screening time compared with screening the entire set of audio files, as also reported by Schroeder & McRae (2020). Automated classification approaches such as random forests (Ross & Allen, 2014) can sometimes reduce the number of false positives that need to be reviewed manually, but we did not examine this possibility.
We detected one call that both our analysis and the experts indicated was similar to that of Jerdon's courser. However, we considered the following additional points: (1) the putative call was recorded during the day, (2) calls of the bay-backed shrike (a known mimic) were recorded at a similar time as the putative call, (3) calls of multiple species with similar amplitudes occurred in close succession to the putative call, (4) there was no other instance of the putative call in the sequence, and (5) the notes of the putative call showed some similarity to those of the Jerdon's courser call in their spectral cross-correlation scores, yet they were not identical (potentially a modification of the model's call by the mimic; Zollinger & Suthers, 2004). Jerdon's courser is nocturnal, and its calls were previously heard or recorded only during the night and on a few occasions at dusk. Its calls are also repetitive, with multiple calls in rapid succession. We therefore believe that the putative call is likely to be mimicry of a Jerdon's courser call by a bay-backed shrike (Yosef et al., 2020). Mimicked calls can closely resemble the acoustic characteristics of the model and are difficult to discern without context (Agnihotri et al., 2014). In the absence of additional recordings, it is difficult to assess definitively whether the detected call was of Jerdon's courser or of a mimic. Nevertheless, any mimicry would indicate exposure of the mimic to the model, either directly or culturally. Although mimicry can confound detections (Crisologo et al., 2017), in the case of an endemic bird with a restricted range, mimicry can be a clue for its detection.
The presence of Jerdon's courser in this location cannot therefore be either confirmed or ruled out without more extensive year-round automated recording.
Our attempt to detect Jerdon's courser was based on the premise that it has only one call type. Additional information on the call diversity of the species would potentially improve the ability to detect it. The most conclusive evidence for the occurrence of Jerdon's courser comes from camera traps (Jeganathan et al., 2002), which could be used to narrow down the location where the species occurs following any detection with automated recording units. Although the probability of detection with camera traps and acoustic recorders cannot be compared without additional information, acoustic recorders typically have much larger detection areas than cameras (unless the species has a very low-amplitude call).
Our species-specific acoustic detection framework could be scaled up and implemented by conservation agencies for the detection and long-term monitoring of this Critically Endangered bird. Knowledge of the local bird community is important for such research, as behaviours such as mimicry by other species need to be considered. Local enthusiasm for the conservation of Jerdon's courser appears to be high, as local newspaper reports erroneously carried stories that the species was detected (Supplementary Fig. 5), based on our study. We recommend an expanded acoustic survey for the species across all potentially suitable habitats. The acoustic monitoring protocols described here could be adapted to monitor other threatened, cryptic species whose vocalizations have been previously recorded.
Acknowledgements
We thank the Andhra Pradesh Forest Department, including D. Nalini Mohan, Indian Forest Service (Principal Chief Conservator of Forests Wildlife), K. Gopinatha, Indian Forest Service (Additional Principal Chief Conservator of Forests), M. Siva Prasad, Indian Forest Service (District Forest Officer Kadappa), K. Prasad (Regional Forest Officer Siddavatam), Subhash Reddy (Regional Forest Officer Badvel), Ramana, Rajendra Prasad and Yogesh Pasul, for logistical support in Sri Lankamalleswara Wildlife Sanctuary; Mike Prince of the British Trust for Ornithology, UK, for lending us AudioMoth recorders; Matthew Meddler for archiving Jerdon's courser calls in the Macaulay Library; Nandini Rajamani and C. K. Vishnudas for additional logistical support; the RSPB for supporting PJ; the 11 experts (Ashwin Viswanathan, Chris Bowden, Mike Prince, Prashanth M.B., Praveen J., Rajah Jayapal, Ramit Singal, Rhys Green, Samira Agnihotri, Simon Wotton and T. R. Shankar Raman) who reviewed the unknown calls; staff of the K. Lisa Yang Center for Conservation Bioacoustics, Cornell University, Nature Conservation Foundation Mysuru, and the Ecology and Evolution Labs at Indian Institute of Science Education and Research Tirupati for their comments; and the volunteers from Indian Institute of Science Education and Research Tirupati (Amrutha Rajan, Anway Sarkar, Meera M., Senan D'Souza, Raja Bandi and Vinay K.L.) for help with the field surveys. This study was funded by Wildlife Conservation Trust—Small Grants (2019–2020 Phase I), Indian Institute of Science Education and Research Tirupati, Nature Conservation Foundation, and Science and Engineering Research Board (SB/S3/EEE/296/2016 dated 10 January 2017).
Author contributions
Conceptualization and design: CA, VJ, PJ, VVR; data collection and analysis: CA, VJ, with input from RC, PJ, VVR; writing: CA, with input from all other authors.
Conflicts of interest
None.
Ethical standards
This research abided by the Oryx guidelines on ethical standards.