Impact Statement
The global increase in forest dieback threatens vital ecosystem services, including habitat provision and carbon sequestration. Traditional field and drone-based monitoring techniques are not cost-effective at large scales. We leverage advancements in low-cost drones and deep learning to assess crown dieback from drone-captured RGB aerial data, eliminating the need for costly tools like LiDAR and laborious fieldwork. We achieve high accuracy in matching predicted crown footprints with actual forest inventory, demonstrating the effectiveness of these methods for non-expert use in conservation. Our vegetation index-based dieback estimates aligned well with expert field-based estimates, validating this approach. This work enhances forest monitoring by improving coverage, accuracy and speed, and reducing costs, thereby providing a promising tool to address large-scale conservation challenges.
1. Introduction
Forest dieback, the unseasonal loss of crown foliage or mortality of many trees (Mueller-Dombois, Reference Mueller-Dombois1988), is an early indicator of declining health in forest ecosystems. This degradation directly affects ecosystem services and functions such as carbon sequestration (Baccini et al., Reference Baccini, Walker, Carvalho, Farina, Sulla-Menashe and Houghton2017) and habitat provision (Watson et al., Reference Watson, Evans, Venter, Williams, Tulloch, Stewart, Thompson, Ray, Murray, Salazar, McAlpine, Potapov, Walston, Robinson, Painter, Wilkie, Filardi, Laurance, Houghton and Lindenmayer2018). Various factors, including the spread of pests, pathogens, and the intensification of drought conditions due to climate change, contribute to the increasing prevalence of crown dieback around the world (Carnicer et al., Reference Carnicer, Coll, Ninyerola, Pons, Sánchez and Peñuelas2011; Liebhold et al., Reference Liebhold, McCullough, Blackburn, Frankel, Von Holle and Aukema2013; Senf et al., Reference Senf, Buras, Zang, Rammig and Seidl2020). These changes in forest health and structure can result in ecologically significant shifts in composition and function over relatively short time periods (Allen et al., Reference Allen, Breshears and McDowell2015), which can be difficult to detect using traditional sampling strategies (McMahon et al., Reference McMahon, Arellano and Davies2019).
The global increase in dieback and its potential impacts on ecosystem services and functions highlight the urgent need for large-scale, high-frequency monitoring. Large-scale, high-frequency monitoring of forest health and structure would enhance our ability to detect issues such as pest outbreaks, disease, or stress caused by climate change (McMahon et al., Reference McMahon, Arellano and Davies2019) at early stages—which would, in turn, support better-informed decision-making and policy development related to forest management, climate change mitigation, and biodiversity conservation (Asner and Martin, Reference Asner and Martin2016). Moreover, large-scale monitoring techniques enable practitioners to better understand the factors driving dieback, informing more effective and targeted management strategies, and increasing resilience to these threats before they emerge. The instruments and methods to gather and process data for large-scale, rapid monitoring at the level of individual trees have, however, only been recently developed (Diez et al., Reference Diez, Kentsch, Fukuda, Caceres, Moritake and Cabezas2021; Kattenborn et al., Reference Kattenborn, Leitloff, Schiefer and Hinz2021; Ecke et al., Reference Ecke, Dempewolf, Frey, Schwaller, Endres, Klemmt, Tiede and Seifert2022).
Remote sensing can provide high-resolution spatiotemporal data coverage to meet these needs. Leaf reflectance, including within the visible spectrum, is sensitive to changes in plant physiology, chemistry, and structure due to insect and pathogen infestation as well as water stress (Huang et al., Reference Huang, Anderegg and Asner2019). Such changes in leaf reflectance are observable in the canopy from remotely sensed data (Huang et al., Reference Huang, Anderegg and Asner2019) so high-resolution satellite imagery can likely detect defoliation missed by field-based surveys—for example, in low-severity areas (Bright et al., Reference Bright, Hudak, Egan, Jorgensen, Rex, Hicke and Meddens2020). Many existing approaches, however, rely on large spatial averages of stress indicators—at the approximately 10 m spatial scale of common non-commercial satellite data products such as Sentinel 1/2 (Lastovicka et al., Reference Lastovicka, Svec, Paluba, Kobliuk, Svoboda, Hladky and Stych2020) or Landsat (Zhang et al., Reference Zhang, Ling, Wang, Foody, Boyd, Li, Du and Atkinson2021)—and therefore cannot capture information on individual tree health. Tree level information is crucial to understand future forest dynamics—particularly where individual trees may exhibit significantly different responses, such as for disease (Fensham and Radford-Smith, Reference Fensham and Radford-Smith2021; Hurel et al., Reference Hurel, de Miguel, Dutech, Desprez-Loustau, Plomion, Rodríguez-Quilón, Cyrille, Guzman, Alía, González-Martínez and Budde2021; Kännaste et al., Reference Kännaste, Jürisoo, Runno-Paurson, Kask, Talts, Pärlist, Drenkhan and Niinemets2023) or drought (Teskey et al., Reference Teskey, Wertin, Bauweraerts, Ameye, Mcguire and Steppe2014; Chen et al., Reference Chen, Li, Wan and Liu2022; Fernández-de-Uña et al., Reference Fernández-de-Uña, Martínez-Vilalta, Poyatos, Mencuccini and McDowell2023). 
New approaches, leveraging the increasing availability of high-resolution data (≤0.3 m), are required to identify individual trees within imagery for monitoring.
Traditional segmentation methods—based on handcrafted algorithms and manual feature engineering (Baatz and Schäpe, Reference Baatz, Schäpe, Strobl, Blaschke and Griesbner2000)—have been used to extract Individual Tree Crown (ITC) polygons from aerial imagery for use in some downstream tasks useful for monitoring. For example, Onishi and Ise (Reference Onishi and Ise2021) applied unsupervised multiresolution segmentation (Baatz and Schäpe, Reference Baatz, Schäpe, Strobl, Blaschke and Griesbner2000) to RGB (Red, Green, and Blue) data of a dense forest canopy in urban Japan, combined with Digital Surface Models (DSMs) to segment ITCs, and classified the results by species or functional type. Although the automatically extracted ITCs contained mostly canopy of the same species, the segmentation algorithm fragmented or merged 76% of tree crowns with neighboring individuals within that species. This method did not require manual labeling, but the low intra-specific segmentation accuracy means it is not comparable to traditional inventory-based monitoring, as the ITC polygons cannot be used to accurately extract measurements relating to individual trees—such as measurement of stress responses including dieback. Since these responses vary substantially, even within species (Hurel et al., Reference Hurel, de Miguel, Dutech, Desprez-Loustau, Plomion, Rodríguez-Quilón, Cyrille, Guzman, Alía, González-Martínez and Budde2021; Fernández-de-Uña et al., Reference Fernández-de-Uña, Martínez-Vilalta, Poyatos, Mencuccini and McDowell2023), accurate segmentation is crucial. We suggest that a deep learning-based approach is likely to yield better results on data with structurally complex canopies from natural forests, where crowns intersect substantially both within and across species.
Many approaches applying deep learning to segment ITCs exist (Diez et al., Reference Diez, Kentsch, Fukuda, Caceres, Moritake and Cabezas2021). Deep learning was applied to perform segmentation through bounding box delineation (typically referred to as object detection in machine learning literature—for example, in Ren et al., Reference Ren, He, Girshick and Sun2016) of trees in both open and closed forest canopies from RGB images in Weinstein et al. (Reference Weinstein, Marconi, Bohlman, Zare and White2019). Object detection was performed with a single-stage deep learning detector (He et al., Reference He, Zhang, Ren and Sun2015; Lin et al., Reference Lin, Goyal, Girshick, He and Dollár2018), achieving a precision of 0.69 and recall of 0.61 in forests from California, at a National Ecological Observation Network (NEON) site. The network was pre-trained using a very large number of noisy labels extracted from LiDAR data and fine-tuned on a smaller number of manual annotations. Precise delineation of individual crowns (typically referred to as instance segmentation in machine learning literature—for example, in He et al., Reference He, Gkioxari, Dollár and Girshick2017) through deep learning has also been applied to produce the polygons required to quantify stress accurately at the individual level (Chiang et al., Reference Chiang, Barnes, Angelov and Jiang2020; Hao et al., Reference Hao, Lin, Post, Mikhailova, Li, Chen, Yu and Liu2021; Şandric et al., Reference Şandric, Irimia, Petropoulos, Anand, Srivastava, Pleşoianu, Faraslis, Stateras and Kalivas2022; Sani-Mohammed et al., Reference Sani-Mohammed, Yao and Heurich2022; Yang et al., Reference Yang, Mou, Liu, Meng, Liu, Li, Xiang, Zhou and Peng2022; Ball et al., Reference Ball, Hickman, Jackson, Koay, Hirst, Jay, Archer, Aubry-Kientz, Vincent and Coomes2023).
Throughout this work, we refer to this type of segmentation, when applied to tree crowns, as “ITC delineation” to provide clarity to practitioners unfamiliar with machine learning literature, although “instance segmentation” could be used interchangeably. Hao et al. (Reference Hao, Lin, Post, Mikhailova, Li, Chen, Yu and Liu2021), for example, used Mask R-CNN to perform ITC delineation of tree crowns from multispectral 2D imagery coupled with a photogrammetry-derived Canopy Height Model (CHM) in a Chinese fir (Cunninghamia lanceolata) plantation, and achieved an F1 score of 0.85. It was demonstrated that these delineations could be used successfully for a simple downstream task—the extraction of individual tree heights by superposing the segmentations on the CHM data. The use of data from plantations, however, is unlikely to yield results that are representative of those from natural woodland; stems are typically evenly aged and spaced and do not have structurally complex canopy. Şandric et al. (Reference Şandric, Irimia, Petropoulos, Anand, Srivastava, Pleşoianu, Faraslis, Stateras and Kalivas2022) similarly segmented trees with Mask R-CNN in partly artificial contexts—across five different species in temperate and Mediterranean orchards. Yang et al. (Reference Yang, Mou, Liu, Meng, Liu, Li, Xiang, Zhou and Peng2022) demonstrated that ITC delineation is possible in canopy of greater structural complexity, showing some crown intersection and overlap, in the heavily managed environment of Central Park in New York City, USA—although delineation was not verified using ground data—and further demonstrated that these delineations could be used downstream to replicate a simple structural measurement (crown area) made manually from aerial data. Ball et al.
(Reference Ball, Hickman, Jackson, Koay, Hirst, Jay, Archer, Aubry-Kientz, Vincent and Coomes2023) further demonstrated accurate ITC delineation in tropical forests with a structurally complex canopy, using Mask R-CNN, although between-species differences in leaf reflectance and canopy structure may make segmentation easier there than in monospecific canopy, where spectral variation may be relatively lower. Sani-Mohammed et al. (Reference Sani-Mohammed, Yao and Heurich2022) were similarly able to perform ITC delineation using Mask R-CNN in natural temperate forest in Bavaria, although they only segmented dead trees, which may be easier to identify due to lack of foliage. The use of automatically extracted crown footprints for complex applications such as forest health monitoring, however, is less comprehensively explored.
Several studies exploiting deep learning for automated forest health measurement have emerged in recent years. Some work explores the use of classical computer vision, based on manually engineered algorithms, to perform ITC delineation and then apply deep learning to classify damage levels. Safonova et al. (Reference Safonova, Tabik, Alcaraz-Segura, Rubtsov, Maglinets and Herrera2019), for example, applied manual filtering and thresholding to extract rectangular patches of treetops and applied a number of CNN-based classifiers downstream to classify damage into one of four levels. Nguyen et al. (Reference Nguyen, Lopez Caceres, Moritake, Kentsch, Shu and Diez2021) similarly applied a manually engineered algorithm for patch extraction, instead based on normalized Digital Surface Models (nDSMs). The extracted patches were classified by damage categorically. Although the classification of tree health into multiple damage levels is a promising approach, as dieback is a symptom with several continuous stages (Ciesla and Donaubauer, Reference Ciesla and Donaubauer1994), such approaches may be limited by their segmentation accuracy, particularly where individual trees differ significantly in their response, within and across species. Other works use deep learning-based segmentation for forest health assessment. Schiefer et al. (Reference Schiefer, Schmidtlein, Frick, Frey, Klinke, Zielewska-Büttner, Junttila, Uhl and Kattenborn2023) used a U-net to perform semantic segmentation (per-pixel classification of all pixels within a target image) of standing deadwood in UAV imagery and extrapolated this to landscape-level predictions on time-series satellite imagery using Long Short-Term Memory networks (LSTMs). Chiang et al. (Reference Chiang, Barnes, Angelov and Jiang2020) and Sani-Mohammed et al. (Reference Sani-Mohammed, Yao and Heurich2022) both used Mask R-CNN to perform ITC delineation of dead tree crowns from aerial imagery.
Where dieback is ongoing, however, it requires measurement on a continuous scale or by multiple categories, rather than binary classification (Ciesla and Donaubauer, Reference Ciesla and Donaubauer1994). Şandric et al. (Reference Şandric, Irimia, Petropoulos, Anand, Srivastava, Pleşoianu, Faraslis, Stateras and Kalivas2022) used deep learning-based ITC delineation via Mask R-CNN to segment crown footprints and performed post hoc analysis using vegetation indices derived from color space transformations to measure tree health on a continuous scale. The approach of using vegetation indices—which do not require additional human labeling—on footprints extracted via deep learning, shows great promise for dieback measurement, but measurements of crown health were not verified versus human measurement. Additionally, the use of plantation data may more easily facilitate this type of forest health measurement—although often monospecific, the canopy formed by artificially spaced trees is easily delineated, simplifying health monitoring. This structural simplicity may not reflect performance in many natural ecosystems.
Here, we develop a new approach for early dieback detection from aerial RGB imagery, using deep-learning based ITC delineation and vegetation indices, and test it in an ecosystem experiencing dieback. Crucially, unlike in previous work (Şandric et al., Reference Şandric, Irimia, Petropoulos, Anand, Srivastava, Pleşoianu, Faraslis, Stateras and Kalivas2022), we verify our results by comparison to field-based dieback measurement by experts. We use drone data collected in a Mediterranean stone pine (Pinus pinea) forest with a structurally complex monospecific canopy, where some individuals are showing signs of drought-induced crown dieback (Moreno-Fernández et al., Reference Moreno-Fernández, Camarero, García, Lines, Sánchez-Dávila, Tijerín, Valeriano, Viana-Soto, Zavala and Ruiz-Benito2022). The severity of drought is projected to increase across the Mediterranean region (Dubrovský et al., Reference Dubrovský, Hayes, Duce, Trnka, Svoboda and Zara2014; Hertig and Tramblay, Reference Hertig and Tramblay2017), and poses a large-scale threat to the functioning of this biodiversity hotspot. Protection through active monitoring and management of these ecosystems is required (Fernández-Manjarrés et al., Reference Fernández-Manjarrés, Ruiz-Benito, Zavala, Camarero, Pulido, Proença, Navarro, Sansilvestri, Granda, Marqués, Temunovič, Bertelsmeier, Drobinski, Roturier, Benito-Garzón, Cortazar-Atauri, Simon, Dupas, Levrel and Sautier2018; Astigarraga et al., Reference Astigarraga, Andivia, Zavala, Gazol, Cruz-Alonso, Vicente-Serrano and Ruiz-Benito2020)—but is currently limited by reliance on time-consuming expert manual inventories taken on the ground. In addition to offering a solution to conducting this monitoring at large scale and low cost, drone-based remote sensing is ideal for the monitoring of drought-induced crown dieback, as effects at crown extremities are more easily observable from above than from traditional ground-based visual assessment. 
We develop a scalable approach based on deep learning and RGB vegetation indices to monitor this dieback at the individual level, and answer the following questions:
1. Is ITC delineation possible in a structurally complex, monospecific canopy?
2. Does individual health assessment based on vegetation indices (Şandric et al., Reference Şandric, Irimia, Petropoulos, Anand, Srivastava, Pleşoianu, Faraslis, Stateras and Kalivas2022) correlate with field-based estimation, and to what degree is this assessment affected by the accuracy of deep learning-based ITC delineation?
2. Material and methodology
2.1. Data
2.1.1. Study area and inventory
Our dataset covers 1500 ha of Pinus pinea forest in total, across nine areas showing signs of climate-induced crown dieback. Drone and ground data were collected in Pinar de Almorox, Spain (40.27 °N, 4.36 °W) in May/June 2021. A map showing the location of the study site can be seen in Figure 1a. The area is under a continental Mediterranean climate with a mean annual rainfall of 568 mm, a mean annual temperature of 14°C, and an altitude ranging from 500 to 850 m above sea level. In addition to the dominant canopy species, P. pinea, a smaller number of Juniperus oxycedrus L. and Quercus ilex can be found in the midstory. The understory further contains Salvia rosmarinus L., Lavandula stoechas Lam. and Lamiaceae shrubs, as well as fallen deadwood. Nine distinct, non-overlapping areas were selected to show a gradient of defoliation, and within these, inventory data were taken at 31 sample plots (three to four plots per area). Sample plots were circular with a 17 m radius; 453 adult trees (diameter at breast height, DBH, > 7.5 cm) were surveyed in total. Each adult tree was assessed by experts in the field for dieback estimation (percentage defoliation). Crown dieback percentage was estimated visually (with 100% corresponding to wholly defoliated), taking the average score from two experts for each tree within each plot. A histogram of dieback/defoliation percentage for all surveyed trees can be seen in Figure 1b. Trunk locations within each plot were georeferenced using an Emlid Reach RS2 multi-band RTK Global Navigation Satellite System (GNSS) receiver (Ng et al., Reference Ng, Johari, Abdullah, Ahmad and Laja2018). See Moreno-Fernández et al. (Reference Moreno-Fernández, Camarero, García, Lines, Sánchez-Dávila, Tijerín, Valeriano, Viana-Soto, Zavala and Ruiz-Benito2022) for further analysis of dieback patterns at the site and additional data collected.
Note that each area surveyed by the drone was not covered entirely by the plot data. Consequently, trees at the edges of each orthomosaic did not have in situ defoliation estimates or geolocated trunk coordinates. These trees were still delineated manually from orthomosaic images at a later stage to provide training data for deep-learning based ITC delineation, but they were not considered in our analysis of dieback estimation.
2.1.2. Drone imagery
Drone flights were used to obtain images over the nine non-overlapping areas (minimum area 16,663 m², maximum area 39,736 m²), covering all plots, using a DJI Mavic Mini drone. The work was carried out with all relevant permissions, and all national drone regulations and location-specific regulations, including the use of rotor guards, were observed. As this area is of high conservation value, particularly for birds of prey, the areas to survey were discussed and approved with land managers in advance of flights. All flights were performed at a fixed altitude of 50 m above the take-off location and carried out in May/June 2021 at the same time as ground data collection. Flight start times varied from 09:30 to 17:00 (local time) and conditions varied from overcast to clear, resulting in significant differences in lighting conditions and shadowing between drone flights. Raw images of size 4000 × 2250 px were taken from both nadir and oblique (55° below horizontal; Nesbit and Hugenholtz, Reference Nesbit and Hugenholtz2019) angles, with 95% front and 80% side overlap. Ground sampling distance (GSD) was approximately 3 cm for all areas. Between four and six Ground Control Points (GCPs) were placed in each area and precisely located using the GNSS receiver.
GCPs were matched across images semi-automatically using Agisoft Metashape 1.8.1.13915 (Agisoft LLC, 2022). An orthomosaic was generated for each area, also using Agisoft Metashape. These orthomosaics were cropped to remove edge effects. All data used to train deep learning-based ITC delineation were based on the orthomosaics rather than the original images.
2.1.3. Manual delineation and training data
Orthomosaics for each area were split into 1024 × 1024 px tiles. This resolution was sufficiently large to preserve the visual context around each crown without downsampling, but small enough not to introduce excessive computational overhead. To create training data, all crowns visible in the orthoimagery were delineated manually, including those not covered by field plots, guided by geolocated trunk points where available. Manual delineation was performed in the full-size orthomosaics and the resulting delineations split to match the tiled images. We did not delineate understory growth where it was visible in the drone imagery. A visual example can be seen in Figure 2b. We make this tiled data available in the ubiquitous COCO format (Lin et al., Reference Lin, Maire, Belongie, Bourdev, Girshick, Hays, Perona, Ramanan, Zitnick and Dollár2015). A summary table describing our data can be seen in Table 1. Note that these statistics are derived from the manually delineated polygons, rather than the inventory data, as the inventory data were not available for all trees, only those in the subplots within each orthomosaic. We did not perform any hyperparameter tuning and left hyperparameters as specified by the original authors of each component of our methodology (He et al., Reference He, Zhang, Ren and Sun2015, Reference He, Gkioxari, Dollár and Girshick2017; Akyon et al., Reference Akyon, Altinuc and Temizel2022). We therefore split our data into training and test sets only, using nine-fold geographic cross-validation, with each of the nine areas corresponding to the orthomosaic generated by one of the nine non-overlapping drone flights (see Section 2.1.2). In this approach, each fold uses one specific area as the test set while the remaining eight areas are used for training. This methodology means that the individual predicted footprints, which we used to evaluate segmentation performance and estimate dieback, were not generated by a single, universal model.
Instead, each prediction was produced by a model trained on eight areas and tested on the remaining area (from which the tile to predict on is drawn). The average performance of our segmentation approach and the dieback estimation comparison was thus calculated by aggregating the results from each of these nine distinct models, on the area that each model was not trained on, providing a performance metric that reflects performance across all areas.
Note. Crown statistics are derived from manual delineations, rather than inventory, as ground sample plots only covered a small area of each orthomosaic.
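The geographic leave-one-area-out split described above can be sketched as follows. This is a minimal illustration, not our released code; the function name and area identifiers are hypothetical placeholders.

```python
# A minimal sketch of the nine-fold, leave-one-area-out split described
# above. Area identifiers and tile lists are hypothetical placeholders.

def leave_one_area_out(tiles_by_area):
    """Yield (held_out_area, train_tiles, test_tiles) triples.

    tiles_by_area: dict mapping an area id to the list of tiles drawn
    from that area's orthomosaic. Each fold holds out one whole area.
    """
    for held_out in tiles_by_area:
        test_tiles = list(tiles_by_area[held_out])
        train_tiles = [t for area, tiles in tiles_by_area.items()
                       if area != held_out for t in tiles]
        yield held_out, train_tiles, test_tiles

# Toy example with three areas instead of nine:
folds = list(leave_one_area_out({"A1": [1, 2], "A2": [3], "A3": [4, 5]}))
```

Training one model per fold and evaluating it only on the held-out area gives per-area performance that is never measured on training data.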
2.2. Methods
2.2.1. ITC delineation via instance segmentation
To delineate individual trees from tiled images, we used the ubiquitous Mask R-CNN framework (He et al., Reference He, Gkioxari, Dollár and Girshick2017) with a ResNet-101 FPN backbone (He et al., Reference He, Zhang, Ren and Sun2015). The backbone was pretrained on ImageNet (Deng et al., Reference Deng, Dong, Socher, Li, Li and Fei-Fei2009), and both random resizing and flipping were used for augmentation. We henceforth refer to crown footprints predicted on individual tiled images as “tiled predictions.”
Tiled predictions were recombined using Slicing Aided Hyper Inference (SAHI; Akyon et al., Reference Akyon, Altinuc and Temizel2022). We did not change the hyperparameters, used tiles with a relative overlap of 0.2 at inference, and post-processed predictions using greedy Non-Maximum Merging (NMM) with an Intersection-Over-Smaller (IOS) match threshold of 0.5. During ITC delineation (instance segmentation), each predicted crown is given a confidence score, corresponding to how likely it is that it contains a crown. A minimum confidence threshold of 0.3 was imposed on predictions before merging, as in the original SAHI implementation (Akyon et al., Reference Akyon, Altinuc and Temizel2022). We report segmentation performance using mean Average Precision (mAP; see Beitzel et al., Reference Beitzel, Jensen, Frieder, Liu and Özsu2009) at two stages—on the tiled dataset, averaged across all tiles within each area, and after the tiled predictions are recombined using SAHI, for each of the nine orthomosaics.
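As an illustration of the merging rule, the sketch below implements greedy Non-Maximum Merging with an Intersection-over-Smaller threshold on axis-aligned bounding boxes. This is a simplified stand-in: SAHI itself merges full instance masks, and the function names here are our own.

```python
# Sketch of greedy NMM with an IOS threshold, on boxes (x0, y0, x1, y1).
# A simplification of the mask-based merging performed by SAHI.

def ios(a, b):
    """Intersection area divided by the area of the smaller box."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return ix * iy / min(area_a, area_b)

def merge_predictions(preds, ios_thresh=0.5, conf_thresh=0.3):
    """Greedily merge overlapping predictions, highest confidence first.

    preds: list of (box, confidence). Predictions below conf_thresh are
    dropped; pairs with IOS >= ios_thresh are merged into one box.
    """
    preds = sorted((p for p in preds if p[1] >= conf_thresh),
                   key=lambda p: -p[1])
    merged = []
    for box, conf in preds:
        for i, (mbox, mconf) in enumerate(merged):
            if ios(box, mbox) >= ios_thresh:
                # Merge into the union box; keep the higher confidence.
                merged[i] = ((min(box[0], mbox[0]), min(box[1], mbox[1]),
                              max(box[2], mbox[2]), max(box[3], mbox[3])),
                             max(conf, mconf))
                break
        else:
            merged.append((box, conf))
    return merged
```

With the thresholds above, two heavily overlapping detections of one crown collapse into a single footprint, while low-confidence detections are discarded before merging.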
2.2.2. Vegetation index-based dieback estimates at the tree level
We used an iterative approach to match ITCs to GNSS-measured trunk locations for trees with ground-truth GPS field locations (positional error < 1 m) and expert dieback assessment (percentage crown defoliation) in the full-size orthomosaics. To do this, the Euclidean distance was calculated between the centroid of each ITC and all ground truth trunk locations. The pair with the smallest distance was taken to be a match, and both ITC and ground truth (field) location for that pair were not considered as part of further matches. This step was repeated until no ground truth locations remained. We opted to discard ITC-trunk pairs where the distance from the centroid of the predicted crown to the ground-truth trunk location was greater than the square root of the crown area, to avoid including automatic dieback estimates that did not reliably correspond to the supposedly matching in situ estimates. See Appendix B for pseudocode. We repeated this analysis twice—once using the ITCs predicted using the deep learning-based ITC delineation model, and once using the manually delineated ITCs (which were also used to train the segmentation model).
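The iterative matching can be sketched as below, assuming centroids and trunk coordinates in metres; the identifiers and values are hypothetical, and the full pseudocode is given in Appendix B.

```python
import math

# Sketch of the greedy nearest-pair matching of ITC centroids to trunk
# locations described above. Crown and trunk ids/values are toy data.

def match_crowns_to_trunks(crowns, trunks):
    """Iteratively match the globally closest crown-trunk pair.

    crowns: dict id -> ((x, y) centroid, crown area in m^2)
    trunks: dict id -> (x, y) ground-truth trunk location
    Returns accepted (crown_id, trunk_id) pairs; a pair is discarded when
    its distance exceeds the square root of the crown area.
    """
    crowns, trunks = dict(crowns), dict(trunks)
    matches = []
    while crowns and trunks:
        c_id, t_id, dist = min(
            ((c, t, math.dist(crowns[c][0], trunks[t]))
             for c in crowns for t in trunks),
            key=lambda x: x[2])
        if dist <= math.sqrt(crowns[c_id][1]):
            matches.append((c_id, t_id))
        # Matched or discarded, neither element is considered again.
        del crowns[c_id], trunks[t_id]
    return matches
```

The distance gate scales with crown size, so a large crown tolerates more centroid-trunk offset than a small one.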
For the matched crown-trunk pairs, defoliation was estimated via Green Chromatic Coordinate (GCC), a common metric that has been used to track phenology successfully in a range of ecosystems (Richardson et al., Reference Richardson, Hufkens, Milliman, Aubrecht, Chen, Gray, Johnston, Keenan, Klosterman, Kosmala, Melaas, Friedl and Frolking2018), and a similar metric to the vegetation indices used to track tree health in other works (Reid et al., Reference Reid, Chapman, Prescott and Nijland2016; Şandric et al., Reference Şandric, Irimia, Petropoulos, Anand, Srivastava, Pleşoianu, Faraslis, Stateras and Kalivas2022). GCC is defined below in Equation (1).
$ \mathrm{GCC}=\frac{G}{R+G+B} $ (1)
Here, R, G, and B refer to the total red, green, and blue pixel values, respectively, summed across the region of interest. ITCs were matched to ground-surveyed trunk locations (see Appendix B), and drone-derived GCC values were then correlated with field-based percentage of defoliation, using GCC values derived from both the manually labeled and automatically segmented crowns.
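As a concrete illustration of Equation (1), GCC for a single crown can be computed by summing pixel values over its footprint. The pixel values below are toy numbers, not drawn from our imagery.

```python
# Minimal sketch of Equation (1): GCC over one crown's footprint pixels.

def gcc(pixels):
    """Green Chromatic Coordinate for a list of (R, G, B) pixel values
    belonging to a single crown footprint."""
    r = sum(p[0] for p in pixels)
    g = sum(p[1] for p in pixels)
    b = sum(p[2] for p in pixels)
    return g / (r + g + b)

crown_pixels = [(10, 30, 10), (20, 40, 20)]  # toy pixels inside a footprint
value = gcc(crown_pixels)  # 70 / (30 + 70 + 30)
```

Because the channel sums are taken over the whole footprint, GCC is insensitive to the ordering of pixels and needs no per-pixel normalization.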
3. Results
Our results show that automatic crown delineation was possible in variably packed canopy with low species variation (Figure 2). We present per-area results in Table 2, both as per-tile averages and for full-size orthomosaic inference using SAHI. Visual results can be seen in Figure 2. An average mAP of 0.519 was achieved on the tiled data, with a minimum of 0.473, a maximum of 0.602, and a standard deviation of 0.037. When recombined for the full orthomosaic using SAHI, mean mAP was reduced to 0.433, with a minimum of 0.377, maximum of 0.544, and a standard deviation of 0.065. We note that performance varied somewhat from area to area. We also note that recombining the tiled predictions increased variation in performance, and decreased mAP for all nine areas.
Note. Mean average precision is reported at IoU thresholds of 0.5 and 0.75 and averaged between 0.5 and 0.95 using intervals of 0.05, as in He et al. (Reference He, Gkioxari, Dollár and Girshick2017).
The correlation between drone and field-based defoliation estimation was strong (Figure 3). We correlated ground estimates of defoliation with GCC estimates derived from crowns segmented in aerial imagery, using both crowns segmented using deep learning (Figure 3a) and crowns segmented by hand (Figure 3b). For the automatically segmented crowns, we found the correlation between calculated GCC and field-based defoliation estimates was significant ($ p<4\times 10^{-32} $, $ R^2=0.35 $). When the analysis was repeated using the manually segmented (ground truth) crown polygons (Figure 3b), the correlation was significant with equivalent performance ($ p<2\times 10^{-33} $, $ R^2=0.34 $).
There was little change in the strength of correlation between field-based defoliation estimation and aerial GCC-based estimates, when calculated with the automatically extracted crowns (Figure 3a) versus the manually labeled crowns (Figure 3b). The GCC estimates for each case were matched according to the corresponding ground truth trunk location obtained for each polygon via Algorithm 1, and showed strong correlation against each other in Figure 4 ($ R^2=0.54 $, RMSE $ =0.01 $, $ p<3\times 10^{-72} $).
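For readers reproducing this analysis, the coefficient of determination underlying these figures can be computed from an ordinary least-squares fit. The sketch below uses toy values, not our data.

```python
# Hedged sketch of an R^2 computation via ordinary least squares,
# as used to compare GCC against field-based defoliation estimates.

def r_squared(x, y):
    """Coefficient of determination for a least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                      # fitted slope
    a = my - b * mx                    # fitted intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot
```

In practice a statistics package would also supply the p-values reported above; this sketch only shows where R² comes from.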
4. Discussion
We found that, when evaluated on both the tiled and full-orthomosaic crown segmentation data, Mask R-CNN produced strong results (mAP = 0.519 ± 0.037), comparable to or higher than those on machine learning benchmark data (mAP = 0.371 from Lin et al., Reference Lin, Maire, Belongie, Bourdev, Girshick, Hays, Perona, Ramanan, Zitnick and Dollár2015). Although results on different data are not directly comparable, these relatively similar mAP scores indicate that ITC delineation in monospecific canopy is no less achievable than segmenting everyday objects from curated imagery—although we stress that the absolute accuracy of the manually annotated training labels is not known. The lack of need for additional technical development underscores the potential of models such as Mask R-CNN for non-expert use, demonstrating their applicability to real-world conservation. Whilst segmentation performance could likely be improved with a more sophisticated or larger model and better postprocessing, these may not be the limiting factors to using such models in a practical context. Performance decreased for all areas when recombining tiled predictions into full-size orthomosaic predictions using SAHI.
Direct comparison to previous work performing ITC delineation is difficult due to variation in metrics used to assess performance, differences in canopy structural complexity between ecosystems, and the inclusion or omission of ground-based data to verify labels. Ground-based data serve as a more definitive source of accuracy, and the lack of such data in some studies (Yang et al., Reference Yang, Mou, Liu, Meng, Liu, Li, Xiang, Zhou and Peng2022; Ball et al., Reference Ball, Hickman, Jackson, Koay, Hirst, Jay, Archer, Aubry-Kientz, Vincent and Coomes2023) can potentially lead to misleading performance metrics or less reliable interpretations of model effectiveness. Although mAP and similar derivative metrics are widely accepted for measuring instance segmentation performance on large-scale benchmark datasets such as COCO (Lin et al., Reference Lin, Maire, Belongie, Bourdev, Girshick, Hays, Perona, Ramanan, Zitnick and Dollár2015), these metrics are not commonly reported in similar works—with authors commonly opting to report performance using metrics such as F1-score (Ball et al., Reference Ball, Hickman, Jackson, Koay, Hirst, Jay, Archer, Aubry-Kientz, Vincent and Coomes2023) or precision (Weinstein et al., Reference Weinstein, Marconi, Bohlman, Zare and White2019). The interpretability of these metrics is beneficial, but these are point metrics relying on a single selection of minimum confidence and Intersection-over-Union (IoU) thresholds—which can be adjusted arbitrarily to maximize the target metric (Maxwell et al., Reference Maxwell, Warner and Guillén2021b). A higher or lower value of F1-score, precision or recall may not represent a better or worse model for a user, even when trained on the same data. 
mAP-based metrics are a better reflection of model performance, and higher values indicate that a model is likely to be robust to different manual selections of confidence and IoU thresholds (Maxwell et al., Reference Maxwell, Warner and Guillén2021a,Reference Maxwell, Warner and Guillénb). Given the complex nature of forests, including variation in canopy structural complexity, a model’s performance across different ecosystems and conditions relies on its ability to handle observed variability.
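To make the threshold dependence discussed above concrete, the sketch below computes precision, recall, and F1-score at a single IoU cutoff, then averages precision over several cutoffs in the spirit of COCO-style mAP. This is an illustrative simplification (true AP integrates over the precision-recall curve and confidence rankings); `point_metrics`, the IoU values, and the threshold grid are our own illustrative choices, not the evaluation code used in this work.

```python
def point_metrics(ious, n_gt, iou_thresh=0.5):
    """Precision, recall, and F1 from best-match IoUs of each prediction.

    `ious` holds, for each prediction, the IoU with its best-matching
    ground-truth crown (0.0 if unmatched); `n_gt` is the number of crowns.
    """
    tp = sum(i >= iou_thresh for i in ious)   # matches above the cutoff
    fp = len(ious) - tp                       # unmatched predictions
    fn = n_gt - tp                            # missed ground-truth crowns
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def mean_precision(ious, n_gt, thresholds=(0.5, 0.55, 0.6, 0.65, 0.7)):
    """Average precision over several IoU cutoffs (COCO averages 0.5:0.95)."""
    return sum(point_metrics(ious, n_gt, t)[0] for t in thresholds) / len(thresholds)

ious = [0.9, 0.7, 0.55, 0.3]   # illustrative best-match IoUs for 4 predictions
print(point_metrics(ious, n_gt=4, iou_thresh=0.5))   # (0.75, 0.75, 0.75)
print(point_metrics(ious, n_gt=4, iou_thresh=0.6))   # (0.5, 0.5, 0.5)
```

Note how the same predictions score 0.75 or 0.5 depending solely on the chosen IoU cutoff, which is why averaging over thresholds gives a more robust picture of model quality.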
We observed a significant negative correlation between field-based defoliation estimates and drone-derived GCC at our site for both automatic (R$^2 = 0.35$; Figure 3a) and manual segmentation (R$^2 = 0.34$; Figure 3b), with a high correlation between estimates using the manual labels and the automatically segmented crowns (Figure 4). The difference between the two GCC estimates for each crown did not grow substantially as the automatically segmented crown footprint moved further from the true location (see Appendix A for further details). Our findings suggest that even a basic deep-learning approach performs as effectively as manual annotation in producing ITC footprints for dieback estimation. Investing in more advanced segmentation may therefore not offer significant improvements for estimating dieback or, perhaps, other crown metrics such as Leaf Area Index (LAI), although further verification would be required to confirm this. Manual labels are, however, difficult to verify as accurate, and training the model on crowns labeled by humans from above may introduce operator bias (Bai et al., Reference Bai, Zhang, Ding and Ghanem2018; Geva et al., Reference Geva, Goldberg and Berant2019). Verifiable ground-truth labels could be derived, for example, from additional instruments such as Terrestrial Laser Scanners (TLS), but such data were not available for this site. Whether the accuracy of TLS-based segmentation can be matched from RGB imagery alone is unknown at this time.
The variance in calculated GCC for a given field-based defoliation percentage likely has physical origins related to how field-based estimates are made. Dieback is not expected to be uniform within each crown (Denman et al., Reference Denman, Brown, Vanguelova, Crampton, Asiegbu and Kovalchuk2022), so metrics based on per-pixel color averages, which disregard spatial patterns, are unlikely to capture the degree of dieback for each crown with perfect accuracy. This effect is exaggerated by the particular patterns of drought-induced dieback, which typically begins at the extremities of the crown (Denman et al., Reference Denman, Brown, Vanguelova, Crampton, Asiegbu and Kovalchuk2022), resulting in substantially different observations when viewed from above, as in the aerial images used here, and from below, as would be seen in situ. Although other similar work uses different vegetation indices (Şandric et al., Reference Şandric, Irimia, Petropoulos, Anand, Srivastava, Pleşoianu, Faraslis, Stateras and Kalivas2022), we suggest that the variance in results is unlikely to diminish significantly while still using vegetation indices based on simple color space transformations. Extracting dieback estimates using the deep learning model directly, akin to classification in the original Mask R-CNN model (He et al., Reference He, Gkioxari, Dollár and Girshick2017) but modified to perform regression, may improve performance, as such estimates could leverage spatial patterns in addition to color information. We opted not to use this approach to avoid the need to produce dieback labels for trees visible in orthoimagery that were not already covered by existing field-based estimates.
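The simple color-space transformations referred to above can be made concrete: the Green Chromatic Coordinate (GCC) is G/(R + G + B) and the Excess Green index (ExG) is 2G - R - B. The sketch below computes crown-level means of each from an RGB array and a boolean crown mask; averaging per-pixel index values over the mask is one of several reasonable conventions and is not necessarily the exact procedure used in this work.

```python
import numpy as np

def crown_gcc(rgb, mask):
    """Mean Green Chromatic Coordinate, G/(R+G+B), over a crown mask.

    rgb:  (H, W, 3) array of pixel values
    mask: (H, W) boolean crown footprint
    """
    r, g, b = (rgb[..., i][mask].astype(float) for i in range(3))
    total = r + g + b
    return float(np.mean(g[total > 0] / total[total > 0]))

def crown_exg(rgb, mask):
    """Mean Excess Green index, 2G - R - B, over a crown mask."""
    r, g, b = (rgb[..., i][mask].astype(float) for i in range(3))
    return float(np.mean(2 * g - r - b))

# Example: uniform crown where G is twice R and B
rgb = np.zeros((2, 2, 3))
rgb[..., 1] = 100.0
rgb[..., 0] = rgb[..., 2] = 50.0
mask = np.ones((2, 2), dtype=bool)
print(crown_gcc(rgb, mask))   # 0.5
```

Because both indices are per-pixel color transformations averaged over the footprint, they discard the spatial arrangement of defoliated pixels, which is exactly the limitation discussed above.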
Our results show that, for data taken with the same instruments, vegetation index-based estimates correlate significantly with expert field-based estimates. However, care should be taken when making direct comparisons between estimates for individual trees. For example, comparing dieback estimates for the same tree from imagery taken at different times, or with different instruments, may be misleading, as measured color intensities will change with lighting or camera differences. Color calibrating the imagery may mitigate this; we did not perform color calibration here, as all images were taken with the same instruments and we did not make direct comparisons between individual data points obtained under different conditions.
We successfully automated canopy defoliation measurement in a single-species canopy comprising almost exclusively P. pinea, the only species displaying dieback in our ecosystem; as a result, we do not suffer from between-species confusion in this work. Minor modifications are likely to be required for our approach, based on vegetation indices applied to crown footprints extracted using deep learning-based instance segmentation, to be applicable to data containing multiple species. Since canopy reflectance may vary between species due to differences in leaf spectra and crown structure (Ollinger, Reference Ollinger2011), we would not recommend directly comparing vegetation index-based estimates between two species. The approach of Şandric et al. (Reference Şandric, Irimia, Petropoulos, Anand, Srivastava, Pleşoianu, Faraslis, Stateras and Kalivas2022) indicates that using the deep learning model to classify crown footprints by species prior to any health monitoring is promising, although this increases the labeling burden on practitioners. An increase in open ITC delineation data may help to relieve this problem, by removing the need for site-specific datasets or by reducing label volume requirements via pretraining on such open data. Alternatively, estimating dieback based on spatial patterns rather than simple color-space transformations may prove more comparable across species, as dieback is not expected to be uniform within each crown (Denman et al., Reference Denman, Brown, Vanguelova, Crampton, Asiegbu and Kovalchuk2022).
Our approach has three important advantages that reduce the financial and labor costs of application: an increase in processing speed, the use of a low-cost, widely available drone platform, and a reduced reliance on expert analysis in the field. The time savings of a large-scale aerial approach over a ground-based method are significant. Unlike a typical field campaign, which includes manual inventory and perhaps the use of other instruments such as laser scanners, drone flights for a site can be conducted in hours to days rather than weeks to months. With a trained model, postprocessing these data to extract downstream metrics requires only a few hours (for a dataset of this size on consumer hardware). Short, drone-based campaigns also have smaller material requirements: the DJI Mavic Mini used here is available for only a few hundred euros and is light enough to fly with minimal legal requirements. Combined with the reduced number of person-hours required, the cost of conducting and processing data from a drone-based survey is minimal compared with a traditional field campaign.
Although we used significant manual input at two points during postprocessing, this could be avoided in practice as more open data become available (Lines et al., Reference Lines, Allen, Cabo, Calders, Debus, Grieve, Miltiadou, Noach, Owen and Puliti2022). Firstly, manual GCPs were used to stitch together the orthorectified images. We found orthophotos stitched without GCPs to be of comparable visual quality, although we did not compare the use of these images downstream to avoid repeating the time-consuming labeling process. Orthomosaics may be georeferenced less accurately without GCPs, although this could be addressed by mounting a more accurate GPS receiver on the drone, and such RTK systems are increasingly available on consumer drones. Secondly, the manual labeling process was time-consuming, requiring several weeks for a single user to complete. Although we used ground truth labels to verify our results, they may not be required for practitioners to use similar approaches, particularly with community efforts to provide ground-truth training data for similar applications (Puliti et al., Reference Puliti, Pearse, Surový, Wallace, Hollaus, Wielgosz and Astrup2023). Large foundation models, for example, can produce reasonable segmentation in a wide range of unseen contexts when pretrained on vast quantities of general visual data (Kirillov et al., Reference Kirillov, Mintun, Ravi, Mao, Rolland, Gustafson, Xiao, Whitehead, Berg, Lo, Dollár and Girshick2023), although they can be difficult to employ on consumer hardware. Some of these approaches rely on context-specific prompting to identify rough areas of interest (Kirillov et al., Reference Kirillov, Mintun, Ravi, Mao, Rolland, Gustafson, Xiao, Whitehead, Berg, Lo, Dollár and Girshick2023).
As accuracy requirements for prompting are less strict than for full segmentation, methods for generating these prompts developed on open forest data may be sufficiently accurate on unseen ecosystems. Alternatively, smaller models such as the one used here could be pretrained on public ITC delineation data, and used without retraining. Both of these solutions, however, require large-scale data of tree crowns from a diverse set of ecosystems.
In addition to speed and cost savings, the adoption of automated, digital methods also benefits accuracy (Mu et al., Reference Mu, Fujii, Takata, Zheng, Noshita, Honda, Ninomiya and Guo2018) and reduces dataset sampling bias. Typically, an inventory-based approach can gather only simple structural measurements such as trunk diameter or height, from which only crude estimates of difficult-to-measure quantities such as crown area can be produced. A data-driven approach, using drone-based remote sensing and deep learning, allows quantities such as crown area to be measured far more accurately than by hand, without the laborious time requirements of other highly accurate instruments such as TLS. The comprehensive nature of automated surveying also reduces selection bias: for example, vegetation in areas that are difficult to reach by foot, such as on very steep terrain, may differ physiologically from plants found on flat ground. A drone-based survey can include a large number of stems from such areas, whereas an inventory-based survey may only include stems in areas that can be accessed directly.
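As an illustration of the measurement gap described above, a segmented crown footprint yields area directly via the shoelace formula, whereas an inventory would typically approximate crown area from a proxy such as a field-measured crown diameter. The polygon coordinates and diameter below are invented for illustration only.

```python
import math

def polygon_area(coords):
    """Shoelace formula for the area of a simple polygon [(x, y), ...]."""
    area = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]   # wrap around to close the ring
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# Segmented crown footprint (metres) vs. a circular approximation from
# a hypothetical field-measured crown diameter of 4 m
footprint = [(0, 0), (4, 0), (5, 2), (4, 4), (0, 4)]
print(polygon_area(footprint))      # 18.0
print(math.pi * (4 / 2) ** 2)       # ~12.57, the cruder circular estimate
```

The footprint-based value reflects the crown's actual asymmetric shape, which a single diameter measurement cannot capture.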
With these combined improvements in speed, cost, coverage, and accuracy, profound implications for forest management in deteriorating ecosystems are evident. Real-time detection of tree dieback is essential, as declining trees become hotspots for primary pests and pathogens to thrive and multiply. Additionally, such trees are more susceptible to secondary pest attacks (Balla et al., Reference Balla, Silini, Cherif-Silini, Chenari Bouket, Moser, Nowakowska, Oszako, Benia and Belbahri2021). Regular updates on tree health are vital for guiding management strategies, such as thinning stands to lessen resource competition among surviving trees—crucial under extreme climate conditions—or implementing sanitation cuts to eliminate trees harboring pest or pathogen populations (Roberts et al., Reference Roberts, Gilligan, Kleczkowski, Hanley, Whalley and Healey2020).
For this paradigm shift toward automation to be applied in practice, two requirements remain. Firstly, an increase in extensive, diverse open datasets is vital for both training and validation, minimizing the labeling burden on end-users (Lines et al., Reference Lines, Allen, Cabo, Calders, Debus, Grieve, Miltiadou, Noach, Owen and Puliti2022). Secondly, open and easy-to-use implementations of these tools, applied to common forest management data formats, are needed to minimize barriers to deployment for non-expert users. Some promising initiatives already exist (Ball et al., Reference Ball, Hickman, Jackson, Koay, Hirst, Jay, Archer, Aubry-Kientz, Vincent and Coomes2023; Environmental Data Science Book, 2023). With these two requirements fulfilled, the shift to automation in forest management and conservation could be expedited. Focus should therefore be given to fostering a culture of data sharing, and to the development of accessible platforms to deploy these methods. Toward these goals, we make both code and data available for this work, in both common remote sensing and machine learning-readable formats.
5. Conclusions
In this work, we made three key contributions to the state-of-the-art in aerial tree health assessment. Firstly, we demonstrated the feasibility of deep learning-based ITC delineation in structurally complex monospecific natural canopies, where reduced leaf spectra variation between individual trees may result in relatively lower variation in canopy reflectance compared to mixed canopies. Although verifying crown segmentation precisely is challenging, we corroborated our segmentation method using ground truth GPS trunk locations to assess its accuracy. We highlight the need for further verification in future work, potentially through ground-based tools such as Terrestrial Laser Scanning (TLS).
Secondly, our work found that detecting crown dieback using vegetation indices derived from deep learning-based crown footprints is feasible, as suggested by Şandric et al. (Reference Şandric, Irimia, Petropoulos, Anand, Srivastava, Pleşoianu, Faraslis, Stateras and Kalivas2022). For the first time, we validated this type of tree health assessment against field-based expert observations and found a strong correlation between our automatic estimates and field-based estimates.
Lastly, we provide evidence that our method of tree health assessment is not overly sensitive to segmentation accuracy. Repeating the analysis with ground truth segmentation labels in place of model predictions did not significantly improve the correlation with field-based assessment. This finding suggests that developing more precise segmentation methods for tree health assessment may not be necessary, although more verification is required when using more complex or spatially explicit measurements downstream.
Author contribution
All authors contributed to the conceptualization of the work; M.J.A. and E.R.L. designed the methodology; E.R.L., D.M.-F. and R.-B. collected and pre-processed the data; M.J.A. processed the data and wrote the relevant software. M.J.A. analyzed the data and led the writing of the manuscript. All authors contributed critically to the drafts and gave final approval for publication.
Competing interest
The authors declare none.
Data availability statement
Our code is available at github.com/mataln/dlncs-dieback. A permanent record is available at http://doi.org/10.5281/zenodo.10657079. Our data is available at http://doi.org/10.5281/zenodo.10646992.
Ethics statement
The research meets all ethical guidelines, including adherence to the legal requirements of the study country.
Funding statement
M.J.A. was supported by the UKRI Centre for Doctoral Training in Application of Artificial Intelligence to the Study of Environmental Risks (EP/S022961/1). S.W.D.G. and E.R.L. were funded by a UKRI Future Leaders Fellowship awarded to E.R.L. (MR/T019832/1). P.R-B. was supported by the Community of Madrid Region under the framework of the multi-year Agreement with the University of Alcalá (Stimulus to Excellence for Permanent University Professors, EPU-INV/2020/010). P.R-B. and E.R.L. are supported by the Science and Innovation Ministry (subproject LARGE and REMOTE, N PID2021-123675OB-C41 and PID2021-123675OB-C42).
Abbreviations
- CHM: Canopy Height Model
- COCO: Common Objects in Context
- ExG: Excess Green Index
- GCC: Green Chromatic Coordinate
- GCP: Ground Control Point
- GNSS: Global Navigation Satellite System
- GSD: Ground Sampling Distance
- IOS: Intersection-Over-Smaller
- IoU: Intersection-Over-Union
- ITC: Individual Tree Crown
- LAI: Leaf Area Index
- LiDAR: Light Detection and Ranging
- LSTM: Long Short-Term Memory
- mAP: Mean Average Precision
- nDSM: normalized Digital Surface Model
- NEON: National Ecological Observatory Network
- NMM: Non-Maximum Merging
- OLS: Ordinary Least Squares
- R-CNN: Region-based Convolutional Neural Network
- RGB: Red, Green, and Blue
- RMSE: Root-Mean-Square Error
- RTK: Real-Time Kinematic
- SAHI: Slicing Aided Hyper Inference
- TLS: Terrestrial Laser Scanning
Appendix
A. Residual Plots
A plot of the residuals from the OLS fit of Figure 3, as a function of centroid-trunk match distance, is shown in Figure 5. The magnitude of the residuals does not appear to increase with match distance for distances less than $1$, as per Algorithm 1. We therefore suggest that the matching procedure outlined in Algorithm 1 is not the source of the variation in calculated GCC, despite the observation that P. pinea trunks often display curvature such that the crown center does not coincide with the trunk position. The linear model overestimates GCC for larger in situ defoliation estimates; there is no a priori reason to expect GCC to be a linear function of the in situ visual defoliation estimates.
B. Crown Matching
We outline the iterative approach used to match ITCs with ground-surveyed trunk locations in Algorithm 1. Refer to https://numpy.org/doc/ for further details regarding specific functions (Harris et al., Reference Harris, Millman, van der Walt, Gommers, Virtanen, Cournapeau, Wieser, Taylor, Berg, Smith, Kern, Picus, Hoyer, van Kerkwijk, Brett, Haldane, del Río, Wiebe, Peterson, Gérard-Marchant, Sheppard, Reddy, Weckesser, Abbasi, Gohlke and Oliphant2020).
Algorithm 1 NumPy-like pseudocode for crown matching
# pred_polys - list of predicted crown polygons (n_poly of them)
# trunks - ground-truth trunk locations (n_trunks of them)
# dist - trunk-to-crown-centroid distances, shape (n_trunks, n_poly)
dist ← pairwise_distances(trunks, centroids(pred_polys))
while dist.size > 0 do
i, j ← np.where(dist == np.min(dist))
gcc ← calculate_gcc(pred_polys[j])
dist ← np.delete(dist, i, axis=0)
dist ← np.delete(dist, j, axis=1)
pred_polys.pop(j)
end while
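A runnable NumPy version of Algorithm 1, greedily matching trunks to crown centroids by distance (closest pairs first, each trunk and crown used at most once), might look as follows. The `match_crowns` name, the precomputed centroid inputs, and the optional distance cutoff are our own illustrative choices rather than the exact implementation.

```python
import numpy as np

def match_crowns(trunks, centroids, max_dist=np.inf):
    """Greedily match trunk locations to crown centroids by distance.

    trunks:    (n_trunks, 2) array of surveyed trunk coordinates
    centroids: (n_poly, 2) array of predicted crown centroids
    Returns a list of (trunk_index, crown_index) pairs, closest first;
    pairs further apart than `max_dist` are left unmatched.
    """
    # Full pairwise distance matrix, shape (n_trunks, n_poly)
    dist = np.linalg.norm(trunks[:, None, :] - centroids[None, :, :], axis=2)
    trunk_ids = list(range(len(trunks)))
    crown_ids = list(range(len(centroids)))
    matches = []
    while dist.size > 0 and dist.min() <= max_dist:
        # Index of the closest remaining trunk/crown pair
        i, j = np.unravel_index(np.argmin(dist), dist.shape)
        matches.append((trunk_ids[i], crown_ids[j]))
        # Remove the matched trunk (row) and crown (column)
        dist = np.delete(np.delete(dist, i, axis=0), j, axis=1)
        trunk_ids.pop(i)
        crown_ids.pop(j)
    return matches

trunks = np.array([[0.0, 0.0], [10.0, 10.0]])
centroids = np.array([[9.0, 9.0], [1.0, 1.0]])
print(match_crowns(trunks, centroids))   # [(0, 1), (1, 0)]
```

Bookkeeping the original indices in `trunk_ids`/`crown_ids` is what keeps the returned pairs valid after rows and columns are deleted from the shrinking distance matrix.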