Impact Statement
Data sets obtained on a large regional or even national scale, not recorded for a specific study, often exhibit considerable heterogeneity and require extensive preprocessing. Despite these challenges, such data sets can reveal valuable ecological information and are readily available. This study shows the advantages of using heterogeneous light detection and ranging (LiDAR) data for ecological modeling and mapping. The study emphasizes the benefits of exploiting so-called “data-at-hand”, rather than dismissing them in anticipation of more refined data sources.
1. Introduction
Forests provide a variety of indispensable ecosystem services, such as water storage and purification, regulation of air quality, and climate regulation by functioning as a sink and source for greenhouse gases, as well as recreation and provision of raw materials and food (TEEB, 2010). Thus, forests and their biodiversity are indispensable for mitigating the effects of climate change, e.g., as carbon sinks (Hisano et al., 2018). In addition, forests with rich vegetation diversity and structural complexity offer various positive effects on biodiversity, including the promotion of animal species richness (Felix et al., 2004; Heidrich et al., 2023; Macarthur and Macarthur, 1961; Stein et al., 2014; Zellweger et al., 2013). To preserve the ecosystem services and functions that forests provide, and to secure their climate mitigation potential, comprehensive information on the state and diversity of their ecosystems is needed to inform decision-making.
An important component in this context is the accurate assessment of tree species composition (Berg, 1997; Cavard et al., 2011; Felton et al., 2020; Gamfeldt et al., 2013; Seidl et al., 2011) and, additionally, the classification of forest successional stages at the tree species level. Forest successional stages typically describe the development of the forest ecosystem after a disturbance in several phases, which differ in forest structure and can thus serve as indicators for forest biodiversity (Wilson and Peter, 1988). For example, early successional forest ecosystems can provide complex structures of herbs and shrubs that support high species diversity and provide valuable habitat for many arthropods as well as numerous rare species (Swanson et al., 2011). Hilmers et al. (2018) found that the early and late successional stages support high biodiversity in temperate forests. Such a high level of biodiversity also enhances forest resilience to climate change, as it is linked to the functioning of the ecosystem (Hisano et al., 2018). However, the effects of climate change, such as fires, storms, and the introduction of new species, can also alter processes of forest succession (Dale et al., 2001).
Therefore, knowledge of forest successional stages and their associated ecological processes is crucial for understanding and mitigating, for example, climate change or anthropogenic disturbances (Corona et al., 2011; Poorter et al., 2023). Such an understanding can furthermore improve monitoring and is fundamental for the development of adequate conservation strategies (Hilmers et al., 2018; Tew et al., 2022).
The monitoring of forests and their successional stages is one of the main goals of extensive manual forest inventories, which usually only provide point-based information (Vidal et al., 2016). Comprehensive area-wide information on their spatial distribution and proportions can be of help for near-natural forest management (Hilmers et al., 2018). Remote sensing can also contribute to enhancing traditional forest inventories (White et al., 2016). Multispectral remote sensing has been found to be a feasible approach to classify tree species in numerous studies (Grabska et al., 2019; Hemmerling et al., 2021; Hościło and Lewandowska, 2019; Immitzer et al., 2016; Welle et al., 2022; Wessel et al., 2018; Xi et al., 2021). Several studies have utilized remote sensing, particularly light detection and ranging (LiDAR) data, for area-wide classification of successional stages and the age of forest stands (Berveglieri et al., 2018; Cao et al., 2015; Duan et al., 2023; Falkowski et al., 2009; Fujiki et al., 2016; Maltman et al., 2023; Zhao et al., 2021).
However, those studies analyze the successional stages across large areas without differentiating between tree species. Up-to-date studies classifying tree species-specific successional stages are still rare (Stoffels et al., 2015), but would contribute to recognizing the distinct differences associated with each tree species. Additionally, these studies were performed in rather small areas with temporally aligned LiDAR data, typically collected through dedicated flight campaigns. Unfortunately, LiDAR surveys are still very cost- and labor-intensive and therefore often not directly commissioned by ecological monitoring programs.
Even though costs for LiDAR flight campaigns are high, Germany and large parts of Europe benefit from abundant LiDAR data collected through statewide governmental campaigns. The complete coverage of a federal state in Germany through multiple flights typically spans several years. For instance, regions such as Hesse (HVBG, 2023) and Saxony (GeoSN, 2023) have intervals of 6 years, North Rhine-Westphalia (Geobasis NRW, 2023) is covered every 5 years, and Rhineland-Palatinate (LVermGeo, 2023) every 4 years. Similar circumstances are found in other European countries, for example in Finland (NLS, 2023) or Spain (MITMA, 2023), where governmental LiDAR data are collected at intervals of approximately 6 years, or in Estonia with updates every 4 years (Maa-amet, 2023). Moreover, data availability is unsystematically documented with no common standard or database. Because the data are collected over multiple flights, inconsistencies arise, e.g., in flight dates and technical scanning properties. Furthermore, the already rather low point resolution of LiDAR data can vary owing to ongoing developments in sensor technologies (see Figure 4). As a consequence, these data sets are often viewed as not reliable enough for ecological research purposes at larger scales, and in some cases studies have documented that they refrained from using LiDAR data sets for modeling due to their presumed poor quality (Stoffels et al., 2015). However, obtaining an exact overview of how often data were not used for this reason is difficult, as the majority of studies do not report instances of unused data.
This study evaluates the potential of typically available heterogeneous LiDAR data in Germany and many parts of Europe for mapping temperate forest successional stages at the tree species level. Instead of only mapping, e.g., tree species or the age distribution of a forest, the present study explicitly focuses on classifying and mapping forest successional stages for individual tree species. A comparative analysis of models is conducted, employing different combinations of variables derived from optical satellite data (Sentinel-2) and heterogeneous LiDAR data. Random forest models were used with a modeling approach that takes spatial autocorrelation into account by using spatial variable selection and spatial cross-validation techniques (Meyer et al., 2019; Ploton et al., 2020). In a hierarchical modeling approach, first a large-scale map for the seven most common tree species groups (Douglas fir, larch, pine, spruce, beech, oak, and other deciduous trees) was generated for the entire federal state of Rhineland-Palatinate, Germany. Subsequently, for each mapped tree species group, up to three successional stages (qualification, dimensioning, and maturing) were modeled in three modeling approaches utilizing different variable sets. In doing so, the aim of this study is to determine whether the utilization of heterogeneous LiDAR data can positively influence model outcomes for forest successional stages at the tree species level.
2. Materials and methods
In the following sections, the modeling of tree species group-specific forest successional stages is presented in detail (see Sections 2.1–2.3.4 and Figure 1). The methodology involves training different models with varying combinations of Sentinel-2 and/or LiDAR data to predict forest successional stages utilizing the forest inventory of Rhineland-Palatinate as reference data. Through the application of spatial variable selection and spatial validation techniques, the potential of the heterogeneous LiDAR data was evaluated. The successional stages of the different tree species groups were mapped with a hierarchical approach. A tree species group model of Rhineland-Palatinate formed the basis for the successional stages models (see Section 2.3.4). All data processing and modeling was done in R version 4.2.3 (R Core Team, 2023).
2.1. Study area
The federal state of Rhineland-Palatinate, with an area of 19,858 km² (see Figure 2), is one of the particularly forest-rich regions in Germany, with 42% of its area covered by temperate forest (BMEL, 2018). Only 25.6% of the forests are state-owned and surveyed by regular forest inventory campaigns. Most of the forests in Rhineland-Palatinate are owned by public corporations (46.1%; e.g., local administrations) or are privately owned (26.7%); therefore, no centralized information on the state of all forests is available (Thünen-Institut, 2012b). The majority of the forests are mixed forests and only 17.7% of the forests are pure stands (Thünen-Institut, 2012a). Overall, deciduous forests predominate, with shares ranging from 54.8% to 64.6% of the forest, depending on the ownership structure (Thünen-Institut, 2012c).
2.2. Databases
2.2.1. Forest inventory data
In this study, the official forest inventory of Rhineland-Palatinate was used as reference data, encompassing stand information from state-owned forests. Each forest stand, varying in size and shape, is recorded as a polygon, resulting in approximately 170,000 polygons (see Figure 3). From these forest inventory polygons (Landesforsten Rheinland-Pfalz, 2014), information about the forest successional stage, the most common tree species group, and the species purity was utilized. The polygons were filtered to a purity of the most common tree species group of at least 80%. Following this filtering process, seven tree species groups with at least 50 polygons for two to three successional stages (see Table A5 in the appendix) remained for model training: Douglas fir, larch, pine, spruce, beech, oak, and other deciduous trees. Three successional stages were considered for all tree species groups except for larch and pine. The qualification stage (I) represents the early growth phase, which begins as the young trees outgrow competing vegetation (Landesforsten Rheinland-Pfalz, 2023). Following this, the dimensioning stage (II) develops, characterized by a notable decline in the height and lateral growth of the tree crown. The oldest successional stage considered in this study is the maturing stage (III), where the tree surpasses 75–80% of its final height, resulting in a deceleration of height growth. Since the number of polygons available for the qualification stage of both pine (16 polygons) and larch (1 polygon) was insufficient to provide representative information, only the dimensioning and maturing stages were considered for these two tree species groups.
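The filtering step described above can be sketched in a few lines. The following Python snippet is only an illustration under stated assumptions: the study itself was carried out in R, the record fields (`group`, `stage`, `purity`) are hypothetical names, and the rule is simplified to a per-class minimum count rather than the exact "two to three stages per group" criterion.

```python
from collections import Counter

MIN_PURITY = 0.80    # purity threshold stated in the text
MIN_POLYGONS = 50    # minimum number of polygons per class stated in the text


def filter_polygons(polygons):
    """Keep polygons that are pure enough and belong to a sufficiently large class."""
    pure = [p for p in polygons if p["purity"] >= MIN_PURITY]
    counts = Counter((p["group"], p["stage"]) for p in pure)
    return [p for p in pure if counts[(p["group"], p["stage"])] >= MIN_POLYGONS]


# Toy inventory with hypothetical records (not values from the study).
inventory = (
    [{"group": "spruce", "stage": "maturing", "purity": 0.95}] * 60
    + [{"group": "spruce", "stage": "maturing", "purity": 0.60}] * 10
    + [{"group": "larch", "stage": "qualification", "purity": 0.90}] * 1
)
kept = filter_polygons(inventory)
print(len(kept))  # 60
```

In this toy case the ten impure spruce polygons fail the purity filter and the single larch polygon fails the class-size filter, which mirrors why the qualification stage of larch was dropped.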
2.2.2. Sentinel-2 data
Multispectral optical data have proven adequate for tree species classification (Grabska et al., 2019; Hemmerling et al., 2021; Hościło and Lewandowska, 2019; Immitzer et al., 2016; Wessel et al., 2018; Xi et al., 2021), making them an essential component in this study as well. ESA’s Sentinel-2 data provided the spectral predictors for the models and were processed using the “Software Framework for Operational Radiometric Correction for Environmental Monitoring” (FORCE; version 3.7.10; Frantz, 2019). With FORCE, the Sentinel-2 data from 2019 to 2021 were downloaded at level 1C and subsequently corrected atmospherically and topographically. Within FORCE, near-infrared Landsat data were used to correct the spatial position of the Sentinel-2 images, thus decreasing the spatial error across satellite images (Rufin et al., 2021). The Sentinel-2 images of the 3 years were used to create high-quality gap-free monthly mean composites for the entire state of Rhineland-Palatinate at a resampled spatial resolution of 10 m. To cover the whole phenological development, one image for winter (January), four covering the fast-changing period from deciduous leaf-unfolding to the establishment of the canopy (March, April, May, and June), and two images for leaf senescence (September and October) were created. In addition to the original bands, multiple spectral indices reflecting vegetation properties were calculated. Table 1 shows the Sentinel-2 spectral bands and indices that were used in this study. Refer to Bhandari et al. (2024) for a more detailed and comprehensive description of the workflow for processing the Sentinel-2 data.
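As a rough illustration of the compositing idea, the sketch below averages all valid observations of one pixel for a given month across years and then derives a spectral index from the composite. It is not the FORCE implementation; the function names and reflectance values are hypothetical, and NDVI is used only as a familiar example of a vegetation index (Table 1 lists the indices actually used).

```python
def monthly_mean_composite(observations):
    """Average all valid (non-None) observations of one pixel for one month."""
    valid = [v for v in observations if v is not None]
    return sum(valid) / len(valid) if valid else None


def ndvi(nir, red):
    """Normalized difference vegetation index; None where it is undefined."""
    if nir is None or red is None or (nir + red) == 0:
        return None
    return (nir - red) / (nir + red)


# Hypothetical reflectances of one pixel in May 2019, 2020 and 2021.
# None marks a gap, e.g. a cloud-covered acquisition.
red_may = monthly_mean_composite([0.04, None, 0.06])
nir_may = monthly_mean_composite([0.40, 0.50, None])
print(round(red_may, 2), round(nir_may, 2), round(ndvi(nir_may, red_may), 2))
```

Pooling the same month over three years is what makes gap-free composites feasible: a pixel only remains empty if it was never observed cloud-free in that month in any year.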
2.2.3. LiDAR data
The LiDAR data are the key elements of this study, as its aim is to identify the potential of these heterogeneous data in contributing to the classification of successional stages for individual tree species groups. The LiDAR data utilized in this study were collected by the department for Surveying and Geographic Information of Rhineland-Palatinate (GeoBasis-DE/LVermGeoRP, 2022). Recently, the acquisition interval for LiDAR data in Rhineland-Palatinate was shortened from 9 years to only 4 years. As the transition is still ongoing, data from the 7-year interval from 2014 to 2021 covering the whole state were used in this study (LVermGeo, 2023). Since the data result from many different flights, there are variations in data point density across the acquisition dates (see Figure 4). Study areas covered by different flight campaigns also tend to vary notably in technical properties such as scan angle and flight altitude (Næsset, 2009; Ørka et al., 2010; Solberg et al., 2009). However, detailed metadata on these parameters are often not provided with freely available data. For the LiDAR data used in this study, information was provided only on the sensors used each year and on beam divergence, which can be found in Table A3 in the appendix. In total, 29 indices were computed from the LiDAR data using the Remote Sensing Database (RSDB) of Wöllauer et al. (2020; see Table 2). The indices were computed for all areas identified as deciduous or coniferous forests according to the Copernicus high-resolution layer for forest types of 2018 with a spatial resolution of 10 m (EEA, 2022).
The calculated indices represent different categories of forest structure including canopy characteristics (e.g., canopy height), vegetation structure (e.g., the penetration rate of different vegetation layers), overall vegetation properties (e.g., aboveground biomass, vegetation coverage, and leaf area index), and terrain features (e.g., elevation; see Table 2).
2.3. Methods
2.3.1. Matching data
For each tree species group, the successional stages from the forest inventory data were used as response variables, while either Sentinel-2, LiDAR, or Sentinel-2 and LiDAR variables were used as predictors in different models (see Figure 1). To process the polygons from the forest inventory data, all intersecting pixels from the Sentinel-2 and LiDAR variables were extracted. To prevent confusion with adjacent areas, a 10 m negative buffer was applied at the edges of the polygons to exclude the border areas of the polygons.
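The effect of the negative buffer can be approximated on the raster grid itself. The sketch below is an assumption-laden stand-in for the vector buffering used in the study: on a 10 m grid, one step of binary erosion removes roughly the outermost 10 m of a rasterized polygon. The function name and the toy mask are hypothetical.

```python
def erode(mask, iterations=1):
    """Shrink a boolean polygon mask by one pixel per iteration (8-neighbourhood)."""
    rows, cols = len(mask), len(mask[0])
    for _ in range(iterations):
        out = [[False] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if not mask[r][c]:
                    continue
                # A pixel survives only if all eight neighbours lie inside the polygon;
                # cells beyond the grid edge count as outside.
                out[r][c] = all(
                    0 <= r + dr < rows and 0 <= c + dc < cols and mask[r + dr][c + dc]
                    for dr in (-1, 0, 1)
                    for dc in (-1, 0, 1)
                )
        mask = out
    return mask


# A 5 x 5 pixel polygon on a 10 m grid (hypothetical); one erosion step
# approximates a 10 m inward buffer.
polygon = [[True] * 5 for _ in range(5)]
core = erode(polygon)
print(sum(map(sum, core)))  # 9
```

Only the inner 3 x 3 block remains, so pixels that may mix signals from a neighbouring stand are excluded from extraction.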
2.3.2. Balancing data and splitting into testing and training data
Reference data were balanced to ensure that all classes of successional stages of tree species groups were treated equally in the modeling process, finding a trade-off between as much training data as possible and equal distributions across classes. The data were balanced to ensure that for each tree species group (I) the same number of polygons from each successional stage was used and at the same time (II) the same number of pixels from each polygon was randomly sampled. If a small number of pixels from within each polygon is chosen, many of the available polygons can be included (also very small polygons). However, only very few pixels are used from each polygon, even from those that are very large. On the other hand, sampling a large number of pixels from each polygon results in more polygons being excluded from consideration, but it allows for more pixels to be sampled from the remaining polygons. This sampling of the data set was therefore optimized individually for each tree species group producing a balanced data set as large as possible (for more details see Appendix Figure A1 or the R code in the data availability statement). From this data set, 20% of the polygons from each class were retained for external testing. The remaining data were used for model training and validation (see Tables A1 and A5 in the appendix). From the training data sets for the successional stages, a data set for the tree species group model was created. This data set was later used as a base for the hierarchical mapping of the successional stages. The same balancing process as for each of the successional stages models was done for the tree species group model on this data set.
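The trade-off between the number of pixels per polygon and the number of usable polygons can be made concrete with a small search. The sketch below assumes that "as large as possible" means maximizing the total number of sampled pixels; the actual procedure is documented in Appendix Figure A1 and the authors' R code, and all function names and pixel counts here are hypothetical.

```python
import random


def best_balance(pixels_per_polygon):
    """pixels_per_polygon maps each stage to the pixel counts of its polygons.

    Returns (pixels per polygon, polygons per stage, total pixels) for the
    balanced sample with the largest total.
    """
    best = (0, 0, 0)
    candidates = sorted({n for counts in pixels_per_polygon.values() for n in counts})
    for n in candidates:
        # Polygons per stage that can supply n pixels; the scarcest stage limits all.
        k = min(sum(c >= n for c in counts) for counts in pixels_per_polygon.values())
        total = n * k * len(pixels_per_polygon)
        if total > best[2]:
            best = (n, k, total)
    return best


def split_polygons(polygon_ids, test_fraction=0.2, seed=0):
    """Hold out a fraction of whole polygons (not single pixels) for testing."""
    ids = list(polygon_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


# Hypothetical pixel counts per polygon for one tree species group.
toy = {
    "qualification": [5, 20, 40, 200],
    "dimensioning": [10, 30, 30, 50, 400],
    "maturing": [25, 60, 80],
}
print(best_balance(toy))  # (40, 2, 240)
train_ids, test_ids = split_polygons(range(10))
print(len(train_ids), len(test_ids))  # 8 2
```

With these toy counts, sampling 40 pixels from 2 polygons per stage yields the largest balanced sample; sampling fewer pixels admits more polygons but lowers the total, while sampling more excludes too many polygons.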
2.3.3. Model specifications
Random forest models (Breiman, 2001) were trained with a forward feature selection (FFS) from the R package CAST (version 0.7.1; Meyer et al., 2023). The FFS trains the models with each possible two-variable combination, keeps the best performing one and adds more predictor variables until none decreases the error of the current best model. This allowed the recognition and removal of variables that lead to overfitting (Meyer et al., 2018). As a result, only a small proportion of the variables prepared in this study, specifically those that are relevant to the models, are actually used.
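The selection logic described above can be sketched generically. The Python function below is not the CAST implementation; it only reproduces the described procedure (best pair first, then greedy additions while the error decreases) for an arbitrary error function. The variable names and the toy error function are hypothetical; in practice the error would come from a spatially cross-validated random forest.

```python
from itertools import combinations


def forward_feature_selection(variables, score):
    """Greedy forward selection; score(subset) returns an error to minimize."""
    # Step 1: evaluate every two-variable combination and keep the best pair.
    best_subset = min((tuple(pair) for pair in combinations(variables, 2)), key=score)
    best_error = score(best_subset)
    # Step 2: keep adding the most helpful variable until none reduces the error.
    while True:
        remaining = [v for v in variables if v not in best_subset]
        if not remaining:
            break
        candidate = min((best_subset + (v,) for v in remaining), key=score)
        candidate_error = score(candidate)
        if candidate_error >= best_error:
            break
        best_subset, best_error = candidate, candidate_error
    return list(best_subset), best_error


# Toy error function: each variable removes a fixed share of the error, and
# every additional variable costs a small penalty (all values hypothetical).
gains = {"canopy_height": 0.3, "ndvi_may": 0.2, "swir_june": 0.1, "noise": 0.0}


def toy_error(subset):
    return 1.0 - sum(gains[v] for v in subset) + 0.02 * len(subset)


selected, error = forward_feature_selection(list(gains), toy_error)
print(selected, round(error, 2))
```

In the toy run the uninformative variable is never added, because including it would raise the error of the current best model.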
Spatial cross-validation was used during variable selection and model tuning to evaluate which variables and hyperparameters lead to the highest ability to make predictions for new spatial locations within the study area. The polygons were used as spatial units and were randomly split into ten different folds for spatial cross-validation (Meyer et al., 2018; Ploton et al., 2020). The final models were then tested on the 20% of the polygons that were held out for spatially independent testing (see Section 2.3.2) to evaluate the potential of LiDAR data for classifying successional stages.
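The key point of using polygons as spatial units is that all pixels of one polygon end up in the same fold, so a model is never validated on pixels from a stand it was trained on. A minimal sketch of such a grouped fold assignment is given below; the function name and toy data are hypothetical and do not reproduce the CAST routines used in the study.

```python
import random


def polygon_folds(polygon_ids, k=10, seed=0):
    """Return one fold index per pixel, assigning whole polygons to folds."""
    unique = sorted(set(polygon_ids))
    random.Random(seed).shuffle(unique)
    fold_of = {pid: i % k for i, pid in enumerate(unique)}
    return [fold_of[pid] for pid in polygon_ids]


# 30 hypothetical polygons with 4 sampled pixels each.
pixel_polygons = [pid for pid in range(30) for _ in range(4)]
folds = polygon_folds(pixel_polygons, k=10)

# Each validation fold shares no polygon with its training folds.
for f in range(10):
    held_out = {p for p, fold in zip(pixel_polygons, folds) if fold == f}
    training = {p for p, fold in zip(pixel_polygons, folds) if fold != f}
    assert held_out.isdisjoint(training)
print(sorted(folds.count(f) for f in range(10)))
```

A pixel-wise random split would instead scatter neighbouring, highly similar pixels across folds and tend to give overly optimistic error estimates.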
2.3.4. Modeling approach
To analyze the utility of LiDAR variables to classify and map forest successional stages, models were trained on different combinations of Sentinel-2 and LiDAR variables using a variable selection algorithm. Three models were trained for each of the tree species groups Douglas fir, larch, pine, spruce, beech, oak, and other deciduous trees, to predict the successional stages. The models solely using Sentinel-2 variables are hereafter referred to as the “spectral models”, the models incorporating Sentinel-2 and LiDAR variables as “hybrid models”, and the models exclusively trained on LiDAR variables as “structural models”. The comparison focused solely on models for successional stages, assuming the tree species group to be known.
To show the applicability of the successional stage models, an area-wide map with a resolution of 10 m for all forested areas of Rhineland-Palatinate was generated. To achieve this, the Copernicus high-resolution layer forest-type data from 2018 were used as a forest mask (EEA, 2022). To map the successional stages a tree species groups model was used as a baseline in a hierarchical modeling approach. This entailed a two-step process: first, modeling all tree species groups across the entire area as a baseline, and second, modeling successional stages based on the predicted tree species groups. The modeling approach (either spectral, structural, or hybrid) that performed best across all tree species groups on the test data, was used for mapping. The tree species groups model was based on 360 training polygons from the forest inventory data, each with 180 pixels, and tested on 96 polygons. The tree species groups were modeled using the same modeling approach and the performance was tested using the same testing data set as for all successional stages models. These data sets were never considered during model training, neither for the tree species model nor for the successional stages models and were spatially independent from the training data.
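The two-step mapping logic can be summarized in a short sketch. The threshold rules below are placeholders standing in for the trained random forest models, and the predictor names (`winter_greenness`, `canopy_height`) are hypothetical; only the control flow reflects the hierarchical approach described above.

```python
def predict_hierarchical(pixel, species_model, stage_models):
    """Step 1: predict the tree species group. Step 2: apply that group's stage model."""
    group = species_model(pixel)
    stage_model = stage_models.get(group)
    # Groups without a stage model (e.g. a group excluded from mapping) get no stage.
    stage = stage_model(pixel) if stage_model is not None else None
    return group, stage


# Placeholder rules instead of trained models (hypothetical thresholds).
def toy_species_model(pixel):
    return "spruce" if pixel["winter_greenness"] > 0.5 else "beech"


toy_stage_models = {
    "spruce": lambda px: "maturing" if px["canopy_height"] > 25 else "dimensioning",
    "beech": lambda px: "maturing" if px["canopy_height"] > 30 else "dimensioning",
}

pixels = [
    {"winter_greenness": 0.7, "canopy_height": 28.0},
    {"winter_greenness": 0.2, "canopy_height": 28.0},
]
result = [predict_hierarchical(px, toy_species_model, toy_stage_models) for px in pixels]
print(result)
```

Because the stage model is chosen by the predicted group, errors of the tree species groups model propagate into the stage map, which is why the combined map is evaluated separately on the test data.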
3. Results
This section presents the study’s findings on the performance of three different modeling approaches for modeling the successional stages of tree species groups. The potential of using heterogeneous LiDAR data was assessed by comparing the results of spectral, structural, and hybrid models. Additionally, the variable selection of the different models and the area-wide prediction of tree species group-specific successional stages throughout Rhineland-Palatinate were analyzed.
3.1. Model performance
The structural models (accuracy from 0.4 to 0.68) and hybrid models (accuracy from 0.43 to 0.78) performed notably better than the spectral models (accuracy from 0.33 to 0.63) for all tree species groups. The hybrid and structural models performed similarly, with the hybrid models always slightly superior except for pine, where the performances were the same (see Figure A2 in the appendix). Therefore, and for clearer comparability, the following analyses were limited to comparing the spectral models with the hybrid models. The results of the structural models can be found in Figure A2 in the appendix. The two left columns of Figure 5 show the test results of each successional stage and model. The right column shows the difference in the proportion of correctly classified pixels between models. The spectral models achieved overall accuracies between 0.33 for the group of other deciduous trees and 0.63 for larch. With the additional LiDAR variables in the hybrid models, the overall accuracies increased to between 0.43 for other deciduous trees and 0.78 for larch. However, the models for spruce and beech improved only slightly in overall accuracy (by 0.05 and 0.04, respectively) compared to the spectral models; the additional use of LiDAR variables (hybrid model) therefore did not notably improve their performance.
The largest increase in overall accuracy per tree species group from adding LiDAR variables occurred for Douglas fir (0.23), followed by oak (0.19) and larch (0.15). For individual successional stages, only the performance of the maturing stage of larch and beech decreased; all other stages benefited from the additional LiDAR variables. To investigate whether one successional stage profited more from the availability of LiDAR variables than another, an analysis of variance was performed. The differences in accuracy gain between the modeled stages across all tree species groups showed no significant trend (p-value 0.29). A t-test was conducted to determine whether the increase in accuracy differed between deciduous and coniferous forests; no significant difference was found (p-value 0.72). Generally, confusion matrices indicated that confusion predominantly occurred among adjacent successional stages. The only exception was the qualification stage of other deciduous trees, where only 5% of pixels were classified correctly. Here, most misclassifications fell not into the adjacent dimensioning stage but into the maturing stage, which led to low evaluation scores (precision, recall, and F1). In the hybrid model, the scores of other deciduous trees, the most diverse tree species group, clearly improved, but this group still had the weakest performance, with an accuracy of 0.43, poor recall (qualification: 0.22; dimensioning: 0.36), and precision values below 0.4 (dimensioning: 0.37). It therefore appeared inappropriate to use this group and its classified stages for mapping, and it was excluded from area-wide mapping (see Section 4.3). For all other hybrid models, overall accuracies were above 0.6 and the gain in accuracy through the usage of LiDAR variables was 0.13 on average (see Figure 5).
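For readers less familiar with the evaluation scores mentioned above, the sketch below derives overall accuracy, precision, recall, and F1 from a confusion matrix. The pixel counts are hypothetical and merely constructed so that the qualification stage is mostly confused with the non-adjacent maturing stage; they are not the study's confusion matrices, which are given in the appendix.

```python
def class_metrics(confusion, labels):
    """confusion[i][j]: pixels of reference class i that were predicted as class j."""
    total = sum(map(sum, confusion))
    correct = sum(confusion[i][i] for i in range(len(labels)))
    metrics = {"accuracy": correct / total}
    for i, label in enumerate(labels):
        true_positive = confusion[i][i]
        predicted = sum(row[i] for row in confusion)   # column sum
        reference = sum(confusion[i])                  # row sum
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / reference if reference else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


stages = ["qualification", "dimensioning", "maturing"]
# Hypothetical pixel counts (rows: reference, columns: prediction).
toy_confusion = [
    [5, 20, 75],
    [10, 60, 30],
    [5, 20, 75],
]
scores = class_metrics(toy_confusion, stages)
print(round(scores["accuracy"], 2), [round(v, 2) for v in scores["qualification"]])
```

In this toy matrix the recall of the qualification stage is 0.05, while its precision is 0.25, which illustrates how a class can score poorly on all three measures when most of its pixels are assigned to a different stage.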
3.2. Contribution of predictor variables
During model training, the feature selection process optimized the selection of variables to create the optimal model. As described in more detail in Section 2.3.3, the variable selection of the FFS starts with a combination of two variables and adds the variable to the model that improves the current model the most until no further improvement occurs (Meyer et al., 2018). As a result, from a multitude of variables, only those deemed important for the models were selected and used. As a side effect of this process, correlated variables are only considered once. The assessment of the variable importance of each variable and each model is provided in Figure A4 in the appendix. As the hybrid model additionally used LiDAR-derived variables, it was expected that the composition of variables changed for each hybrid model compared to the spectral model. As there might be similarities between variables (especially between the vegetation indices), all variables were categorized into groups by their information content to enable a comparison (see Tables 1 and 2). Figure 6 displays boxplots for both the spectral and the hybrid models containing the ranks of the variables per group as determined by the FFS. A smaller rank indicates an earlier selection and, therefore, a stronger improvement and contribution to the model. The boxplot of the spectral model shows the lowest ranks for the variables of the shortwave infrared group (median rank 2), followed by the visible group (median rank 4). The near-infrared group and the group of the vegetation indices were both selected on average at rank 6. For the spectral models, variables from all groups but the visible bands were selected for the first variable combination (note that, as the FFS chooses a combination of two variables to start the feature selection with, rank 1 exists twice for each model).
In every model, at least one variable from the group of vegetation indices was selected and, for the hybrid models, additionally one variable from the canopy group. For all other variable groups, at least two models did not select any variables from the respective variable group (see Table 3).
For the hybrid models, there was a clear shift in variable selection. During variable selection, the group of canopy variables, containing different properties of the canopy, was selected the earliest. This is reflected in a median rank of 1, which differed significantly from the other variable groups (see appendix Table A6), whose median ranks ranged from four to six. The vegetation indices and the group of near-infrared bands had on average a lower rank in the hybrid model than in the spectral model. In all spectral models except for Douglas fir, at least one variable from the group of vegetation indices was used in the initial variable combination. Only variables of the canopy and vegetation indices groups were used in the initial variable combination (rank 1) in the hybrid model. Each model used one of those two groups for the initial combination, except for beech, where even two variables from the canopy properties group were used. The number of selected variables did not significantly differ between the spectral and the hybrid models (t-test p-value = 0.48).
3.3. Area-wide mapping
To assess the applicability of the models, successional stages for all forested areas in Rhineland-Palatinate were mapped, allowing for the approximation of a comprehensive spatial cross-validation error and the visual testing of the plausibility of spatial patterns. The tree species groups model reached an accuracy of 0.81. Details of the model and its variable importance are provided in the appendix (see Figure A3 and Table A7). The area-wide map of tree species group-specific successional stages for the entire state of Rhineland-Palatinate, produced with the tree species groups model and the hybrid models for the successional stages, achieved an overall accuracy of 0.6 on the test data sets. For detailed confusion matrices, see Tables A7 and A8 in the appendix.
Figure 7 shows the map of tree species groups specific successional stages for the entire federal state of Rhineland-Palatinate. On this map, general spatial distribution patterns are visible. In the Southeast (Palatinate forest), pine trees dominate, while in the North (Westerwald) and West (Eifel and Hunsrück), spruce trees are predominantly present. In the areas around the rivers (e.g., Moselle and Rhine), mainly various classes of deciduous trees are found. For a more detailed view, visit the digital map at https://envima.github.io/LidarForestModeling/. Artifacts caused by the heterogeneous LiDAR data were not detected.
Figure 8 shows two areas of the area-wide map in more detail, which are located directly at the survey borders of different LiDAR scenes (see Figure 8b) with up to 6 years of temporal difference. Figure 8a shows an area dominated by deciduous species, while Figure 8c illustrates an area where predominantly coniferous forests are located. In the detailed maps of these border areas, no patterns are identifiable that can be attributed to artifacts of the LiDAR data.
4. Discussion
Although there have recently been large-scale attempts to identify either only tree species (Breidenbach et al., Reference Breidenbach, Waser, Debella-Gilo, Schumacher, Rahlf, Hauglin, Puliti and Astrup2021; Hemmerling et al., Reference Hemmerling, Pflugmacher and Hostert2021) or tree species in combination with successional stages (Stoffels et al., Reference Stoffels, Hill, Sachtleber, Mader, Buddenbaum, Stern, Langshausen, Dietz and Ontrup2015), the identification of the successional stages of tree species remains a major challenge (Fassnacht et al., Reference Fassnacht, Latifi, Stereńczak, Modzelewska, Lefsky, Waser, Straub and Ghosh2016). In this context, the successional stages of seven distinct tree species groups were modeled using different combinations of input variables and a variable selection approach. The best results were obtained through the combined use of Sentinel-2 and LiDAR data, even though the LiDAR data were of heterogeneous quality. This approach illustrates the potential of incorporating heterogeneous LiDAR data sources of varying quality, as typically available from governmental sources, for ecological mapping and monitoring.
4.1. Modeling of tree species groups specific successional stages
The results of the study highlight that models of tree species groups specific successional stages benefit from additional structural LiDAR variables regardless of the tree species group. In only three models did the recall of individual successional stages decrease with the additional use of LiDAR variables, while the overall performance of the respective tree species groups specific successional stages model still increased. This confirms that heterogeneous LiDAR data can supplement models based on multispectral satellite data for modeling tree species groups specific successional stages.
Several of the hybrid models predicted the tree species groups specific successional stages with high precision, but there were still limitations. In particular, the successional stages model for the tree species group of other deciduous trees performed rather poorly. Even though its accuracy increased by 0.1, from 0.33 to 0.43, with the additional LiDAR variables, the performance was still not sufficient to use this model for accurate area-wide mapping. Therefore, the group of other deciduous tree species was excluded from mapping. One likely cause of the poor performance is the highly heterogeneous composition of this class. While polygons were filtered to have at least 80% purity, a large number of different tree species were grouped together in this class (see appendix Table A9). Individual tree species such as cherry, birch, or willow were represented in the data set by very few polygons, which prevented meaningful independent modeling of these species. As these species occur less frequently in the study area, only enhanced field surveys targeting them could enable their effective modeling. All other classes yielded overall accuracies above 0.6 for the hybrid models and were therefore suitable for mapping (see the use case in Section 3.3). Larch achieved the best performance with an overall accuracy of 0.78. However, due to limited data availability, larch was one of the two tree species groups where only two successional stages were modeled, which increased the probability of correct classification by chance.
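The effect of the number of classes on chance agreement can be made concrete with a short worked sketch. It assumes balanced classes and uniform random guessing, which is a simplification that may not hold for the actual class distribution, and it uses a kappa-style rescaling that is not part of the original analysis. Only the accuracy of 0.78 is taken from the text; the three-stage case is hypothetical.

```python
def chance_accuracy(n_classes):
    """Expected accuracy of uniform random guessing on balanced classes (assumption)."""
    return 1.0 / n_classes

def chance_corrected(accuracy, n_classes):
    """Kappa-style rescaling: 0 corresponds to chance level, 1 to perfect agreement."""
    expected = chance_accuracy(n_classes)
    return (accuracy - expected) / (1.0 - expected)

# Larch: overall accuracy 0.78 with two successional stages.
larch_two_stage = chance_corrected(0.78, 2)

# Hypothetical comparison: the same accuracy with three stages.
hypothetical_three_stage = chance_corrected(0.78, 3)
```

Under these assumptions, an accuracy of 0.78 corresponds to a chance-corrected value of about 0.56 with two stages but about 0.67 with three, so the same raw accuracy is less informative when fewer classes are modeled.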
An area-wide map of tree species specific successional stages can be used to identify the habitat suitability for a certain species (e.g., Felix et al., Reference Felix, Campa, Millenbah, Winterstein and Moritz2004). However, the reliability of such maps is determined by the quality of the remote sensing and especially the forest inventory data as well as by the modeling approach. Forest inventory data of higher spatial resolution, potentially collected at individual tree level, could improve spatial mapping and could overcome limitations of data availability and quality for accurate and transferable models (Yates et al., Reference Yates, Bouchet, Caley, Mengersen, Randin, Parnell, Fielding, Bamford, Ban, Barbosa, Dormann, Elith, Embling, Ervin, Fisher, Gould, Graf, Gregr, Halpin and Sequeira2018). Despite these challenges, mapping successional stages has great potential. In managed forests, the successional stages with the highest biodiversity (early and late successional stages) are often the least represented (Hilmers et al., Reference Hilmers, Friess, Bässler, Heurich, Brandl, Pretzsch, Seidl and Müller2018). Large-scale regional mapping can provide planners with a comprehensive overview of the current state of the forest that extends beyond the information from the forest inventory data. Such maps of successional stages can be utilized to identify particularly valuable habitats for conservation efforts (Hilmers et al., Reference Hilmers, Friess, Bässler, Heurich, Brandl, Pretzsch, Seidl and Müller2018; Reif et al., Reference Reif, Marhoul and Koptík2013) and to guide forest management toward areas with high potential for future biodiversity enhancement and restoration initiatives.
The mapping of tree species groups specific successional stages in this study served not only as an end in itself but will also form a baseline for more indirect biodiversity mapping. Specifically, this study is a component of a broader project that incorporated this information for modeling the habitat of endangered forest-dwelling bat species (Bald et al., Reference Bald, Gottwald, Hillen, Adorf and Zeuss2024). This direct application underscores how such readily available but heterogeneous LiDAR data can contribute to nature conservation efforts. The availability of governmental LiDAR data is high in Europe, and the successful utilization of LiDAR data in biodiversity research has been demonstrated for various ecological domains (Reddy et al., Reference Reddy, Kurian, Srivastava, Singhal, Varghese, Padalia, Ayyappan, Rajashekar, Jha and Rao2021; Toivonen et al., Reference Toivonen, Kangas, Maltamo, Kukkonen and Packalen2023). However, the unsystematic accessibility and inherent heterogeneity in acquisition years and pulse densities of large-scale governmental data sets for entire federal states remain challenging and time-consuming to handle. Greater focus must be placed on the preprocessing of data and the adjustment of modeling techniques, which increases the workload substantially. Nonetheless, this study investigated the rather slow development of successional stages in forest ecosystems and showed the additional value of heterogeneous LiDAR data. The data set used in this study illustrates a typical temporal and spatial imbalance of data that is often faced. We therefore advocate thoroughly analyzing and, if applicable, using such heterogeneous and “old” data with appropriate training and validation procedures rather than dismissing them prematurely.
To meet the growing requirements of conservation monitoring, such readily available but highly heterogeneous data should not be neglected (Vanden Borre et al., Reference Vanden Borre, Paelinckx, Mücher, Kooistra, Haest, De Blust and Schmidt2011). These data offer insights into the three-dimensional forest structure that passive sensors cannot fully substitute, which makes them especially valuable for forest ecosystem monitoring. Although high-resolution LiDAR data acquired close to the time of the study, as used by Falkowski et al. (Reference Falkowski, Evans, Martinuzzi, Gessler and Hudak2009), would be preferable and would likely yield better results, such options are too cost-intensive for most practical ecosystem monitoring applications. Governmental LiDAR data are increasingly freely accessible (at least for scientific projects), and when combined with publicly available Sentinel-2 data, they can provide a valuable and cost-effective data set. Therefore, researchers and practitioners are encouraged to utilize the available heterogeneous data to advance the understanding of ecosystems.
4.2. Change in variable selection
To assess the potential of heterogeneous LiDAR data for modeling tree species groups specific successional stages, it is of interest how the variable composition changes when LiDAR variables are available for selection. Figure 6 clearly shows that with the availability of LiDAR variables, the canopy properties became important predictors. For the group of vegetation indices, the interpretation is twofold. Except for the spectral model for Douglas fir and the hybrid model for beech, at least one variable from the group of vegetation indices was used in every initial variable combination, indicating that vegetation indices can facilitate the prediction of successional stages (see Table 3). However, the rather low average rank shows that vegetation indices were also often selected rather late in the FFS, indicating only a small improvement in model performance. Vegetation indices form a strong modeling base, complemented in the spectral models by other variables that differ depending on the tree species group. In the hybrid models, these varying variables were uniformly replaced by canopy properties, resulting in a strong combination of canopy properties and vegetation indices for the prediction of successional stages in all tree species groups. Apart from the initial variable combination, the median ranks of all variable groups except the canopy properties (rank 1) ranged between 4 and 6 in the hybrid models. In the hybrid models, the canopy property variables also seem to replace the early usage of single Sentinel-2 bands in the spectral models. This means that the combination of structural and optical features forms a strong baseline for the differentiation between successional stages. The importance of structural information is reasonable, as the growth of vegetation is a key component of succession. Therefore, it seems plausible that canopy variables are more crucial.
In particular, canopy height (see Figure A4 in the appendix) was often selected as one of the first and, therefore, most important variables.
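The selection ranks discussed in this section can be illustrated with a minimal sketch of a greedy forward feature selection. It assumes that the search starts from the best-scoring two-variable combination (rank 1) and then adds one variable at a time until the score no longer improves; this is a generic reading of the procedure, not the study's R implementation. The variable names, the gain values, and the additive toy score standing in for a cross-validated accuracy are all hypothetical.

```python
from itertools import combinations

def forward_feature_selection(variables, score):
    """Greedy forward selection.

    Starts from the best-scoring pair (rank 1), then repeatedly adds the single
    variable that improves the score most, stopping when nothing improves it.
    Returns the selection rank of each chosen variable and the final score.
    """
    best_pair = max(combinations(variables, 2), key=lambda pair: score(set(pair)))
    selected = list(best_pair)
    ranks = {name: 1 for name in best_pair}
    best_score = score(set(selected))
    rank = 1
    while True:
        remaining = [v for v in variables if v not in selected]
        if not remaining:
            break
        candidate = max(remaining, key=lambda v: score(set(selected) | {v}))
        candidate_score = score(set(selected) | {candidate})
        if candidate_score <= best_score:
            break  # no remaining variable improves the model
        rank += 1
        selected.append(candidate)
        ranks[candidate] = rank
        best_score = candidate_score
    return ranks, best_score

# Toy scoring function (assumption): additive gains standing in for a
# cross-validated accuracy; "noise" lowers the score and should be skipped.
gains = {"canopy_height": 0.30, "ndvi": 0.20, "b08": 0.05, "noise": -0.02}
toy_score = lambda subset: sum(gains[v] for v in subset)

ranks, final_score = forward_feature_selection(list(gains), toy_score)
```

In this toy setup, the canopy height and the vegetation index share rank 1, a single band enters at rank 2, and the uninformative variable is never selected, which is the kind of pattern the rank statistics summarize.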
4.3. Area-wide mapping
According to Holzwarth et al. (Reference Holzwarth, Thonfeld, Abdullahi, Asam, Da Ponte Canova, Gessner, Huth, Kraus, Leutner and Kuenzer2020) and to the best of our knowledge, large-scale mapping of tree species groups specific successional stages has so far only been carried out once in Germany, by Stoffels et al. (Reference Stoffels, Hill, Sachtleber, Mader, Buddenbaum, Stern, Langshausen, Dietz and Ontrup2015). Here, the tree species groups specific successional stages for the entirety of Rhineland-Palatinate were mapped. The hybrid models, which demonstrated superior performance in modeling successional stages compared to the other models (see Section 3), were used for mapping, supported by a preceding tree species groups model. The accuracy derived from the test data was 0.6 for all 16 classes (this excludes the successional stages classes of other deciduous trees) and is therefore comparable with the results of Stoffels et al. (Reference Stoffels, Hill, Sachtleber, Mader, Buddenbaum, Stern, Langshausen, Dietz and Ontrup2015; accuracy 0.55).
While the direct comparability with the results of other studies is limited due to slightly different classes and the spatially independent validation and testing approach, a rough comparison of the magnitudes of their performances is permissible given the similarity. Both models demonstrate similar quality, while the approach used here has an extended scope because it includes the larch tree species group in the modeling. Notably, both approaches also show confusion mainly among adjacent successional stages. One advantage of the approach of this study is the utilization of free Sentinel-2 data in combination with readily available LiDAR data. However, even though LiDAR data are typically collected and available almost everywhere across Europe, their documentation and accessibility vary significantly, often requiring case-based inquiries with the relevant authorities to obtain access. Nevertheless, depending on local regulations, the LiDAR data are often freely available for research and monitoring purposes, enabling monitoring regardless of financial capabilities. Moreover, not only are the availability and accessibility of LiDAR data often unsatisfactorily documented, but the metadata are also often missing or incomplete. The LiDAR data used in this study were provided with very little detailed metadata (see Table A3 in the appendix), and information on pulse density, wavelength, or footprint was lacking. Despite our efforts to obtain more details by contacting the data providers and the federal state office, metadata were not available to, or known by, the authorities. Therefore, we advocate for more standardized documentation practices and improved metadata transparency when LiDAR data are collected and distributed. This would further enhance their broader usability and facilitate greater comparability across studies. The hierarchical modeling approach additionally offers the flexibility to improve the quality depending on the research question.
Especially when the interest lies in specific tree species or tree species groups, this approach makes it possible to develop or use specialized tree species groups models and add successional stages models. This study also demonstrates that the model accuracy of certain tree species groups (beech and spruce) does not significantly benefit from the additional LiDAR data. In cases where studies specifically focus on one of these tree species groups for the modeling of successional stages, adding LiDAR data may provide no further benefit.
Not only the careful quantitative testing but also the visual analysis of the map yielded convincing results. Figure 7 shows general spatial patterns of tree species that align with the actual forest patterns in Rhineland-Palatinate (see Figure 3; PEFC—Arbeitsgruppe Rheinland-Pfalz, 2010). If the heterogeneity of the LiDAR data had posed a problem for the models, this would be expected to show up as rectangular areas mirroring the LiDAR flight scenes across the map. No artifacts are visible on the map, and even the boundaries between the LiDAR aerial surveys that are furthest apart in time, as shown in Figure 8, do not exhibit any distorting patterns. At the intersection of the 2014, 2017 and 2020 flight campaign boundaries in Figure 8, there is an area classified as beech in the qualification stage that spreads across the borders of all three LiDAR scenes without showing any artificial linear structures that could originate from these abrupt transitions. Nevertheless, we acknowledge that differences in acquisition dates have an impact on the data, due to changes in forest structure over time, such as the transition of trees between successional stages. However, the absence of visible artifacts and distorting patterns at these boundaries suggests that any temporal changes did not greatly affect the model outputs in our study.
This study focused on forests, where changes occur rather slowly compared to other vegetation types. It was demonstrated that even heterogeneous LiDAR data can be valuable for mapping tree species groups specific forest successional stages. However, there might be limits with faster-growing vegetation that should be explored further.
5. Conclusion
In the ongoing biodiversity crisis, the monitoring of forests is of high importance. Traditional field-based inventories are not able to provide comprehensive, area-wide information over large areas due to their cost- and labor-intensive nature. Remote sensing is a promising solution for developing efficient area-wide monitoring strategies. However, capturing structural data, such as LiDAR information for modeling of successional stages, necessitates expensive flight campaigns to acquire current high-resolution data. Such resources are often unavailable for regional monitoring purposes or minor nature conservation projects. In Germany, federal states commonly conduct smaller LiDAR flight campaigns, covering the same region approximately every 5 to 10 years. This results in heterogeneous data sets that are often viewed with skepticism regarding their utility, leading to their exclusion from modeling efforts. The present study reveals that these highly heterogeneous LiDAR data improved the modeling of tree species groups specific successional stages considerably. Therefore, it can be concluded that the potential of LiDAR data should not be underestimated and that at least a thorough analysis of their potential benefit for ecological studies should be conducted. The effort of adapting preprocessing and modeling can lead to improved results that can be valuable to nature conservation approaches. It is expected that more recent and higher-quality LiDAR data would improve model results further; however, such data are rarely available in practice for large-scale studies. Improving the available LiDAR data was not within the scope of this study, but current data sources like GEDI (Global Ecosystem Dynamics Investigation) could potentially optimize the use of heterogeneous LiDAR data in the future.
This study found that even heterogeneous LiDAR data were helpful for modeling tree species groups specific successional stages and should not be neglected. While public authorities collect LiDAR data almost everywhere in Germany and in other European countries, the direct availability and documentation are highly heterogeneous, incomplete, and disorganized. Therefore, we advocate for relevant authorities to make the data more accessible or at least visible in a structured manner and to provide comprehensive metadata whenever data are made publicly accessible. As this study showed, this could provide a valuable contribution to ecosystem research and, subsequently, to the preservation of forest ecosystem services.
Acknowledgments
We would like to acknowledge the assistance of the AI language model ChatGPT, based on GPT-3.5, developed by OpenAI, which assisted in enhancing English writing and grammar checking. ChatGPT can be accessed via https://chat.openai.com/.
Author contribution
Conceptualization: L.B.(Equal); A.Z.(Equal); J.G.(Equal). Supervision: N.F.(Lead). Funding acquisition: J.G.(Lead). Project administration: J.G.(Lead). Resources: J.G.(Lead). Data curation: L.B.(Lead). Formal analysis: L.B.(Equal); A.Z.(Equal). Investigation: L.B. (Equal); A.Z.(Equal). Methodology: L.B. (Equal); A.Z.(Equal); J.G.(Equal); M.L.(Equal); H.M.(Equal). Software: L.B.(Lead); M.L.(Supporting); H.M.(Supporting); S.W.(Supporting). Validation: L.B.(Lead); A.Z.(Supporting). Visualization: L.B.(Lead); A.Z.(Supporting). Writing original draft: L.B.(Equal); A.Z.(Equal). Writing—review & editing: L.B.(Equal); A.Z.(Equal); J.G.(Equal); T.K.(Equal); M.L.(Equal); H.M.(Equal); D.Z.(Equal); N.F.(Equal). All authors approved the final submitted draft.
Competing interest
The authors have no conflict of interest to declare.
Data availability statement
R scripts used for this study are available under a GPL 3.0 license as Git repository at github.com. A release of the Git repository to reproduce the results of the study is available at https://github.com/envima/LidarForestModeling accessed on June 25, 2024. To reproduce the study, information on the data and sample datasets is available at the Open Science Framework (OSF): https://doi.org/10.17605/OSF.IO/CEK5J. The Sentinel-2 and LiDAR datasets used in this study are freely available; however, due to their large size (totaling over 2 TB), only the workflow to reproduce these datasets and small sample datasets are provided on OSF. The forest inventory data were made available to us by “Landesforsten Rheinland-Pfalz” specifically for this study and are not publicly available. Consequently, a dummy dataset, structured identically to the original dataset, is provided in the repository. Additionally, this study utilized the border of Rhineland-Palatinate to crop the data to the study area and a forest mask processed from Copernicus high-resolution forest type data. Both datasets are freely available for download and are provided on OSF.
Ethics statement
The research meets all ethical guidelines, including adherence to the legal requirements of the study country.
Funding statement
These tree species groups and successional stages models are part of the project “Development of forest structure-based habitat models for forest bats” funded by the Rhineland-Palatinate State Office for the Environment (Landesamt für Umwelt Rheinland-Pfalz). The LiDAR data and forestry data were also provided as part of this project. The development of improved methods for remote sensing based classification of forests was conducted within the Natur 4.0 project funded by the Hessian state offensive for the development of scientific-economic excellence (LOEWE).
A. Appendix
Comments
November 22, 2023
Claire Monteleoni
Editor-in-Chief
Environmental Data Science
Dear Editor,
We are excited to submit our application paper, “Leveraging readily available heterogeneous LiDAR data to enhance modeling of successional stages at tree species level in temperate forests”, for consideration for publication in Environmental Data Science.
The paper presents our findings on the utilization of heterogeneous LiDAR data to model and map tree species specific forest successional stages on a large regional scale. Our research addresses an underutilized area that involves the wealth of available but often overlooked heterogeneous data, most often collected by government administrations, which, despite its potential, remains largely untapped in the realm of ecological modeling. By exploring the integration of heterogeneous LiDAR data alongside multi-spectral optical satellite data, we reveal significant improvements in performance accuracy on spatially independent test data. We believe our work will be of interest to the readers of Environmental Data Science due to the beneficial utilization of a heterogeneous dataset, which is of interest for interdisciplinary researchers using LiDAR data in ecological modeling and nature conservation. The study’s insights into harnessing readily available but heterogeneous datasets for ecological modeling could inspire a broader understanding of employing diverse data for conservation strategies in the community.
We confirm that this manuscript is original and has not been published elsewhere in part or in entirety and is not under consideration by another journal. We do not have any conflicts of interest to declare.
Sincerely,
and on behalf of all authors,
Lisa Bald
Department of Geography, Environmental Informatics, Philipps-University Marburg, Deutschhausstraße 12, 35032 Marburg, Germany
Phone: +49 6421 28-25323