Introduction
Identification of weed species and their growth stages is critical for devising effective weed management strategies (Rydahl 2003; Teimouri et al. 2018). Moreover, managing weeds during their early growth stages is essential for efficient weed control and sustainable agricultural productivity (Hussain et al. 2021). Early-stage weeds are more easily removed through physical and chemical means, reducing herbicide usage (Espejo-Garcia et al. 2020), costs, and time and labor requirements. However, identifying weeds at early growth stages in crop fields is challenging due to their small size and differences in shape compared with mature plants.
To enhance weed detection, the application of image recognition employing convolutional neural networks (CNNs) is on the rise (Coleman et al. 2022; Hasan et al. 2021; Rai et al. 2023). Automating the process of finding weeds (Lottes et al. 2018; Sujaritha et al. 2017) and mapping their distributions (Huang et al. 2018; Partel et al. 2019) is expected to facilitate site-specific weed management (SSWM; Barnhart et al. 2022; Wang et al. 2019). Despite the accumulation of case studies for practical applications, growth stages have been identified as complicating factors in weed recognition (Coleman et al. 2022), and their effects on the performance of CNN algorithms remain poorly understood (Coleman et al. 2022; Hasan et al. 2021; Wang et al. 2019). Particularly at the early growth stage, recognition accuracy may be unstable or reduced because the reflectance characteristics of crops and weeds are generally similar (López-Granados 2011; Wang et al. 2019) and the shapes of dicotyledons tend to change markedly during seedling development. Given that cotyledons and true leaves often differ in shape in many species, even humans may struggle to recognize them as the same species without proper knowledge. Teimouri et al. (2018) demonstrated that leaf numbers may be useful for estimating early growth stages with a classification algorithm; however, accuracy tended to vary among stages and species. To apply image recognition to weed management, it is essential to understand how changes in plant shape during the early growth stage influence accuracy.
In this study, we address how changes in plant shape during the early growth stage should be incorporated into image recognition training. To address this question, we focused on four weed species with different cotyledon and true leaf shapes: giant ragweed (Ambrosia trifida L.), red morningglory (Ipomoea coccinea L.), pitted morningglory (Ipomoea lacunosa L.), and burcucumber (Sicyos angulatus L.). Each of these species is a major noxious annual in crop fields globally, causing widespread yield loss (Grey and Raymler 2002; Kurokawa et al. 2015; Lee and Son 2022; Norsworthy and Oliveira 2007; Regnier et al. 2016; Savić et al. 2021; Smeda and Weller 2001). In Japan, these species pose a threat to soybeans [Glycine max (L.) Merr.] and/or feed grains as invasive alien species (Kurokawa 2017). Notably, S. angulatus is designated as a “species to be managed urgently” in “The List of Alien Species That May Have Adverse Effects on Ecosystems in Japan” (Ministry of the Environment, Ministry of Agriculture, Forestry and Fisheries 2015), and its cultivation is banned without permission. By comparing models that use different class patterns in training datasets, we illustrate how to maintain the robustness of recognition accuracy amid temporal morphological changes during the early growth stage.
Materials and Methods
Target Species and Image Acquisition for Training
Leaf shapes of the target species (A. trifida, I. coccinea, I. lacunosa, and S. angulatus) exhibit both similarities and differences (Figure 1). Cotyledons in A. trifida and S. angulatus are round, whereas those in I. coccinea and I. lacunosa are V-shaped. True leaves of I. coccinea, I. lacunosa, and S. angulatus are roughly heart-shaped and alternate, whereas those of A. trifida are deltate or palmately 2- to 5-lobed and opposite.
Training images were captured in situ in Japan (35.8°N to 37.7°N, 137.9°E to 140.5°E) from April to October in 2019, 2021, and 2022, using digital cameras and smartphones from 10 manufacturers. Images were taken from 5 to 100 cm directly above plants, irrespective of weather and light conditions. These images were categorized into three growth stages (Figure 1): Stage 1, target weeds with only cotyledons; Stage 2, cotyledons and one or two true leaves; and Stage 3, only true leaves. Although some images included non-target plants and/or multiple target plants, each image could be categorized into one stage of one target species. For each stage of each species, 350 images were acquired (350 × 3 stages × 4 species = 4,200 images in total). These images were randomly divided into training, validation, and test sets in an 8:1:1 ratio (280, 35, and 35 images/stage/species, respectively). The test data were used for Evaluation 1 (Figure 2). For the detection models, the image area containing the target species was annotated using the open-source tool LabelImg (Tzutalin 2015).
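For concreteness, the following minimal Python sketch reproduces the 8:1:1 split described above; the directory layout (raw/<species>/<stage>/*.jpg, 350 images per folder) and the output structure are assumptions for illustration, not the authors' exact workflow.

```python
# Minimal sketch of the 8:1:1 train/validation/test split described above,
# assuming a hypothetical layout of raw/<species>/<stage>/*.jpg (350 images each).
import random
import shutil
from pathlib import Path

random.seed(0)  # fix the shuffle so the split is reproducible

RAW = Path("raw")       # assumed input root
OUT = Path("dataset")   # assumed output root
SPLITS = {"train": 280, "val": 35, "test": 35}  # 8:1:1 of 350 images

for stage_dir in sorted(RAW.glob("*/*")):       # e.g., raw/A_trifida/stage1
    images = sorted(stage_dir.glob("*.jpg"))
    random.shuffle(images)
    start = 0
    for split, n in SPLITS.items():
        dest = OUT / split / stage_dir.parent.name / stage_dir.name
        dest.mkdir(parents=True, exist_ok=True)
        for img in images[start:start + n]:
            shutil.copy(img, dest / img.name)   # keep the originals untouched
        start += n
```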
Training Datasets
To assess how the treatment of growth stages in training affects weed recognition success, we prepared four dataset patterns with different class definitions (Figure 1): (A) one class/species, treating Stages 1, 2, and 3 as a single class; (B) two classes/species, treating Stages 2 and 3 as one class; (C) two classes/species, treating Stages 1 and 2 as one class; and (D) three classes/species, treating each stage as a separate class.
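To make the four patterns explicit, the following Python sketch maps a (species, growth stage) pair to its class label under each pattern; the label strings are hypothetical, and only the grouping logic reflects the definitions above.

```python
# Illustrative mapping of (species, stage) to class labels under patterns A-D.
def class_label(pattern: str, species: str, stage: int) -> str:
    """Return the class that a (species, stage) pair is assigned to."""
    if pattern == "A":     # one class/species: Stages 1-3 merged
        return species
    if pattern == "B":     # Stage 1 vs. merged Stages 2-3
        return f"{species}_stage1" if stage == 1 else f"{species}_stage23"
    if pattern == "C":     # merged Stages 1-2 vs. Stage 3
        return f"{species}_stage12" if stage <= 2 else f"{species}_stage3"
    if pattern == "D":     # three classes/species, one per stage
        return f"{species}_stage{stage}"
    raise ValueError(f"unknown pattern: {pattern}")

# Example: class_label("B", "A_trifida", 3) -> "A_trifida_stage23"
```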
To ensure the robustness of the effects among patterns A, B, C, and D, we prepared 16 models using two image recognition algorithms and two conditions for the number of species treated (four or two species) (Figure 2). In the two-species models, we selected A. trifida and I. coccinea, which exhibit dissimilar leaf shapes throughout all stages.
Model Training
Two prominent open-source algorithms, namely the object detection algorithm You Only Look Once (YOLO) v5 (Jocher 2020) and the classification algorithm Visual Geometry Group (VGG) 19 (PyTorch tutorials: https://pytorch.org/tutorials, accessed: January 23, 2024), were applied to the eight training datasets (4 class patterns × 2 species number conditions) within the Python 3.7.16 environment using PyTorch 1.13.1. Both YOLO and VGG are frequently used in weed recognition studies (Hasan et al. 2021; Rai et al. 2023). For YOLOv5, a YOLOv5s architecture was trained with pretrained weights (yolov5s.pt) from the COCO network (Lin et al. 2014), employing default hyperparameter settings (Jocher 2020). Each model underwent training for up to 200 epochs with 32 images per batch. For VGG19, models were trained for up to 100 epochs with a batch size of 30, using an optimizer with a momentum of 0.9 and a learning rate of 0.001. Proper learning convergence was confirmed for both YOLOv5 and VGG19, and the best weights from each training session were used for subsequent analysis. To evaluate model accuracy for each species, average precision (AP) at an intersection over union threshold of 0.5 was calculated for YOLOv5 and classification accuracy for VGG19, using the abovementioned test dataset and the best weights (Evaluation 1). All training and evaluation were conducted on the NARO AI research supercomputer Shiho, equipped with NVIDIA Tesla V100 SXM2 GPU 32GB (NVIDIA, CA, USA).
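A hedged sketch of these two training setups is shown below. The YOLOv5 run is expressed as the repository's standard train.py command (shown as a comment); for VGG19, SGD is assumed as the optimizer (the PyTorch transfer-learning tutorial's default, since only momentum and learning rate are specified above), and the data paths, class count, and data loader are placeholders rather than the authors' exact configuration.

```python
# Sketch of the training setups, not the authors' exact scripts.
#
# YOLOv5s (detection), using the YOLOv5 repository's train.py entry point:
#   python train.py --weights yolov5s.pt --data weeds.yaml --epochs 200 --batch-size 32
#
# VGG19 (classification), fine-tuned as in the PyTorch transfer-learning tutorial:
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 4  # e.g., pattern A with four species; 8 or 12 classes for B/C/D

model = models.vgg19(pretrained=True)               # ImageNet-pretrained weights
model.classifier[6] = nn.Linear(4096, NUM_CLASSES)  # replace the final layer

# SGD is an assumption; the text specifies only momentum 0.9 and learning rate 0.001.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_one_epoch(loader, device="cuda"):
    """One epoch over a DataLoader yielding (image batch, label batch)."""
    model.to(device).train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```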
Collection and Evaluation of Time-Series Images
To evaluate the accuracy of each model in capturing temporal morphological changes during the early growth stage (Evaluation 2; Figure 2), we collected time-series images. We sowed and cultivated 16 individuals of three target weeds (A. trifida, I. coccinea, and I. lacunosa) at the experimental garden at NARO (Tsukuba, 36.03°N, 140.10°E) from May to June 2023; S. angulatus was excluded because its cultivation is not permitted in Japan. Images were captured from 30 to 50 cm directly above the plants 2 to 5 d wk−1. To minimize differences in sunlight conditions, we used a sunshade during image capture. We recorded the number of true leaves for each plant, with zero true leaves corresponding to Stage 1 and one or two true leaves corresponding to Stage 2. To provide replications for each date and plant and to ensure independence from the training data, four smartphones not used for training data collection were employed (Apple iPhone SE [Apple, CA, USA], SHARP A103SH [SHARP, Osaka, Japan], FCNT F-41B [FCNT, Kanagawa, Japan], and Samsung SC-56B [Samsung, Gyeonggi-do, Korea]). Because images taken with the autofocus of the three smartphones other than the Apple iPhone SE sometimes exhibited blown-out highlights depending on weather conditions, we set their exposure values to the minimum. In total, 3,652 images (14 to 27, 8 to 25, and 14 to 22 per plant per camera for A. trifida, I. coccinea, and I. lacunosa, respectively) were collected.
All images were subjected to inference using the YOLOv5 and VGG19 models (Figure 2). Four-species models were applied to all images, and two-species models were applied to images of A. trifida and I. coccinea. To evaluate changes in accuracy along with growth, we assessed the recognition success of each image rather than using comprehensive indices such as mean AP (mAP) and accuracy. In YOLOv5-based detection, inference results with a confidence threshold of 0.5 were defined as “TRUE” (only the correct species detected), “FALSE” (results including incorrect species and/or locations), or “NONDETECT.” For VGG19-based classification, inference results were defined as TRUE or FALSE, because each image was classified into exactly one species. When a target’s shape was intermediate, that is, between Stages 1 and 2 or between Stages 2 and 3, both adjacent stages were considered correct. Because VGG19 is a classification algorithm and does not indicate which regions of an image the model focused on, we created heat maps illustrating visual explanations of the VGG19 classifications using gradient-weighted class activation mapping (GradCAM; Gildenblat 2021; Selvaraju et al. 2020).
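The per-image scoring rule for the YOLOv5 detections can be summarized with the following sketch; it assumes that predictions are already reduced to (class name, confidence) pairs and checks correctness only at the class level, omitting the bounding-box location check described above.

```python
# Simplified per-image scoring of YOLOv5 detections into TRUE / FALSE / NONDETECT.
CONF_THRESHOLD = 0.5

def score_detections(predictions, acceptable_classes):
    """predictions: list of (class_name, confidence) pairs for one image.
    acceptable_classes: set of class names counted as correct; for plants at an
    intermediate stage, both adjacent stage classes can be included."""
    kept = [cls for cls, conf in predictions if conf >= CONF_THRESHOLD]
    if not kept:
        return "NONDETECT"
    if all(cls in acceptable_classes for cls in kept):
        return "TRUE"    # only correct classes detected
    return "FALSE"       # at least one incorrect class (location check omitted)
```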
We assessed whether the number of images assigned to TRUE was affected by patterns A, B, C, and D using a generalized linear mixed model with a binomial distribution, employing the glmer function from the lme4 package in R v. 4.3.0 (R Core Team 2023). With four replications for each date and plant, the maximum number of TRUE images per date and plant was four. The number of TRUE images out of the four replications was the response variable, the class pattern (A, B, C, or D) was the explanatory variable, and the plant individual was included as a random effect. Separate analyses were conducted for each species (A. trifida, I. coccinea, and I. lacunosa).
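A minimal R sketch of this model specification is given below, assuming a hypothetical per-date, per-plant data frame with column names n_true, pattern, and plant_id.

```r
# Hedged sketch of the binomial GLMM fit with lme4::glmer; column names are
# hypothetical. The response is the number of TRUE images out of the four
# replicate cameras for each date and plant.
library(lme4)

fit <- glmer(cbind(n_true, 4 - n_true) ~ pattern + (1 | plant_id),
             family = binomial, data = dat)  # fitted separately for each species
summary(fit)
```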
Results and Discussion
Comparison of Model Accuracy
Regarding the results of Evaluation 1, the mAP and mean accuracy of pattern A, which treats all stages as a single class, surpassed those of the other patterns in most species under both YOLOv5 and VGG19 (Figure 3). Although the AP and accuracy of some classes in B, C, and D were marginally higher than those in A, certain instances in C and D (e.g., I. lacunosa in the four-species models under YOLOv5 and VGG19) exhibited a decrease of more than 0.1. When the four-species and two-species models were compared, the AP values of A. trifida remained nearly constant under YOLOv5 across patterns A to D. Conversely, the accuracy of the two-species models under VGG19 was slightly higher than that of the four-species models. These findings suggest that, because a smaller number of species in a model contributes to higher accuracy in classification, as with VGG19, it is worthwhile to narrow down the number of target species where appropriate.
In the four-species models, the AP and accuracy for I. lacunosa tended to be lower than those for the other species. Images of I. lacunosa were occasionally misrecognized as I. coccinea under both YOLOv5 and VGG19. Moreover, images categorized as Stage 3 were sometimes incorrectly detected as nontargets under YOLOv5. During Stage 3, I. coccinea typically exhibits more leaf-angle variation than the other three species; therefore, capturing its features may be challenging. However, the AP and accuracy of models with patterns A and B, which integrate Stage 3 of I. coccinea with Stage 1 and/or 2, surpassed those with patterns C and D. This implies that growth stages influence recognition success, although integrating different stages could maintain higher accuracy. Although B and C have the same number of classes, accuracy in B was higher than in C, suggesting that which growth stages are combined into a class is important for improving accuracy.
Overall, these Evaluation 1 results suggest that integrating growth stage classes for each species, rather than separating classes by growth stage, could help maintain higher accuracy in both detection and classification. Because not only the number of classes but also the particular combination of growth stages merged into a class may influence accuracy, optimization requires comparing accuracy among such combinations.
Image Recognition Robustness against Temporal Change
In the Evaluation 2 results for images depicting temporal morphological changes in A. trifida, I. coccinea, and I. lacunosa, variations in recognition success emerged among species, class patterns, algorithms, and growth stages (Figures 4 and 5). Under YOLOv5, the detection success rate (TRUE detection rate) for A. trifida and I. coccinea in pattern A tended to surpass that in patterns B, C, and D (Figure 4; Supplementary Table S1). In particular, in the four-species models with patterns C and D, detection failure for I. coccinea increased when the number of true leaves exceeded two (four or more leaves): true leaves of I. coccinea were occasionally misidentified as those of I. lacunosa or S. angulatus. These models may have been unable to distinguish the shapes and arrangements of true leaves among the three species because of their similarity (Figure 1). A comparison of the locations of the output bounding boxes between detection successes in patterns A and B and misrecognitions as S. angulatus in C and D revealed no clear differences (Supplementary Figure S1). This suggests that the detectors focused on similar parts of the image as the features of each species. Stage 3 (only true leaves) was treated as a single class in both patterns C and D. Treating the similar shapes of different species as distinct classes and training them within the same model may increase the risk of misrecognition. One way to avoid such issues is to limit the number of target species, as observed in the two-species models. For example, in the two-species models excluding I. lacunosa and S. angulatus, misrecognition in patterns C and D did not occur, leading to increased detection success for true leaves of I. coccinea (Figure 4). However, misrecognition may still occur in two-species models when nontarget species resembling the target species are present. Our results suggest the potential of another approach: merging different growth stages with different shapes into a single class, as in patterns A and B, can effectively prevent misrecognition (Figure 4). Although cotyledons and true leaves have distinct shapes in this study’s target species, their integration is expected to contribute to maintaining stable recognition success at the species level during the early growth period.
Rates of NONDETECT tended to be higher in both the four- and two-species models when leaf numbers were 2.5 to 4 (Figure 4). A leaf number of 2.5 indicates that the true leaves were half-grown, whereas 3 or 4 corresponds to Stage 2 (cotyledons and one or two true leaves). These timings coincided with the change in leaf shape from cotyledons to true leaves, which may have increased training difficulty, even though the number of training images categorized into Stage 2 was equal to that in Stages 1 and 3.
The recognition success results under VGG19 did not exhibit a common pattern among the target species (Figure 5; Supplementary Table S1). Rates of TRUE classification in the two-species models were higher than those in the four-species models. In the four-species models, A. trifida was prone to misclassification as I. coccinea or I. lacunosa, and I. coccinea was often misclassified as I. lacunosa and vice versa. GradCAM-generated heat maps for successful classifications under VGG19 tended to focus on the whole plant shape or its immediate surroundings (Figure 6). However, the heat maps of some images did not focus on the target plants despite successful classification; thus, overfitting may have occurred. Common patterns and biases in such heat maps were not discernible.
The Evaluation 2 results indicate that recognition accuracy was unstable during the early growth stage, when leaf shapes changed over time. However, they also suggest that integrating classes per species has the potential to increase accuracy, as observed for I. coccinea in the four-species models under YOLOv5. To distinguish among weed species with similar shapes and facilitate practical SSWM, merging different growth stages with different shapes into a single class, as demonstrated in patterns A and B, is effective.
When developing identifiers for specific plant species, determining how to handle temporal changes in plant shape during training is a primary challenge. The present study reveals that integrating different shapes within a plant species into a single class is effective for maintaining robust recognition success during the early growth stage. This finding is expected to contribute not only to the early detection of weed seedlings but also to the robustness of general plant species identification. Our study also highlights the difficulty of identifying multiple species and their growth stages simultaneously. As both pieces of information are essential for optimizing weed management and reducing herbicide use, solving this problem will enhance the application of image recognition technology to weed management. Although this issue is challenging, further technological improvements and the accumulation of training images are anticipated to address it in the future.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/wsc.2024.63
Acknowledgments
The authors thank T. Takayama for his support in setting up model environments; Y. Akamatsu, M. Asai, H. Asami, M. Fukuda, N. Ihara, S. Jingu, T. Kanno, Y. Kowata, H. Ohdan, N. Ueda, K. Sasaki, O. Watanabe, and N. Yoshino for their help in data collection; K. Ikeda for her help in annotation; and A. Kozaki for her support in cultivation. This research used the SHIHO supercomputer at NARO to train deep neural network models.
Funding statement
This study was supported by Research Project for Technologies to Strengthen the International Competitiveness of Japan’s Agriculture and Food Industry.
Competing interests
The authors declare that there are no conflicts of interest.