Introduction
The soil weed seedbank is a repository of viable weed seeds within the soil profile that usually encompasses a diversity of weed species at various depths and densities (Teo-Sherrel et al. 1996). The weed seedbank is composed of recently shed seeds, as well as older seeds dating back several years (Buhler et al. 1997; Mahé et al. 2020). Common practices such as tilling can effectively shuffle the soil seedbank, depositing recently added seeds deeper into the soil profile while lifting to the surface older, dormant seeds that may have been there for many years (Feledyn-Szewcyzk et al. 2020). The density of viable seeds in the soil can vary greatly; it is difficult and time-consuming to quantify and remains a significant challenge for both researchers and growers (Buhler et al. 1997; Mahé et al. 2020). Understanding the composition of the weed seedbank could be a critical component in management decisions if such information could be collected in a timely and accurate manner (Ambrosio et al. 2004; Creech et al. 2008; Luo et al. 2021; Mahé et al. 2020).
Identification of weed seeds at the species level presents challenges due to the small size and often inconspicuous appearance of seeds. Some species may require germination of the seeds or the use of molecular markers for accurate identification (Hussain et al. 2017). The challenge is further complicated by the large volumes of diverse seeds that may be present in the soil seedbank; as a result, a single sample often consumes considerable processing time and requires specialized experience (Karlík and Poschlod 2014; Morgensen et al. 2005).
Developing an accurate and efficient identification method is essential for overcoming the challenges associated with weed seed identification and seedbank quantification. Automated solutions that can handle diverse weed species with complex morphology and structure across different regions and environmental conditions would substantially increase throughput, facilitating the processing of a far greater number of samples. The incorporation of advanced technologies such as artificial intelligence for object detection in agriculture has opened avenues for addressing these challenges more efficiently (Eli-Chukwu 2019; Xu et al. 2021). Object detection models offer a valuable alternative to traditional methods such as germination assays or manual seed counting for the task of quantifying seedbanks.
Object detection with machine learning is increasingly used in crop research due to its ability to learn distinct features for precise identification and localization of objects in an image or video frame (Wosner et al. 2021; Wu et al. 2019; Zhang et al. 2022). Such models have been successfully deployed to detect and identify diseases, pests, and weeds and for crop monitoring and growth analysis (Alfonso et al. 2020; Khalid et al. 2023; Pérez-Porras et al. 2023; Verma et al. 2021; Yao et al. 2024). Object detection models use convolutional neural networks (CNNs), a class of deep-learning algorithms, to detect various objects in images. CNN architecture, inspired by the human brain, employs convolutional layers to learn hierarchical features of input images (Alzubaidi et al. 2021). The series of layers in a CNN uses kernels, or filters, that scan the input data to detect features of an image (Alzubaidi et al. 2021). Image classification models such as GoogLeNet and AlexNet have been deployed to identify seeds with high accuracy; however, these methods classified either a single seed or a single species at a time (Gulzar et al. 2020; Luo et al. 2021).
You Only Look Once (YOLO), a popular object detection model, incorporates CNNs to predict bounding boxes and class probabilities in a single forward pass, facilitating rapid and efficient detection with high accuracy (Jocher et al. 2023). Several versions of YOLO have been released, with each version addressing limitations of its predecessor (Jiang et al. 2022). Object detection models can be evaluated for prediction accuracy using mean average precision (mAP), which combines precision and recall by averaging the average precision (AP) across object classes (Zhao et al. 2019). The most recent version, YOLOv8, demonstrates an improved mAP of 53.9% on the Microsoft Common Objects in Context dataset (Lin et al. 2014), a widely used benchmark dataset in computer vision research, surpassing YOLOv5, which achieved 50.7% (Lee and You 2024). Recently, YOLO models have been highly successful in identifying weed plants within cornfields (Hasan et al. 2024).
In this study, we employ a YOLOv8 model to streamline and accelerate the identification process of weed seeds. The objective of this study is to assess the efficacy of a deep-learning approach in accurately detecting weed seeds, thereby establishing a robust identification workflow. As proof of concept, we conducted comprehensive training and testing of the model using seed data from 19 distinct weed species, ensuring thorough validation of its performance across diverse samples. Through our research, we aim to develop a solution that allows for efficient assessments of large weed seedbank samples, significantly reducing the time required compared with conventional methods.
Materials and Methods
Dataset Preparation
The red-green-blue (RGB) images for the weed seedbank dataset were obtained using an IVESTA3 digital microscope (Leica Microsystems, 10 Parkway N, Suite 300, Deerfield, IL 60015) with an external light source. The microscope was stationed on a stable platform with a constant working distance for all seed types. Image acquisition was optimized by using a 0.61× objective lens for precise focus. A total of 485 images were captured at a resolution of 4,000 by 3,000 pixels with a pixel size of 1.55 μm by 1.55 μm using a high-resolution camera integrated into the microscope body. The image dataset included 19 weed species that are common in eastern Washington, elutriated from soil samples from field experiments conducted in the region (Table 1). All the seeds analyzed in these experiments were obtained and cleaned from elutriated samples and a weed seed collection; thus, no seeds were added to the soil. By imaging real samples collected in-field, this project demonstrates the direct utility of the methods presented here for finding and identifying weed seeds in realistic datasets. The dataset included weed seeds of different sizes, shapes, and colors, accounting for the wide spectrum of weed species encountered in eastern Washington dryland cropping systems. Images in the dataset were separated into two categories: the first consisted of images containing seeds of a single weed species, to carry out focused training for individual species, while the second featured multiple weed species within a single frame, to train the model to identify and differentiate various weed species together.
Data Labeling
The images in this project were labeled manually using labelImg (Tzuta 2015) to create ground-truth data. This approach facilitated identification of weed seeds in images by drawing bounding boxes around the individual seeds in the image. The annotations were saved in a text format in which each line represents the annotation for a single bounding box. The format includes a class ID, the numeric identifier for the specific weed species, and bounding box coordinates (x_center, y_center, width, height) giving the position and dimensions of the box relative to the image dimensions (Table 2; Figure 1). The annotated dataset generated using labelImg was crucial in teaching the model to effectively identify and classify different weed species.
Table footnotes: a henbit (Lamium spp.); b pigweed (Amaranthus spp.).
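To illustrate the annotation format, the following minimal Python sketch parses one labelImg output file into pixel-space boxes; the file name is a hypothetical placeholder, and the image dimensions correspond to the 4,000 by 3,000 pixel capture resolution reported above.

```python
# Minimal sketch (not the study's code): parse one labelImg/YOLO
# label file into pixel-space bounding boxes. The file name
# "seed_image_001.txt" is a hypothetical placeholder.
from pathlib import Path

def read_yolo_labels(label_path, img_w=4000, img_h=3000):
    """Return (class_id, x_center, y_center, width, height) in pixels."""
    boxes = []
    for line in Path(label_path).read_text().splitlines():
        cls, xc, yc, w, h = line.split()
        # YOLO stores coordinates normalized to [0, 1] relative to
        # the image dimensions; rescale them to pixel units.
        boxes.append((int(cls),
                      float(xc) * img_w, float(yc) * img_h,
                      float(w) * img_w, float(h) * img_h))
    return boxes

print(read_yolo_labels("seed_image_001.txt"))
```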
Model Training and Validation
The annotated dataset generated through the labeling process in the previous step was utilized to train the YOLOv8 model. This dataset, consisting of images of various weed instances and their corresponding bounding box annotations, served as the training ground truth. To facilitate training and cross-validation, the annotated dataset was split into two subsets, with 80% of the data allocated to the training set and the remaining 20% to the validation set, forming a 20% hold-out cross-validation scheme. The hold-out validation set is set aside and not used for training, so the model can be assessed on images it has never seen, ensuring that the reported accuracy is not due to overfitting. The training process was set to 100 epochs with an image size of 640 pixels (width and height). YOLOv8 utilizes an image size of 640 by 640 pixels to optimize computational efficiency and speed during object detection. The fixed input size enables faster processing and inference times compared with larger image sizes. However, the reduction in resolution may affect the model’s ability to accurately detect and localize objects, particularly smaller ones or those requiring finer details. During the training process, YOLOv8 leveraged a transfer learning strategy by initializing its weights with pretrained values from Common Objects in Context (Lin et al. 2014), which contains 200,000 images for object detection. Transfer learning, also known as weight transfer, is a technique commonly used in deep learning in which knowledge gained from training on one task or dataset is transferred to another related task or dataset. Abstract data features learned by CNNs for classification problems often can be applied to other classification problems, but learning these abstract features can initially require hundreds of thousands of training images. Thus, using pretrained networks can yield CNNs with high accuracy without requiring hundreds of thousands of training images. Transfer learning will be essential for weed scientists deploying deep-learning models, because weed scientists will seldom have hundreds of thousands of images for training; by starting from previous model weights, classification models can be developed using much smaller image collections. The training process used here involved further fine-tuning of these weights using the weed seed dataset. The model was trained to minimize a focal loss function for the bounding boxes, and stochastic gradient descent was used to optimize the classification loss, thus refining the model’s ability to differentiate between weed species.
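For reference, a training run with the settings described above can be expressed in a few lines with the ultralytics Python package; the dataset configuration file (weeds.yaml) and the choice of the yolov8n.pt checkpoint are illustrative assumptions, as the text does not specify the model size used.

```python
# Hedged sketch of the training workflow described above, using the
# ultralytics package. "weeds.yaml" (dataset paths and the 19 class
# names) and the "yolov8n.pt" variant are illustrative assumptions.
from ultralytics import YOLO

# Loading a .pt checkpoint initializes the network with weights
# pretrained on the COCO dataset (the transfer learning step).
model = YOLO("yolov8n.pt")

# Fine-tune on the weed seed dataset: 100 epochs, 640-pixel inputs,
# matching the settings reported in the text. The 80/20 train/
# validation split is declared inside weeds.yaml.
model.train(data="weeds.yaml", epochs=100, imgsz=640)

# Evaluate on the 20% hold-out validation set.
metrics = model.val()
print(metrics.box.map50, metrics.box.map)  # mAP50 and mAP50-95
```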
Performance Metrics
The model performance was assessed by using precision, recall, F1 score, and mAP. Precision is calculated as the ratio of true positive predictions to the total number of positive predictions (Equation 1):

$$\text{Precision} = \frac{TP}{TP + FP}$$

where TP is the number of true positives and FP is the number of false positives.
Recall represents the ability of the model to capture all positive instances. It is calculated as the ratio of true positive predictions to the total number of actual positives (Equation 2):

$$\text{Recall} = \frac{TP}{TP + FN}$$

where FN is the number of false negatives.
F1 score provides a measure of model performance on both precision and recall by calculating the harmonic mean of the two (Equation 3):

$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
mAP compares the detected boxes with the ground-truth bounding boxes to return a score; a higher mAP value denotes more accurate detection. The first step in the mAP calculation is the estimation of the AP of each class; the mean of these APs across all classes produces the mAP (Equation 4):

$$\text{mAP} = \frac{1}{N} \sum_{i=1}^{N} AP_i$$

where N is the number of classes and AP_i is the average precision for class i.
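The four metrics can be computed directly from detection counts, as in the following sketch; the counts and per-class AP values shown are hypothetical and serve only to illustrate Equations 1 to 4.

```python
# Illustrative implementation of Equations 1-4. The counts (tp, fp,
# fn) and the per-class AP values are hypothetical examples, not
# results from this study.
def precision(tp, fp):
    return tp / (tp + fp)              # Equation 1

def recall(tp, fn):
    return tp / (tp + fn)              # Equation 2

def f1_score(p, r):
    return 2 * p * r / (p + r)         # Equation 3: harmonic mean

def mean_average_precision(ap_per_class):
    return sum(ap_per_class) / len(ap_per_class)  # Equation 4

p = precision(tp=82, fp=18)            # 0.82
r = recall(tp=82, fn=26)               # ~0.76
print(f1_score(p, r))                  # ~0.79
print(mean_average_precision([0.995, 0.658, 0.810]))  # ~0.82
```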
These performance metrics play a crucial role in evaluating the effectiveness of the model in detecting and classifying objects, in this case, weed seeds. Precision measures the proportion of correctly identified positive cases out of all cases identified as positive, providing insight into the model’s ability to avoid false positives. Recall, on the other hand, assesses the model’s capability to capture all positive instances, indicating its sensitivity to true positives. The F1 score balances precision and recall, providing a single metric to gauge overall model performance.
AP and mAP extend this assessment to object detection tasks by evaluating the accuracy of bounding box predictions. AP summarizes the precision–recall curve for each class, while mAP aggregates AP values across all classes to provide a comprehensive measure of detection accuracy. Together, these metrics offer a detailed understanding of the model’s strengths and weaknesses, enabling researchers to fine-tune algorithms for improved performance.
Additionally, a normalized confusion matrix was used to examine the levels of misclassification between species. A confusion matrix arranges output (predicted) classes along the y axis and target (true) classes along the x axis, with each cell containing the number of instances of a true class assigned to a given predicted class. The confusion matrix was then normalized by dividing the value in each column by the total number of instances of the true class that the column represents, so that each column sums to one. In our study, the normalized confusion matrix is used to examine patterns of misclassification between the seeds of weed species.
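The column-wise normalization can be reproduced with a few lines of numpy; the 3-class matrix below is a hypothetical example rather than the matrix from this study.

```python
# Sketch of the column-wise normalization described above, on a
# hypothetical 3-class confusion matrix (rows = predicted class,
# columns = true class).
import numpy as np

conf = np.array([[50., 16.,  0.],    # predicted class 0
                 [ 2., 25.,  1.],    # predicted class 1
                 [ 0.,  9., 40.]])   # predicted class 2

# Divide each column by its column total so each column sums to 1;
# cell (i, j) then gives the fraction of true class j predicted as i.
normalized = conf / conf.sum(axis=0, keepdims=True)
print(normalized.round(2))
```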
Results and Discussion
Image Acquisition
Images obtained from the microscope played a pivotal role in enhancing the accuracy of seed labeling, particularly for small seeds. The microscopic images provided clear and discernible details, enabling the differentiation of even small and dark-colored seeds. Importantly, the use of a microscope ensured that seeds appeared sufficiently large during labeling, eliminating the need to zoom in to locate specific seed types and draw labels around them. This aspect is critical, as zooming in during manual labeling may introduce inaccuracies that affect the performance of the detection model during inference. Therefore, the availability of high-quality images from the microscope not only facilitated straightforward labeling but also contributed to the robustness and accuracy of the subsequent detection process. It is also important to recognize that YOLOv8 typically resizes input images to a smaller size, usually 640 by 640 pixels. Resizing can lead to a loss of information, especially for small objects in images. In some cases, the downsized images may render small objects undetectable to the model, presenting a challenge for accurate detection. Microscope images of small weed seeds ensure that the seeds appear larger, so that when the model resizes the images, the seeds remain sufficiently detailed for detection.
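A simple calculation illustrates the scale of this effect; the 100-pixel seed width below is a hypothetical example, while the 4,000 by 3,000 pixel capture resolution and 640-pixel input size come from the settings described earlier.

```python
# Back-of-the-envelope sketch of how resizing shrinks small objects.
# The capture resolution (4,000 px wide) and the YOLOv8 input size
# (640 px) are from the text; the seed width is hypothetical.
capture_w, input_w = 4000, 640
scale = input_w / capture_w             # 0.16

seed_px_at_capture = 100                # hypothetical small seed
print(seed_px_at_capture * scale)       # 16 px: little detail survives
```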
Species Classification Metrics
The performance of the YOLOv8 model was evaluated through classification metrics: the precision–recall curve, precision, recall, and mAP. The precision–recall curve and the area under the precision–recall curve (AUC) for all classes of weed species in the dataset were used to assess weed seed identification performance, where a higher AUC indicates greater performance (Figure 2). We observed variability in the AUC values across weed species, indicating varying levels of classification performance. For instance, cheatgrass (Bromus tectorum L.) exhibited a higher AUC (0.995) compared with kochia [Bassia scoparia (L.) A.J. Scott] (0.658), suggesting that the model achieved better discrimination for B. tectorum. The lower AUC for B. scoparia indicates comparatively lower discriminative ability, suggesting that the model had reduced capability in distinguishing B. scoparia from other weed species in the dataset compared with B. tectorum.
Despite a larger number of training images for B. scoparia than for B. tectorum, the model’s lower ability to accurately identify and discriminate B. scoparia seeds (as indicated by an AUC of 0.66) can be attributed to several factors, including the inherent characteristics of the weed species and the challenges they pose for object detection models. Bassia scoparia seed, being smaller and lacking distinct shape features compared with B. tectorum, presents greater difficulty for the model in accurately recognizing and delineating its boundaries. The limited visual cues and variations in appearance may hinder the model’s ability to differentiate B. scoparia from other weed species in the dataset. Additionally, the smaller size of B. scoparia seeds may result in less prominent features, making them harder to detect amid cluttered backgrounds or similar-looking objects.
In contrast, B. tectorum, with its larger size and more pronounced shape characteristics, provides clearer visual cues for the model to identify and distinguish (AUC of 0.995). The distinctiveness of B. tectorum seeds facilitates better recognition and classification, despite the smaller number of training images available. The larger size and more-defined shape of B. tectorum seeds likely contribute to higher accuracy in detection compared with B. scoparia. It is essential to consider the complexity of the training process and the representation of different classes within the dataset. While the number of training images is important, the quality and diversity of these images also play a crucial role in model performance. The presence of diverse examples and variations within the B. tectorum training set may have enabled the model to learn more robust features for accurate detection, despite the smaller quantity of data.
Precision measures the proportion of correctly identified positive cases out of all cases identified as positive, providing insight into the model’s ability to avoid false positives (Figure 3). The precision of the model for bounding box prediction improved considerably from 0.43 at epoch 1 to 0.82 at epoch 100. The increased precision value indicates the model’s ability to make accurate predictions with minimal false positives. Recall, unlike precision, gauges the model’s ability to identify all positive instances, reflecting its sensitivity to true positives. For instance, a recall value of 0.76 signifies the model’s effectiveness in capturing true positives. The mAP at a 50% intersection over union (IoU) threshold (mAP50), a crucial metric for object detection, increased consistently with training, reaching a value of 0.82 by epoch 100. mAP50 evaluates the AP of the model across different object classes when there is at least a 50% IoU between predicted bounding boxes and ground-truth boxes (precise location and class of object), indicating how well the model detects objects with a moderate level of overlap with the ground truth. Additionally, mAP50-95, which averages mAP over IoU thresholds from 50% to 95%, improved from 0.22 to 0.51 over the course of training, highlighting the learning capacity of the model over successive rounds of training (Figure 3). The improvement in these values is important for handling the variation present between weed species. In addition to its accuracy, the YOLOv8 model also displayed impressive speed in processing both images and real-time videos, making it suitable for large-scale applications and for extension functions where real-time identification would be useful for training and demonstration.
The progressive increase in precision signifies the ability of the model to distinguish between different species, contributing to minimizing false positives. Adaptability, along with processing speed, is crucial for real-world field applications that involve handling large datasets. However, despite these advancements, it is important to acknowledge potential challenges associated with the model’s performance. For instance, different weed species exhibit varying precision scores based on factors such as size and color, with lower precision for small seeds.
Detection Performance
The YOLOv8 model performed remarkably well in detecting, counting, differentiating, and classifying seeds from cleaned field samples using both still images and real-time videos for the majority of weed species. Challenges did arise in cases of strong similarity in seed morphology and color, such as between flixweed [Descurainia sophia (L.) Webb ex Prantl] and tumble mustard (Sisymbrium altissimum L.) (Figure 4). This constraint, however, was limited to seed pairs within the same taxonomic family, such as S. altissimum and D. sophia or annual sowthistle (Sonchus oleraceus L.) and prickly lettuce (Lactuca serriola L.). We anticipate that similar constraints will be identified with Amaranthus spp. seed.
The normalized confusion matrix (Figure 5) depicts a clear pattern in which species with seeds of similar size are frequently confused with one another. While common lambsquarters (Chenopodium album L.), a small-seeded Chenopodiaceae species, was never identified as another species, other small-seeded species were frequently misidentified as C. album. In the hold-out validation, 32% of B. scoparia, 11% of background material, 5% of S. oleraceus, 3% of mayweed chamomile (Anthemis cotula L.), and 3% of Amaranthus spp. were misclassified as C. album (Figure 5). Interestingly, B. scoparia represented the direct opposite of C. album: misclassification of B. scoparia was high, with 50% of samples misidentified in the hold-out validation, 32% as C. album, 4% as A. cotula, and 14% as background material (Figure 5). Additionally, there was a high rate of background material misclassified as weed seeds. None of the background material was classified as background material, with 54% classified as A. cotula, 11% as C. album, 9% as Lamium spp., 8% as Amaranthus spp., 5% as Italian ryegrass (Lolium multiflorum Lam.), 4% each as catchweed bedstraw (Galium aparine L.) and S. altissimum, 3% as Brassica spp., 2% as B. scoparia, and 1% each as rattail fescue [Vulpia myuros (L.) C.C. Gmel.] and Solanum spp., indicating that the model is overly prone to false positives when encountering background material (Figure 5). However, the model was able to identify large-seeded weeds with nearly 100% accuracy (Figure 5). While the model is useful for classifying and counting seeds from larger-seeded grass species, caution should be taken when using the model for smaller-seeded weeds when non-seed material is present.
The ability of the model to precisely identify different species differed based on the size and color of the seed. For example, the model performed with higher precision on larger seeds with distinguishing features than on smaller seeds with similar color spectra. Because we used only 19 weed species to evaluate performance, the risk of misclassification escalates as the number of weed species to be identified increases, especially if the model encounters challenges in distinguishing between morphologically similar species or when morphology varies significantly within species. Addressing these challenges requires continued research and refinement of the model, potentially incorporating additional features or training strategies to improve performance across a broader range of weed species. Furthermore, ongoing efforts to expand the diversity of the training dataset can enhance the model’s ability to generalize to novel weed species, mitigating the risk of misclassification in practical agricultural scenarios. While the YOLOv8 model demonstrates considerable promise in weed seed detection, ongoing optimization and adaptation are necessary to address the inherent complexities of real-world weed management applications.
Application
Rapid and accurate quantification of the weed seedbank is a critical capability gap in weed science. Utilizing high-throughput systems, including automated weed seed identification, renders many of the research needs on weed seedbank dynamics that have been proposed for decades potentially feasible (Buhler et al. 1997; Khan et al. 2021). Spatially explicit quantification of weed seedbanks, obtained by collecting soil samples across a landscape and using YOLOv8 to classify and count weed seeds in images, could be utilized to understand the relationship between the seedbank and final weed populations, the emergence dynamics of weed species, the effects of management practices on weed seed dynamics, and the empirical effect of management inputs on weed seedbank and weed densities. The use of such systems could also be grower targeted: with just a few samples of soil at the time of planting or tilling, growers could make critical management decisions earlier, enhancing their integrated weed management programs or increasing the impact of decision support systems. Because YOLOv8 is compatible with any three-band images such as RGB images, it will be applicable to identifying seeds with any RGB camera that has adequate resolution, but likely will require a microscope.
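As a hedged sketch of how such a grower-facing workflow might be scripted, the following counts detected seeds per species in a single sample image; the file names and counting logic are illustrative assumptions, not the study’s deployed code.

```python
# Illustrative sketch: count detected seeds per species in one image
# with a fine-tuned model. "best.pt" and "sample.jpg" are placeholder
# file names.
from collections import Counter
from ultralytics import YOLO

model = YOLO("best.pt")                 # fine-tuned YOLOv8 weights
results = model.predict("sample.jpg")   # one soil-sample image

counts = Counter()
for box in results[0].boxes:
    counts[model.names[int(box.cls)]] += 1  # species label per box
print(counts)  # e.g., Counter({'Bromus tectorum': 12, ...})
```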
Although the YOLOv8 model is adequate for many tasks in weed seedbank assessment, it has critical limitations. First, YOLOv8 cannot identify bounding boxes of overlapping objects. If many seeds are overlapping in an image, the YOLOv8 model likely would not be able to accurately count the number of seeds of each species. This limitation can be overcome by splitting dense samples into multiple images and lightly shaking the paper to spread the seeds adequately, so they are not overlapping.
Second, many of the smaller weed seeds appear as specks in images, making it difficult for the classifier to distinguish them from background material. Another challenge with samples from elutriators is the presence of soil coating on the objects, giving them a similar color and making it difficult for the model to make accurate predictions. While it is unlikely that elutriators will ever be able to remove all background material from samples, a second washing along with higher-resolution images and a specific classifier that separates small seeds from background material could alleviate the misclassification between small seeds and background material.
The integration of high-resolution microscopy images was pivotal in carrying out the object detection analysis for small weed seeds and enhancing the overall robustness of the detection process. While handheld devices such as phones or scanners offer a faster alternative for image acquisition, the small size of the seeds led to inaccuracies in labeling and detection. Images from a microscope provided consistent and precise magnification, eliminating the need for additional programming to adjust for scale variations. This ensures accurate and reliable measurements, which are often a challenge with smartphone images due to variable camera settings and distances. Evaluation of metrics further highlighted the inherent challenges associated with small and similarly colored seeds, emphasizing the importance of dataset quality and diversity in model training. While the YOLOv8 model demonstrated high accuracy in detection and classification tasks, challenges persist in accurately distinguishing between morphologically similar species and addressing variations in seed size and color, particularly in seedbank samples with substantial soil particles obtained directly after the elutriation step. Incorporating images of seedbank samples taken post-elutriation into the dataset would likely improve detection accuracy. This limitation could not be addressed in the current study due to time constraints and the unavailability of sufficiently diverse elutriated seedbank images for training. Addressing these challenges will require continued research, refinement of the model, and expansion of the diversity of the training dataset. Despite these limitations, this work presents proof of concept that the integration of automated weed seed identification systems holds great promise for driving innovation and improving integrated weed management systems.
Acknowledgments
We thank Cody Willmore and Vishal Sonawane for their assistance.
Funding statement
The study was supported by the Washington Grain Commission, the R. J. Cook Endowment for Wheat Research, and Pacific Northwest Herbicide Resistance Initiative.
Competing interests
The authors declare no competing interests.