Introduction
Alfalfa, an important leguminous forage crop, is widely cultivated worldwide (Bai et al. 2018; Li et al. 2021; Mielmann 2013; Radovic et al. 2009). Alfalfa contains a high amount of crude protein and is rich in minerals, particularly calcium, iron, and manganese (Richter et al. 2003), and is thus considered a healthy feed for livestock (Salzano et al. 2021). Weeds are a significant challenge in alfalfa production because they compete with alfalfa for nutrients, space, sunlight, and water, and reduce forage yield and nutritive value. Moreover, certain weed species such as perilla mint (Perilla frutescens L.) are toxic to livestock (Kerr et al. 1986). A variety of postemergence (POST) herbicides are used for weed control in alfalfa fields. For instance, in conventional alfalfa, clethodim controls a wide range of grasses and 2,4-DB controls a wide range of broadleaf weeds (Cudney and Adams 1993; Idris et al. 2019), while glyphosate provides nonselective weed control in glyphosate-tolerant alfalfa (Wilson and Burgener 2009). These POST herbicides are typically broadcast over entire alfalfa fields, including areas where no weeds are present.
Site-specific weed management, particularly precision herbicide application, can considerably reduce herbicide input and weed control costs (Franco et al. 2017; Sabzi et al. 2018; Yu et al. 2020; Zaman et al. 2011). A major obstacle to autonomous precision herbicide application is detecting weeds accurately and reliably in real time (Sabzi et al. 2018). Traditional machine vision techniques rely on differences in plant leaf color, spectral information, and feature fusion (Sabzi et al. 2018, 2020); morphological features (Bakhshipour and Jafari 2018; Hamuda et al. 2017; Pulido et al. 2017); and spatial information (Farooq et al. 2019). However, these traditional approaches cannot reliably detect weeds intermingled with crops, especially in complex environments with high crop and weed densities (Ahmad et al. 2020; Akbarzadeh et al. 2018; Sujaritha et al. 2017; Yu et al. 2019a).
Machine learning techniques have advanced significantly in recent years (Jordan and Mitchell 2015). Deep convolutional neural networks (DCNNs) have been used successfully in many fields (LeCun et al. 2015; Ni et al. 2019). For example, recent studies have shown that deep learning can be used to diagnose coronavirus disease (Saood and Hatem 2021), help predict seizure recurrence (Geng et al. 2021), perform high-accuracy three-dimensional optical measurement (Yao et al. 2021), predict the activity of potential drug molecules (Ma et al. 2015), analyze particle accelerator data (Azhari et al. 2020; Ciodaro et al. 2012), detect industrial defects in wood veneer finishes (Shi et al. 2020), efficiently classify green plum defects (Zhou et al. 2020), and reconstruct brain circuits (Helmstaedter et al. 2013). In addition, DCNNs have demonstrated an exceptional ability to detect objects and classify digital images (Tompson et al. 2014; Wang et al. 2018). Thus, DCNNs show great promise for weed detection and classification (Wang et al. 2019).
Researchers have recently explored the feasibility of using DCNNs to detect weeds in various cropping systems (Ferentinos 2018; Ghosal et al. 2018; Sharpe et al. 2019b; Singh et al. 2018; Yu et al. 2020). Sharpe et al. (2019b) showed that You Only Look Once version 3 (YOLOv3; Cornell University, Ithaca, NY), a unified, real-time object detection system, can be used to discriminate broadleaves, grasses, and sedges in the row middles of plastic-mulched vegetable crops. Yu et al. (2019a, 2020) reported the feasibility of using DCNNs to detect multiple broadleaf and grass weeds among actively growing or dormant bermudagrass [Cynodon dactylon (L.) Pers.] plants. Hennessy et al. (2021) reported the feasibility of using YOLOv3-tiny to detect hairy fescue (Festuca filiformis Pourr.) and sheep sorrel (Rumex acetosella L.) among wild blueberry (Vaccinium spp.) plants. Hussain et al. (2021) investigated the feasibility of using DCNNs to detect common lambsquarters (Chenopodium album L.) in potato (Solanum tuberosum L.) fields. However, the feasibility and effectiveness of using DCNNs for weed detection in alfalfa have not been investigated.
Unlike most other crops, alfalfa is typically harvested for hay multiple times per growing season, and it rapidly regenerates new stems and leaves after each harvest. As a result, weeds must be detected in alfalfa stands of varying heights, which may be a significant challenge.
Image classification with DCNNs can be used in the machine vision subsystem of smart sprayers for weed detection and real-time precision treatment (Sharpe et al. 2019b; Wang et al. 2019; Yu et al. 2019a, 2020). He et al. (2015) noted that arbitrarily fixing the size of the input images used to train a neural network might reduce classification accuracy. However, a careful review of the literature suggests that almost all previously reported studies evaluating DCNNs for weed detection and classification arbitrarily used a single input image size (Ferentinos 2018; Sharpe et al. 2019b; Yu et al. 2019a, 2019b, 2020). Few comparative studies have investigated how the size of training images affects the performance of DCNNs for weed detection and classification.
During training, a deep learning model iteratively updates its weights over a large number of samples, and the optimizer determines how those updates are made (Krizhevsky et al. 2012; Simonyan and Zisserman 2014; Szegedy et al. 2015). Thus, selecting an appropriate optimizer is critical in the training pipeline for deep learning models (Choi et al. 2019). Most previous studies have focused on comparing state-of-the-art deep learning architectures for weed detection (Sharpe et al. 2019a, 2019b; Wang et al. 2019; Yu et al. 2019a, 2019b). However, none of them compared deep-learning optimizers as a means of improving weed detection accuracy. Therefore, the objectives of this research were to 1) explore how the size of training images affects the performance of DCNNs in detecting and classifying weeds; 2) compare several DCNNs trained with different deep-learning optimizers for weed detection; and 3) determine the feasibility of using DCNNs to detect multiple broadleaf and grass weeds growing in alfalfa.
Materials and Methods
Overview
In this research, the image classification DCNN architectures AlexNet (Krizhevsky et al. 2012), GoogLeNet (Szegedy et al. 2015), VGGNet (Simonyan and Zisserman 2014), and ResNet (He et al. 2016) were evaluated. These neural networks were trained with input images of four sizes (200×200, 400×400, 600×600, and 800×800 pixels) and with four commonly employed deep-learning optimizers: Adagrad (Duchi et al. 2011), AdaDelta (Zeiler 2012), Adam (Kingma and Ba 2014), and Stochastic Gradient Descent (SGD; Darken et al. 1992). AlexNet consists of eight layers: five convolutional layers and three fully connected layers (Krizhevsky et al. 2012). GoogLeNet is 22 layers deep and is designed to use small convolutions to reduce the numbers of neurons and parameters (Szegedy et al. 2015). The VGGNet used in this research consists of 19 weight layers (VGG19) and uses small convolutional kernels to limit the numbers of neurons and parameters (Simonyan and Zisserman 2014). The ResNet used here, whose design was inspired by VGG19, consists of 50 layers and adds residual units through shortcut connections (He et al. 2016). Through residual learning, ResNet addresses the degradation problem of deep networks, allowing deeper networks to be trained (He et al. 2016). All neural networks were pretrained on the ImageNet database (Deng et al. 2009) with input images of 224×224 pixels, except AlexNet, which was pretrained with images of 227×227 pixels (He et al. 2016; Krizhevsky et al. 2012; Simonyan and Zisserman 2014; Szegedy et al. 2015).
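The residual learning idea can be illustrated with a deliberately simplified sketch. The code below is a toy, pure-Python residual unit of the form y = ReLU(F(x) + x), where F is two small dense layers; it omits the convolutions, batch normalization, and three-layer bottleneck design of the actual ResNet-50, and `residual_unit`, `matvec`, and `relu` are hypothetical helpers written only for illustration. It shows one property that helps with the degradation problem: when the weights of F are all zero, the unit passes a non-negative input through unchanged, so stacking such units need not make a network worse.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def matvec(W: Matrix, v: Vector) -> Vector:
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def residual_unit(x: Vector, W1: Matrix, W2: Matrix) -> Vector:
    """Toy residual unit: y = ReLU(F(x) + x), with F(x) = W2 @ ReLU(W1 @ x).

    The shortcut adds the input x back to the learned residual F(x),
    so the layers only need to learn the difference from the identity.
    """
    residual = matvec(W2, relu(matvec(W1, x)))          # F(x)
    return relu([f + xi for f, xi in zip(residual, x)])  # shortcut: F(x) + x


if __name__ == "__main__":
    x = [1.0, 2.0, 0.5]
    zeros = [[0.0] * 3 for _ in range(3)]
    # With all-zero weights, F(x) = 0 and the unit passes x through unchanged.
    print(residual_unit(x, zeros, zeros))  # [1.0, 2.0, 0.5]
```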
Image Acquisition
Images of various weeds growing in alfalfa fields were acquired multiple times during September and October 2020 using a digital camera (Panasonic® DMC-ZS110; Xiamen, Fujian, China) at a resolution of 4,160×3,120 pixels. Images taken in alfalfa fields in Bengbu, Anhui, China (117.89°N, 117.88°E) were used for the training dataset, validation dataset (VD), and testing dataset (TD). Additional testing images were taken in separate alfalfa fields in Bengbu, Anhui, China (additional testing dataset 1, TD 1) and at the Yangzhou University Pratacultural Science Experiment Station (32.20°N, 119.23°E) in Yangzhou, Jiangsu, China (additional testing dataset 2, TD 2). The additional testing datasets were used to examine the robustness of the models. The images, which contained alfalfa (8 to 52 cm tall) and various broadleaf and grass weed species, were captured from approximately 1.5 m above the ground (0.05 cm pixel⁻¹). This height was chosen to mimic the camera height of a smart sprayer designed by our research team (data not shown). Images were acquired under various outdoor lighting conditions, including clear, cloudy, and partially cloudy skies. In the present study, weed images were captured in the fall, and only mature weeds were used for training and testing. Additional research is needed to evaluate the feasibility of using neural networks to identify weed growth stages, because herbicides could then be applied at variable rates according to growth stage; for example, low and high herbicide rates could be applied to control seedling and mature annual weeds, respectively, while maintaining adequate weed control.
Impact of Training Using Various Image Sizes
For training, all collected images were cropped into sub-images of 200×200, 400×400, 600×600, or 800×800 pixels using IrfanView (v.5.50; Irfan Skiljan, Jajce, Bosnia; Figure 1A), and the DCNN architectures were trained separately with each image size. For each image size, the training dataset contained 3,000 positive images (with weeds) and 3,000 negative images (without weeds; Figure 1B). The VD contained 600 positive and 600 negative images. The TD contained 300 positive and 300 negative images that were randomly selected from the sites where the training images were taken but were not used for training. TD 1 and TD 2 each contained 700 positive and 700 negative images. The training and testing images contained a variety of broadleaf and grass weed species occurring in mixtures. The dominant broadleaf weed species (Figure 1C) included annual fleabane [Erigeron annuus (L.) Pers.], common sage (Salvia plebeia R. Br.), Canada thistle [Cirsium arvense (L.) Scop.], and hemistepta [Hemistepta lyrata (Bunge) Bunge], whereas the major grass weeds (Figure 1C) included crabgrass (Digitaria spp.), goosegrass [Eleusine indica (L.) Gaertn.], barnyardgrass [Echinochloa crus-galli (L.) Beauv.], and green foxtail [Setaria viridis (L.) Beauv.].
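The cropping step can be sketched as a tiling of each 4,160×3,120-pixel photograph. The text does not state whether sub-images overlapped or how edge remainders were handled, so the sketch below assumes non-overlapping square tiles and discards partial tiles at the right and bottom edges; `tile_boxes` and `tile_ground_size_cm` are hypothetical helpers. The boxes follow the (left, upper, right, lower) convention used by Pillow's `Image.crop`, and each tile's ground footprint is derived from the reported resolution of 0.05 cm pixel⁻¹.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower), as in PIL Image.crop

IMAGE_W, IMAGE_H = 4160, 3120   # camera resolution reported in the text
CM_PER_PIXEL = 0.05             # ground resolution at ~1.5 m camera height


def tile_boxes(width: int, height: int, tile: int) -> List[Box]:
    """Return non-overlapping square tile boxes; partial edge tiles are dropped (assumption)."""
    return [
        (left, upper, left + tile, upper + tile)
        for upper in range(0, height - tile + 1, tile)
        for left in range(0, width - tile + 1, tile)
    ]


def tile_ground_size_cm(tile: int, cm_per_pixel: float = CM_PER_PIXEL) -> float:
    """Side length on the ground (cm) covered by a square tile of `tile` pixels."""
    return tile * cm_per_pixel


if __name__ == "__main__":
    for size in (200, 400, 600, 800):
        boxes = tile_boxes(IMAGE_W, IMAGE_H, size)
        side = tile_ground_size_cm(size)
        print(f"{size}x{size}: {len(boxes)} tiles per photo, "
              f"{side:.0f} cm x {side:.0f} cm on the ground")
```

Under these assumptions, one photograph yields 300 tiles of 200×200 pixels (10 cm × 10 cm on the ground) but only 15 tiles of 800×800 pixels (40 cm × 40 cm).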
Effect of Optimizers
Next, we investigated the performance of the DCNNs when they were trained with four common deep-learning optimizers: Adagrad (Duchi et al. 2011), AdaDelta (Zeiler 2012), Adam (Kingma and Ba 2014), and SGD (Darken et al. 1992). The characteristics of these optimizers are described below.
Adagrad uses a different learning rate for every parameter in the network (Duchi et al. 2011): it scales the global learning rate η for each parameter by the accumulated squared gradients of that parameter, so parameters that receive frequent, large updates take smaller steps. The performance of Adagrad relies on manually setting the global learning rate. AdaDelta is an extension of Adagrad (Zeiler 2012). Rather than accumulating all past gradients, AdaDelta accumulates them over a restricted window (implemented as an exponentially decaying average) and uses a Hessian approximation to correct the units of each update, which still moves in the direction of the negative gradient. Adam combines the advantages of Adagrad and Root Mean Square Propagation (RMSProp; Kingma and Ba 2014). It computes adaptive learning rates for different parameters from estimates of the first and second moments of the gradients. Its advantages include 1) simple implementation, efficient computation, and low memory requirements; 2) parameter updates that are invariant to rescaling of the gradients; 3) suitability for problems with large datasets or many parameters and for nonstationary objectives; and 4) suitability for problems with sparse or very noisy gradients. Although Adam is currently the mainstream optimization algorithm, the best results in many fields (e.g., object recognition in computer vision) are still obtained with SGD (Wilson et al. 2017). SGD here refers to mini-batch gradient descent (Qian et al. 2015); it is one of the simplest deep-learning optimizers and computes the gradient on a mini-batch at every iteration. Although SGD is one of the most commonly used optimizers, it has clear disadvantages: it can converge to a local optimum or become trapped at a saddle point.
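To make the differences among the four update rules concrete, the sketch below applies textbook single-parameter versions of SGD, Adagrad, AdaDelta, and Adam to the toy objective f(w) = (w - 3)^2. These are simplified scalar forms written from the published update rules, not the CAFFE solver implementations used in this study, and the default hyperparameters (e.g., Adam's β1 = 0.9 and β2 = 0.999; AdaDelta's ρ = 0.95) are common choices assumed for illustration, not the values in Table 1.

```python
import math
from typing import Callable, Dict

State = Dict[str, float]


def sgd_step(w: float, g: float, state: State, lr: float = 0.01) -> float:
    """Plain (mini-batch) gradient descent: w <- w - lr * g."""
    return w - lr * g


def adagrad_step(w: float, g: float, state: State, lr: float = 0.01, eps: float = 1e-8) -> float:
    """Adagrad: divide the global rate by the root of all past squared gradients."""
    state["G"] = state.get("G", 0.0) + g * g
    return w - lr * g / (math.sqrt(state["G"]) + eps)


def adadelta_step(w: float, g: float, state: State, rho: float = 0.95, eps: float = 1e-6) -> float:
    """AdaDelta: decaying averages of squared gradients and squared updates; no global rate."""
    state["Eg2"] = rho * state.get("Eg2", 0.0) + (1 - rho) * g * g
    dx = -math.sqrt(state.get("Edx2", 0.0) + eps) / math.sqrt(state["Eg2"] + eps) * g
    state["Edx2"] = rho * state.get("Edx2", 0.0) + (1 - rho) * dx * dx
    return w + dx


def adam_step(w: float, g: float, state: State, lr: float = 0.001,
              b1: float = 0.9, b2: float = 0.999, eps: float = 1e-8) -> float:
    """Adam: bias-corrected estimates of the first and second moments of the gradient."""
    state["t"] = state.get("t", 0.0) + 1
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g * g
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def minimize(step: Callable[..., float], w0: float = 0.0, n_steps: int = 100, **hyper: float) -> float:
    """Run `step` on f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)."""
    state: State = {}
    w = w0
    for _ in range(n_steps):
        w = step(w, 2.0 * (w - 3.0), state, **hyper)
    return w


if __name__ == "__main__":
    for name, fn in [("SGD", sgd_step), ("Adagrad", adagrad_step),
                     ("AdaDelta", adadelta_step), ("Adam", adam_step)]:
        print(f"{name:8s} w after 100 steps = {minimize(fn):.4f} (optimum 3.0)")
```

In this toy setting AdaDelta starts with very small steps because, without a global learning rate, its step size is bootstrapped from its own running average of past updates.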
Detection of Broadleaf and Grass Weeds
The deep learning architectures AlexNet, GoogLeNet, VGGNet, and ResNet were trained with input images of 200×200 pixels to distinguish broadleaf from grass weeds growing among alfalfa plants. The neural networks were trained with 3,000 positive (broadleaf weed) and 3,000 negative (grass weed) images. Each image in the VD, TD, TD 1, and TD 2 contained either broadleaf or grass weeds. The VD contained 600 positive and 600 negative images, the TD contained 300 positive and 300 negative images, and TD 1 and TD 2 each contained 700 positive and 700 negative images.
Training and Testing
The training and testing datasets were imported into the NVIDIA Deep Learning GPU Training System (DIGITS v.6.0.0; NVIDIA Corporation, Santa Clara, CA, USA). Training and testing were performed on a computer with an NVIDIA GeForce RTX 2080 Ti GPU and 64 GB of memory using the Convolutional Architecture for Fast Feature Embedding (CAFFE; Jia et al. 2014). The hyperparameters used for training the neural networks are presented in Table 1. Training used the initial hyperparameters proposed by the original authors of each optimizer (Darken et al. 1992; Duchi et al. 2011; Kingma and Ba 2014; Zeiler 2012).
a The deep convolutional neural networks AlexNet, GoogLeNet, VGGNet, and ResNet were evaluated with various training image sizes and deep-learning optimizers. All four neural networks were trained with input images of 200×200 pixels. The images in the validation and testing datasets included both broadleaf and grass weeds.
The testing and validation results of the neural networks were arranged in a confusion matrix with four possible outcomes: true positive (tp), false positive (fp), false negative (fn), and true negative (tn). Precision, recall, and F1 scores were computed from the confusion matrices.
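Tallying the four confusion-matrix outcomes is straightforward once each image has a true and a predicted label. The sketch below assumes labels encoded as 1 for the positive class (e.g., an image with weeds) and 0 for the negative class; `confusion_counts` is a hypothetical helper, not part of DIGITS or CAFFE.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count tp, fp, fn, tn for binary labels (1 = positive, 0 = negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            counts["tp"] += 1   # weed image correctly flagged
        elif t == 0 and p == 1:
            counts["fp"] += 1   # weed-free image wrongly flagged
        elif t == 1 and p == 0:
            counts["fn"] += 1   # weed image missed
        else:
            counts["tn"] += 1   # weed-free image correctly passed
    return counts


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 1, 0]
    print(confusion_counts(y_true, y_pred))  # {'tp': 2, 'fp': 1, 'fn': 1, 'tn': 2}
```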
Precision measures the accuracy of the neural network's positive detections and was calculated using Equation 1 (Hoiem et al. 2012; Sokolova and Lapalme 2009; Tao et al. 2016):
$$\mathrm{Precision} = \frac{tp}{tp + fp} \tag{1}$$
Recall measures the effectiveness of the neural network in identifying the target and was calculated using Equation 2 (Hoiem et al. 2012; Sokolova and Lapalme 2009; Tao et al. 2016):
$$\mathrm{Recall} = \frac{tp}{tp + fn} \tag{2}$$
The F1 score is the harmonic mean of precision and recall and provides a comprehensive evaluation of both; it was calculated using Equation 3 (Tao et al. 2016):
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$
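The three metrics can be computed directly from the confusion-matrix counts. The sketch below uses the standard definitions of precision, recall, and F1; the counts in the example are hypothetical, chosen only to match the size of the TD (300 positive and 300 negative images), and are not results from this study. Reporting a zero denominator as 0.0 is a common convention assumed here.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts for a 300 + 300 image test set (tn = 291 does not enter these metrics).
    p, r, f1 = precision_recall_f1(tp=297, fp=9, fn=3)
    print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")  # precision=0.971 recall=0.990 F1=0.980
```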
Results and Discussion
Effect of Training Using Various Image Sizes
The input image size significantly affected the ability of DCNNs to detect weeds (Table 2). The neural networks trained with the smallest input images (200×200 pixels) performed better than those trained with any larger image size (400×400, 600×600, or 800×800 pixels), as evidenced by higher precision, recall, and F1 scores. For all neural networks, the F1 scores were ≥0.94 for the VD and TD when the networks were trained with 200×200-pixel images, whereas they were ≤0.95, ≤0.87, and ≤0.82 when the networks were trained with 400×400-, 600×600-, and 800×800-pixel images, respectively. Interestingly, increasing the image size reduced the F1 scores of GoogLeNet and VGGNet more than those of AlexNet and ResNet. When the neural networks were trained with 800×800-pixel images, the F1 scores of GoogLeNet and VGGNet were ≤0.59 and ≤0.44, respectively, for the VD, TD, TD 1, and TD 2, whereas those of AlexNet and ResNet were ≥0.79 and ≥0.81, respectively.
a Abbreviations: VD, validation dataset; TD, testing dataset; TD 1, testing dataset 1; TD 2, testing dataset 2.
b The models were trained to detect all types of weeds. The training datasets contained 3,000 positive and 3,000 negative images; the validation dataset contained 600 positive and 600 negative images; the testing dataset contained 300 positive and 300 negative images; and TD 1 and TD 2 each contained 700 positive and 700 negative images.
The neural networks differed markedly in their ability to detect weeds. When trained with 200×200-pixel input images, AlexNet, GoogLeNet, and VGGNet were highly effective, achieving high F1 scores (≥0.98) and recall values (≥0.99) for the VD and TD; in contrast, the F1 scores of ResNet were ≤0.96 for the VD and TD, primarily because of low precision (≤0.94). The F1 scores of AlexNet, GoogLeNet, and VGGNet were ≥0.99 for TD 2 but ≤0.98 for TD 1. The lower recall for TD 1 than for TD 2 cannot be fully explained, but it might be related to a greater diversity of weed species and a wider range of alfalfa heights in TD 1. ResNet classified images more accurately than AlexNet, GoogLeNet, and VGGNet when the networks were trained with large input images, and it had the highest F1 scores across all validation and testing datasets when the networks were trained with 800×800-pixel images.
Among the four training image sizes, the models trained with 200×200-pixel images demonstrated the greatest ability to recognize weeds. The loss curves of the four networks trained with each image size are shown on the left side of Figure 2. With 200×200-pixel training images, the model iterated for a total of 30 steps, began to converge within 5 steps, and then stabilized. This size outperformed the other crop sizes: it converged stably in less time and reached the lowest final loss and the highest accuracy.
Transfer learning is the process of reusing previously trained neural networks by updating a small part of the original weights with new data (Bengio et al. 2012). Transfer learning can reduce the amount of data required to train DCNNs (Espejo-Garcia et al. 2020) and is therefore widely adopted for training deep-learning models (Geng et al. 2021; Mohanty et al. 2016; Singh et al. 2018; Yu et al. 2019a, 2019b, 2020). In addition, He et al. (2015) noted that using fixed-size input images might substantially reduce recognition accuracy for images or sub-images of arbitrary size. Mishkin et al. (2017) reported a similar finding: the size of training images can significantly affect the recognition accuracy of DCNNs. As the input image size increases, so does the number of pixels in each image. Excessively large images may lower the level of abstraction of the information and increase computational requirements, thereby reducing recognition accuracy; conversely, excessively small images may not preserve the information critical for feature extraction. The small image size (200×200 pixels) likely performed best because it is close to the input sizes used for pretraining the neural networks: AlexNet was pretrained with images of 227×227 pixels, whereas GoogLeNet, VGGNet, and ResNet were pretrained with images of 224×224 pixels (He et al. 2016; Krizhevsky et al. 2012; Simonyan and Zisserman 2014; Szegedy et al. 2015). Therefore, further reducing the training image size (i.e., below 200×200 pixels) may not improve weed detection accuracy, although further study is needed to verify this assumption.
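The argument about proximity to the pretraining input size can be made concrete with a little arithmetic. The text does not state how DIGITS fed each sub-image to the networks; the sketch below assumes, hypothetically, that every sub-image is resized to the pretraining input size (224×224 pixels, or 227×227 for AlexNet). Under that assumption, and using the reported ground resolution of 0.05 cm pixel⁻¹, it computes the resize factor and the ground distance represented by one network-input pixel. `resize_factor` and `effective_cm_per_pixel` are hypothetical helpers.

```python
CM_PER_PIXEL = 0.05  # ground resolution of the original photographs


def resize_factor(tile_px: int, input_px: int = 224) -> float:
    """Scale applied when a tile_px x tile_px sub-image is resized to input_px x input_px (assumption)."""
    return input_px / tile_px


def effective_cm_per_pixel(tile_px: int, input_px: int = 224,
                           cm_per_pixel: float = CM_PER_PIXEL) -> float:
    """Ground distance covered by one network-input pixel after resizing."""
    return tile_px * cm_per_pixel / input_px


if __name__ == "__main__":
    for tile in (200, 400, 600, 800):
        print(f"{tile}px tile -> scale {resize_factor(tile):.2f}, "
              f"{effective_cm_per_pixel(tile):.3f} cm per input pixel")
```

Under this assumption, an 800×800 tile would be shrunk to 28% of its size, so a leaf spanning 40 pixels in the photograph would span only about 11 pixels at the network input, whereas a 200×200 tile would be enlarged only slightly (×1.12).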
In previous research, neural networks achieved excellent weed detection accuracy, but they required very large numbers of training images (dos Santos Ferreira et al. 2017; Yu et al. 2020). For example, Yu et al. (2020) used a dataset of 8,000 positive and 9,000 negative images to train neural networks to detect and classify multiple grass weed species growing among bermudagrass plants and reported that VGGNet outperformed AlexNet and GoogLeNet. Based on the present study's findings, we suggest that using the most appropriate training image size can substantially enhance weed detection performance and thereby reduce the number of training images required. Furthermore, using an appropriate image size may also narrow the differences in weed detection performance among neural networks, although this assumption needs further verification.
Effect of Optimizers
The AdaDelta and SGD optimizers generally outperformed Adagrad and Adam. The F1 scores of AlexNet trained with AdaDelta and SGD were ≥0.96 for the VD, TD, TD 1, and TD 2, whereas they were ≤0.92 when AlexNet was trained with Adagrad and ≤0.98 when it was trained with Adam (Table 3). The F1 scores of GoogLeNet did not differ significantly among optimizers for the VD and TD. However, for TD 1 and TD 2, the F1 scores of GoogLeNet were ≥0.97 when it was trained with AdaDelta and SGD, but ≤0.95 with Adagrad and ≤0.93 with Adam (Table 3). The F1 scores of VGGNet were ≥0.98 for the VD, TD, TD 1, and TD 2 when it was trained with AdaDelta, but ≤0.98 when it was trained with Adagrad and Adam (Table 3). ResNet trained with SGD and Adam had significantly lower F1 scores for TD 1 and TD 2 than ResNet trained with Adagrad and AdaDelta. These patterns are also evident in the loss curves on the right side of Figure 2. For AlexNet, Adagrad and Adam were clearly less suitable than SGD and AdaDelta; SGD converged faster than AdaDelta, and both curves eventually stabilized. For GoogLeNet, the curves of the four optimizers differed little and leveled off by the end of training. For VGGNet, Adam performed worse than the other three optimizers; among these, AdaDelta converged fastest, and all three curves eventually stabilized. For ResNet, all four curves fluctuated greatly and did not stabilize within 30 steps. These findings indicate that weed classification accuracy can be improved by training the neural networks with appropriate optimizers.
a Abbreviations: VD, validation dataset; TD, testing dataset; TD 1, testing dataset 1; TD 2, testing dataset 2.
b The models were trained to detect all types of weeds in images of 200×200 pixels, and the training dataset contained 3,000 positive and 3,000 negative images. The VD contained 600 positive and 600 negative images; the TD contained 300 positive and 300 negative images; and TD 1 and TD 2 each contained 700 positive and 700 negative images.
To date, hundreds of deep-learning optimizers have been developed (Schmidt et al. 2021). However, the research community commonly relies on benchmarking or even personal and anecdotal experience to choose an optimizer (Geng et al. 2021; Nagaraju and Chawla 2020). During training, an optimization algorithm updates the weight parameters to reduce the loss (Choi et al. 2019; Schmidt et al. 2021). The optimization algorithm can significantly affect the training speed and determine the final performance of the trained neural network (Choi et al. 2019). Our results confirm the importance of selecting an appropriate deep-learning optimizer when training neural networks for weed detection: the neural networks evaluated here required different optimizers to achieve their best weed detection performance.
To the best of our knowledge, this is the first report investigating the effect of optimizers on neural networks for weed detection. For detecting fruits or plant diseases, recent empirical comparisons revealed differences between neural networks trained with different optimizers (Postalcolu 2020; Schmidt et al. 2021). Postalcolu (2020) trained DCNNs for fruit detection with Adam, SGD, and RMSProp and found that Adam and RMSProp outperformed SGD. In another study, Xception trained with Adam achieved higher F1 scores for classifying plant disease images than when trained with other optimizers, including Adagrad, Adamax, SGD, and RMSProp (Saleem et al. 2020). Wilson et al. (2017) reported that adaptive learning-rate methods (e.g., Adagrad, AdaDelta, RMSProp, and Adam) generally performed worse than SGD in object recognition, character-level language modeling, and constituency parsing.
Detection of Broadleaf vs. Grass Weeds
Based on the results presented in the two preceding sections, AlexNet, GoogLeNet, VGGNet, and ResNet were trained to distinguish broadleaf from grass weeds using 200×200-pixel images and the most effective deep-learning optimizers. None of the neural networks showed an obvious difference between its ability to detect broadleaf weeds and its ability to detect grass weeds, as evidenced by the precision, recall, and F1 scores (Table 4). Among the neural networks evaluated, VGGNet consistently produced the highest F1 scores for the VD, TD, TD 1, and TD 2 (classification results are shown in Figure 3), whereas ResNet consistently produced the lowest. VGGNet achieved high F1 scores (≥0.99) and recall (≥0.98) for the VD, TD, and TD 2, whereas the F1 scores of ResNet never exceeded 0.73. GoogLeNet outperformed AlexNet: its F1 scores were consistently higher than those of AlexNet for the VD, TD, TD 1, and TD 2.
a Abbreviations: VD, validation dataset; TD, testing dataset; TD 1, testing dataset 1; TD 2, testing dataset 2.
b The models were trained to detect all types of weeds with 200×200-pixel training images; the training dataset contained 3,000 positive and 3,000 negative images. The validation dataset contained 600 positive and 600 negative images, the testing dataset contained 300 positive and 300 negative images, and TD 1 and TD 2 each contained 700 positive and 700 negative images.
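The precision, recall, and F1 values reported for each dataset follow from the counts of true positives (TP), false positives (FP), and false negatives (FN). The Python sketch below shows the standard definitions. The counts are hypothetical, chosen only to match the size of a 300-positive/300-negative testing set, and are not the values reported in Table 4.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts for a testing set with 300 positive and 300 negative
# images (NOT the results reported in this study).
tp, fn = 294, 6  # positive images correctly detected / missed
fp = 3           # negative images incorrectly classified as positive
p, r, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
```

Because F1 is a harmonic mean, it equals 2TP / (2TP + FP + FN) and is pulled toward whichever of precision or recall is lower, so a high F1 score requires both to be high.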
Although the neural networks performed excellently in detecting weeds when they were trained with the best-performing image size and optimizer, various factors may affect their performance. In the present study, the precision and recall values of all neural networks except ResNet were lower on TD 1 than on TD 2. This may be because the TD 1 photographs were acquired primarily on cloudy days and were therefore darker than the TD 2 images. Additional studies are needed to train and test neural networks with images covering a wide range of geographic locations, weed species, weed densities, weed and crop growth stages, and light intensities, and to evaluate their ability to adapt to more complex situations.
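The explanation that TD 1 scores were lower because its images were darker could be checked by comparing the average brightness of the two testing datasets. The following is a minimal Python sketch, assuming each image has already been loaded as a flat list of 8-bit (R, G, B) tuples (image loading is omitted) and using ITU-R BT.601 luma as the brightness measure. The function names and pixel values are hypothetical and are not drawn from TD 1 or TD 2.

```python
def luma(rgb):
    """Luma of one 8-bit (R, G, B) pixel using ITU-R BT.601 weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def mean_luma(images):
    """Mean luma over all pixels of a set of images.

    Each image is assumed to be a flat list of (R, G, B) tuples.
    """
    values = [luma(px) for img in images for px in img]
    return sum(values) / len(values)


# Hypothetical two-pixel "images" standing in for the two testing sets.
td1_like = [[(60, 80, 50), (70, 90, 55)]]       # darker, overcast-style
td2_like = [[(140, 170, 110), (150, 180, 120)]]  # brighter, sunny-style
print(mean_luma(td1_like), mean_luma(td2_like))
```

A consistently lower mean luma for TD 1 would support the lighting explanation; it would not by itself rule out other differences between the datasets.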
DCNNs detect weeds based on plant morphological features, including leaf pattern and texture (Kamilaris and Prenafeta-Boldu 2018). Because alfalfa is itself a broadleaf crop, broadleaf weeds share more of these features with the crop than grasses do; for this reason, broadleaf weeds growing in alfalfa fields were hypothesized to be more difficult to detect than grasses. However, none of the neural networks tested in the present study showed an obvious difference in its ability to detect broadleaf vs. grass weeds. A high image processing speed is vital for real-time weed recognition and precision herbicide application (Yang et al. 2000). The neural networks in this study processed images quickly on an NVIDIA GeForce RTX 2080 Ti graphics processing unit, with processing times of 23, 35, 64, and 68 ms image⁻¹ for AlexNet, GoogLeNet, VGGNet, and ResNet, respectively.
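The per-image processing times above can be converted into throughput and, for a moving sprayer, into the distance traveled while one image is processed. The Python sketch below performs this arithmetic. The ground speed is a hypothetical value, not one used in this study, and the calculation covers inference time only, ignoring image capture and nozzle actuation delays.

```python
# Per-image processing times reported in this study (ms image^-1).
PROCESSING_MS = {"AlexNet": 23, "GoogLeNet": 35, "VGGNet": 64, "ResNet": 68}

ASSUMED_SPEED_M_S = 2.0  # hypothetical sprayer ground speed, not from this study


def images_per_second(ms_per_image):
    """Throughput implied by a per-image processing time."""
    return 1000.0 / ms_per_image


def travel_per_image_m(ms_per_image, ground_speed_m_s):
    """Distance (m) the sprayer moves while a single image is processed."""
    return ground_speed_m_s * ms_per_image / 1000.0


for net, ms in PROCESSING_MS.items():
    print(f"{net}: {images_per_second(ms):.1f} images/s, "
          f"{travel_per_image_m(ms, ASSUMED_SPEED_M_S):.3f} m per image")
```

The reported times correspond to roughly 43 images s⁻¹ for AlexNet and about 15 images s⁻¹ for ResNet; whether a given rate is fast enough depends on the camera footprint and the actual travel speed of the sprayer.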
Conclusion
This research demonstrated the feasibility of using DCNNs for weed detection in alfalfa. AlexNet, GoogLeNet, VGGNet, and ResNet performed better when trained with small input images of 200×200 pixels than with larger images of 400×400, 600×600, and 800×800 pixels. Furthermore, the choice of deep-learning optimizer can significantly affect the performance of neural networks. AdaDelta and SGD outperformed Adagrad and Adam when used with AlexNet and GoogLeNet; AdaDelta outperformed Adagrad, Adam, and SGD when used with VGGNet; and Adagrad and AdaDelta outperformed Adam and SGD when used with ResNet. None of the neural networks showed a difference in its ability to detect broadleaf vs. grass weeds. When trained with the best-performing input image size and optimizer, the neural networks ranked as follows, from highest to lowest classification accuracy: VGGNet > GoogLeNet > AlexNet > ResNet. Future research will integrate these neural networks into the machine vision subsystem of smart sprayers.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant No. 32072498), the Key Research and Development Program of Jiangsu Province (Grant No. BE2021016), and the Jiangsu Agricultural Science and Technology Innovation Fund (Grant No. CX(21)3184). The authors declare no conflicts of interest.