Introduction
Modern manufacturing practices encompass various robotic applications in processing, manufacturing, assembly, and quality control tasks. These robotic devices must possess sufficient computing and communication technologies for effective data collection, pre-processing, and sharing to realize the vision of Industry 4.0. In the realm of welding technology, robotic welding represents the pinnacle of modern welding techniques (Guo et al., Reference Guo, Zhu, Sun and Yu2019). However, the efficiency of welding robots is relatively low when dealing with low-volume, high-variety, and nonstandardized weldments, as path planners need to generate numerous path points to weld complex shapes. Manual welding is often quicker and more efficient, particularly in low to medium-scale manufacturing or overhaul and maintenance. Robotic welding must become more adaptable to cater to the needs of these businesses.
Addressing this challenge, Intelligent Welding Manufacturing Technology (IWMT) has emerged as a foreseeable trend, heavily relying on intelligentized robotic welding (Tarn and Chen, Reference Tarn and Chen2014). As Wang et al. (Reference Wang, Hu, Sun and Freiheit2020) state, “Intelligent welding involves utilizing computers to mimic, enhance, and/or replace human operators in sensing, learning, decision-making, monitoring, and control processes.” Robotic welding is a distinct form of automated welding in which human operators supervise and regulate the process while machines carry out the welding. Automated welding using robots promises higher precision, reduced cycle times, enhanced worker safety, and decreased material waste (Gyasi et al., Reference Gyasi, Handroos and Kah2019; Reisgen et al., Reference Reisgen, Mann, Middeldorf, Sharma, Buchholz and Willms2019). Several methodologies contribute to the realization of IWMT, including Virtual and Digital Welding Manufacturing and Technology (V&DWMT), Intelligentized Robotic Welding Technology (IRWT), and Flexible and Agile Welding Manufacturing and Technology (F&AWMT) (Tarn and Chen, Reference Tarn and Chen2014).
Machine learning (ML) is a discipline of artificial intelligence (AI) and computer science that utilizes algorithms that learn from previous information to generalize to new information, process noisy inputs and complicated data settings, exploit past know-how, and construct fresh notions. ML applications in manufacturing are expanding daily and include predictive analytics, defect prognosis, and tool/machine condition monitoring (Wuest et al., Reference Wuest, Weimer, Irgens and Thoben2016; Vakharia et al., Reference Vakharia, Gupta and Kankar2017; Wang et al., Reference Wang, Jing and Fan2018), quality control through image recognition (Wang et al., Reference Wang, Zhong, Shi and Zhang2020), weld bead diagnosis (Chen et al., Reference Chen, Chen and Feng2018; Rodríguez-Gonzálvez and Rodríguez-Martín, Reference Rodríguez-Gonzálvez and Rodríguez-Martín2019; Yang et al., Reference Yang, Liu and Peng2019; He et al., Reference He, Li, Pan, Ma, Yu, Yuan and Le2020), weld quality monitoring (Sumesh et al., Reference Sumesh, Rameshkumar, Mohandas and Babu2015; Mahadevan et al., Reference Mahadevan, Jagan, Pavithran, Shrivastava and Selvaraj2021), weld joint optimization (Choudhury et al., Reference Choudhury, Chandrasekaran and Devarasiddappa2020; Mongan et al., Reference Mongan, Hinchy, O’Dowd and McCarthy2020), robot trajectory generation (Duque et al., Reference Duque, Prieto and Hoyos2019), welding monitoring (Cai et al., Reference Cai, Wang, Zhou, Yang and Jiang2019), real-time weld geometry prediction (Lei et al., Reference Lei, Shen, Wang and Chen2019), weld parameter prediction (Las-Casas et al., Reference Las-Casas, de Ávila, Bracarense and Lima2018), recognition of welding joint type (Fan et al., Reference Fan, Jing, Fang and Tan2017; Zeng et al., Reference Zeng, Cao, Peng and Huang2020; Chen et al., Reference Chen, Liu, Chen and Suo2022), and fault detection and diagnosis (He et al., Reference He, Yu, Li, Ma and Xu2019). Therefore, we infer that the application of intelligent welding in the manufacturing sector must acknowledge the need for support in handling the high dimensionality, complexity, and interactions of the involved data in order to benefit from increased data availability.
Accurately identifying the type of weld joint is a critical prerequisite for extracting features from the weld edge and guiding robotic systems in autonomous tracking. Researchers have extensively investigated this challenge, emphasizing applicability, accuracy, and computational complexity. Furthermore, algorithms designed to identify location information for dissimilar weld joints vary, and parameters such as current amperage rating, voltage levels, swing movement, and material deposition rate are specific to each weld joint type (Gheysarian and Honarpisheh, Reference Gheysarian and Honarpisheh2019; Sharma et al., Reference Sharma, Sharma, Islam and Roy2019). Consequently, manually adjusting rules and parameters for each weld joint before every welding operation proves inefficient. Therefore, achieving independent recognition of the weld joint prior to welding would significantly enhance the efficiency and ease of the intelligent welding process.
Practical weld joint identification requires high identification accuracy and reliability. In this regard, strategies such as Support Vector Machines (SVM), deep learning classifiers, LASER stripe data processing, Silhouette-mapping, and the Hausdorff distance (Li et al., Reference Li, Xu and Tan2006; Fang and Xu, Reference Fang and Xu2009; Fan et al., Reference Fan, Jing, Fang and Tan2017, Reference Fan, Jing, Yang, Long and Tan2019; Zeng et al., Reference Zeng, Chang, Du, Peng, Chang, Hong, Wang and Shan2017, Reference Zeng, Cao, Peng and Huang2020; Zou and Chen, Reference Zou and Chen2018; Sharma et al., Reference Sharma, Sharma, Islam and Roy2019; Tian et al., Reference Tian, Liu, Li, Yuan, Feng, Chen and Wang2021; Chen et al., Reference Chen, Liu, Chen and Suo2022) have been offered in the literature. Although a few of these classifiers have demonstrated adequate recognition accuracy, weld joint identification remains an open research problem that necessitates innovative strategies and procedures for improving recognition accuracy, run time, and computational complexity.
Chen et al. (Reference Chen, Liu, Chen and Suo2022) demonstrated the application of a convolutional kernel with a nonmaximum suppression algorithm for the detection and location of fillet joints. Fang and Xu (Reference Fang and Xu2009) considered image-based recognition of fillet joints. Zeng et al. (Reference Zeng, Chang, Du, Peng, Chang, Hong, Wang and Shan2017), Shah et al. (Reference Shah, Sulaiman, Shukor, Kamis and Rahman2018), Shao et al. (Reference Shao, Huang and Zhang2018), Fan et al. (Reference Fan, Jing, Yang, Long and Tan2019), and Xue et al. (Reference Xue, Chang, Peng, Gao, Tian, Du and Wang2019) researched the recognition of narrow butt weld joints, and Zou and Chen (Reference Zou and Chen2018) studied feature recognition of lap joints using image processing and a convolution operator. However, the authors tailored these methods to identify only one type of weld joint.
However, several weld joint types may coexist in a particular welding situation, such as ship hull welding (Bao et al., Reference Bao, Zheng, Zhang, Ji and Zhang2018). Moreover, the welding parameters vary with the weld joint type. As a result, in a realistic robotic welding setup, the system should be capable of distinguishing multiple types of weld joints. Among the few researchers addressing this, Li et al. (Reference Li, Xu and Tan2006) utilized the Hausdorff distance with a shape-matching technique to detect six types of weld joints. Tian et al. (Reference Tian, Liu, Li, Yuan, Feng, Chen and Wang2021) utilized the spatial compositional relationship between the essential vertices and intersections of weld joints (Silhouette-mapping). Initially, the authors specified the links and their associations with the line-segment components. Subsequently, this distinctive compositional association facilitated weld joint recognition. This approach is viable but has trouble distinguishing between distinct weld joints with identical compositional properties.
Fan et al. (Reference Fan, Jing, Fang and Tan2017) developed a technique for building an SVM classifier by forming the feature map from the joint extremities toward the lowest part of the weld; it outperforms other approaches in terms of detection performance and computing cost, but its application is cumbersome. Zeng et al. (Reference Zeng, Cao, Peng and Huang2020) extracted two types of characteristics as feature vectors and constructed an SVM classifier to enrich the identifying information and recognition accuracy. The authors used a visual sensor comprising a CCD camera, optical filters, and a LASER.
Upon reviewing the existing literature, it becomes evident that researchers have diligently endeavored to extract custom features from images through manual means. However, this manual approach to feature extraction is time-consuming and requires skilled, trained personnel. Moreover, the tailor-made feature extraction it relies on tends to compromise precision and efficiency, and the computation of unnecessary characteristics may increase computational cost, ultimately degrading recognition system performance. Conversely, a non-tailor-made feature extraction approach extracts features directly from raw images, eliminating the need for prior knowledge or specific feature design (Babić et al., Reference Babić, Nešić and Miljković2011) and circumventing the requirement for costly hardware. The present article employs three feature extractors, namely the Local Binary Pattern (LBP), the Histogram of Oriented Gradients (HOG), and features from different layers of ResNet18. The resulting feature vectors train various classifiers, including SVM, Decision Trees (DT), and K-Nearest Neighbors (KNN).
The LBP feature extraction technique is efficient and straightforward in both its theoretical foundations and its computational implementation. LBP exhibits a desirable rotation invariance property, making it well suited for time-sensitive applications such as weld joint type identification. Moreover, LBP demonstrates strong robustness against variations in grayscale intensity caused by the uneven illumination conditions commonly encountered in welding environments. This resilience further enhances its suitability for accurate and reliable feature extraction in challenging scenarios, which is essential in an arc welding setup (Ojala et al., Reference Ojala, Pietikainen and Maenpaa2002). The HOG descriptor emphasizes the configuration or form of the image. The angle of the gradient and its magnitude form the basis for computing the features. The contour of an image provides detailed information about what the image could be, so detecting edges often facilitates easy recognition of objects. The essential elements in identifying weld joints are the edges and the gradient. The HOG descriptor is invariant to geometric and photometric alterations because it operates on local cells. This property serves a purpose, as the camera may capture the weld images at different angles with varying degrees of illumination (Wang et al., Reference Wang, Han and Yan2009).
Moreover, the Histogram of Oriented Gradients (HOG) technique demonstrates remarkable efficacy in representing structural elements that exhibit minimal shape variation. The relative form factor between the edges and corners remains relatively consistent for weld joints. Traditional Convolutional Neural Network (CNN) methods prove inadequate for weld joint classification due to lengthy training times, high computational demands, and limited availability of large datasets. However, we can address these challenges by leveraging pre-trained networks. CNNs like ResNet18 are trained on extensive and diverse image datasets, enabling them to develop comprehensive feature representations for various images. These pre-trained networks often outperform custom features. Adopting a pre-trained CNN as a feature extractor offers a straightforward approach to capitalize on the capabilities of CNNs without investing substantial time and resources in training (Wang et al., Reference Wang, Jiao, Wang and Zhang2020). In the ResNet18 network, the initial layer prepares filters that extract weld joints’ blob and edge characteristics. Subsequent deeper layers process these foundational characteristics, integrating them to generate higher-level image features such as corners, edges, and relative positions. These higher-level features are better suited for recognition tasks as they encapsulate all primitive features into a more comprehensive visual representation (Donahue et al., Reference Donahue, Jia, Vinyals, Hoffman, Zhang, Tzeng and Darrell2014). One may argue that deep networks require many training images specific to the given task; however, there are numerous instances where obtaining a large, well-balanced, annotated dataset is unfeasible. Identifying weld joint types from images poses a challenging problem due to the need to extract multiple features from photos, the limited size of the available dataset, the numerous joint types, the diversity of metal surfaces, and the variations in the arrangement of welding parts and welding conditions. Hence, a pre-trained CNN becomes a strong candidate for image feature extraction in combination with machine learning classifiers.
The literature review shows that SVMs effectively handle datasets with multiple features and are highly memory efficient (Cortes and Vapnik, Reference Cortes and Vapnik1992). Multiple SVMs can be combined in a one-versus-one scheme to achieve multiclass classification. In this study, we utilized the Error-Correcting Output Codes (ECOC) classifier, a multiclass model built from binary SVMs. Trained ECOC classifiers store training sets, parameter settings, and posterior likelihoods, enabling the prediction of labels for new data. Similarly, KNNs are simple to implement and intuitive to comprehend. They effectively handle nonlinear decision boundaries for classification and regression tasks, continuously improve with additional data, and operate with only one hyperparameter (Wang et al., Reference Wang, Jing and Fan2018).
Moreover, DTs are a supervised learning technique that can handle both continuous and categorical values. They are easily readable by humans and easy to comprehend (Breiman et al., Reference Breiman, Friedman, Olshen and Stone2017). Tree performance remains unaffected by nonlinear relationships between parameters, and DTs can handle data sets with errors or missing values.
The critical contributions of the article are as follows:
1. The article proposes an approach to enhance robotic welding automation by introducing a method for recognizing weld joint types. The method achieves this by extracting features from weld images.
2. The article demonstrates the superiority of nonhandcrafted image features over custom, mathematically and computationally intensive features.
3. The article highlights the combination of feature extractor and classifier that yields the best accuracy and computational time.
The article comprises five sections. Section “Introduction” provides an introduction to the problem and establishes the necessary background. Section “Methodology” presents the methodology developed for weld joint recognition. Section “Implementation” focuses on the implementation details of the proposed approach. Additionally, it discusses the key features of the dataset employed in the study. Section “Experiments and results” describes the experimentation process and presents the results obtained. Finally, section “Conclusion” summarizes the conclusions drawn from the research and discusses potential avenues for future work.
Methodology
Researchers commonly employ image processing and classification techniques to identify weld joint types. These approaches involve representing crucial elements of an image as a compact feature vector using feature extractors. A few authors (Fan et al., Reference Fan, Jing, Fang and Tan2017; Zeng et al., Reference Zeng, Cao, Peng and Huang2020) achieved weld joint type recognition by extracting weld image features with hand-crafted feature extractors and classifying them with machine learning algorithms. However, these approaches use costly imaging systems and frequently fail to identify the weld joint type under significant variations in lighting conditions and on highly reflective surfaces.
The proposed method follows a systematic approach, as illustrated in Figure 1.
Following Table 1, the feature vectors extracted by LBP, HOG, and different layers of ResNet18 serve as inputs for individual classification algorithms, namely SVM, KNN, and DT. To ensure comprehensive analysis, we construct 15 models organized into five sets, each employing distinct feature extraction algorithms. The image dataset consists of 2348 weld joint images, encompassing five types of weld joints, captured from diverse perspectives to enhance the generalization capability and maintain variability within the model.
The pre-processing stage is crucial in preparing the image dataset before feature extraction. This stage involves various actions to transform the incoming data into a suitable format for subsequent feature extraction. This meticulous pre-processing step is necessary to ensure the accuracy and integrity of the feature extraction process, mitigating any potential biases. In this study, the network receives images depicting different types of weld joints, and we pre-process these images according to the specific requirements outlined in Table 2.
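For illustration, a minimal MATLAB sketch of this pre-processing step is given below. It is an assumed reconstruction, not a reproduction of Table 2: the folder name and file handling are hypothetical, and the target sizes (640 × 480 grayscale for the hand-crafted extractors and 224 × 224 RGB for ResNet18) are taken from the dataset and network descriptions that follow.

```matlab
% Illustrative pre-processing sketch (assumed; Table 2 specifics are not
% reproduced here). Images are resized to 640 x 480 grayscale for the
% LBP/HOG extractors and to 224 x 224 RGB for ResNet18.
imds = imageDatastore('weld_joint_images', ...          % hypothetical folder
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');

I     = readimage(imds, 1);                  % one sample weld image
Igray = imresize(im2gray(I), [480 640]);     % 640 x 480 grayscale for LBP/HOG
Irgb  = imresize(I, [224 224]);              % ResNet18 expects 224 x 224 input
if size(Irgb, 3) == 1
    Irgb = repmat(Irgb, [1 1 3]);            % replicate channels for grayscale inputs
end
```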
To evaluate the performance of our approach, we employ the k-fold cross-validation method, with a value of k set to 3, to determine the accuracy of the classification results. The primary purpose of performing k-fold cross-validation is to assess a machine learning model’s performance and generalization capability. It is a widely used technique in model evaluation and selection. By using k-fold cross-validation, we achieve the following key objectives:
1. Robust Performance Estimation: Cross-validation provides a more reliable estimate of the model’s performance by evaluating it on multiple subsets of the data. Instead of relying on a single train-test split, which may be biased or subject to chance fluctuations, k-fold cross-validation averages the performance over k iterations. Doing so helps to obtain a more representative and stable estimate of the model’s performance.
2. Mitigation of Overfitting: Overfitting occurs when a model learns to fit the training data too closely, resulting in poor generalization to new, unseen data. By repeatedly splitting the data into different training and validation sets, k-fold cross-validation helps identify models that generalize well and are less likely to overfit. It provides a means to assess the model’s performance on different subsets of the data, ensuring that it performs consistently across multiple variations of the training and validation sets.
3. Model Selection and Hyperparameter Tuning: Cross-validation aids in selecting the best-performing model or determining optimal hyperparameters. By comparing the performance of different models or parameter settings across multiple folds, we can decide which model configuration yields the highest performance and best generalization.
4. Insight into Model Robustness: Cross-validation allows us to evaluate the stability and robustness of the model by examining its performance across different subsets of the data. It provides insights into how the model may perform on new, unseen data and indicates its ability to handle variations or potential biases in the dataset.
We carried out the experiments using MATLAB R2022a programming software.
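For concreteness, the sketch below shows how one such feature-extractor/classifier pairing can be evaluated with 3-fold cross-validation in MATLAB. It is an illustrative reconstruction rather than the authors’ exact code: X denotes an N-by-D feature matrix (one row per image), Y the corresponding joint labels, and the functions are those of the Statistics and Machine Learning Toolbox.

```matlab
% Minimal sketch: multiclass ECOC-SVM trained on a feature matrix X with
% labels Y, evaluated by 3-fold cross-validation (classification-error loss).
tSVM  = templateSVM('KernelFunction', 'linear', 'Solver', 'SMO');
mdl   = fitcecoc(X, Y, 'Learners', tSVM, 'Coding', 'onevsone'); % one-vs-one SVMs
cvMdl = crossval(mdl, 'KFold', 3);                              % 3-fold partitioned model
acc   = 1 - kfoldLoss(cvMdl);                                   % mean cross-validated accuracy
fprintf('3-fold cross-validated accuracy: %.2f %%\n', 100 * acc);

% The KNN and DT models of each set follow the same pattern:
knnCv = crossval(fitcknn(X, Y),  'KFold', 3);
dtCv  = crossval(fitctree(X, Y), 'KFold', 3);
```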
Implementation
Image dataset
We trained and tested the proposed model using the Kaggle weld joint dataset (Weld-Joint-Segments | Kaggle, Reference Munozn.d.) for butt and fillet joints and an in-house dataset for vee, lap, and corner weld joints. The collection comprises 2348 weld joint images divided into five types of weld joints. Table 3 shows the image distribution of the considered weld joint types. The images are down-sampled to 640 by 480 pixels in .png format for faster processing and lower storage requirements.
Including image samples from natural scenes plays a crucial role in enhancing the generalization ability and robustness of the model. This diverse set of natural scene images helps the model learn and adapt to different lighting conditions, backgrounds, object placements, and occlusions. Consequently, the model becomes more adept at extracting relevant features and patterns across various scenarios. By training on natural scene images, we foster the model’s ability to generalize its learned representations to unseen data, enabling it to make accurate predictions in real-world settings. Moreover, including natural scene samples helps reduce the model’s susceptibility to overfitting on specific training samples or artificially created datasets. In addition to the original dataset, we have employed various techniques to enhance the generalization of our proposed method and mitigate the risk of overfitting. Specifically, we applied data augmentation to the dataset, which has proven a practical approach for improving generalization capabilities. Data augmentation of the images included:
1. Randomly changing the orientation of images in both the X and Y directions;
2. Arbitrarily scaling images in both directions;
3. Randomly translating images in both directions.
Table 4 shows the value range of augmentation procedures applied to the dataset. By introducing these variations, we increase the diversity of the dataset and expose the model to a broader range of potential scenarios. This process helps to prevent the model from memorizing specific details or patterns in the training set, thus reducing the likelihood of overfitting. We aim to augment the dataset, simulating real-world conditions and introducing additional variability into the evaluation process. Doing so allows us to assess the performance of our proposed method under more challenging and diverse scenarios encountered in practical applications. By strengthening the generalization capabilities of our model through this approach, we empower it to make accurate predictions on unseen or slightly different data.
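A hedged sketch of such an augmentation pipeline, using MATLAB’s imageDataAugmenter, is given below. The numeric ranges are placeholders only (the actual ranges are those of Table 4), and the orientation change is interpreted here as random reflection about the X and Y axes.

```matlab
% Illustrative augmentation setup (ranges are placeholders; see Table 4 for
% the values actually used).
augmenter = imageDataAugmenter( ...
    'RandXReflection', true,  'RandYReflection', true, ...        % orientation change in X and Y
    'RandXScale', [0.9 1.1],  'RandYScale', [0.9 1.1], ...        % arbitrary scaling
    'RandXTranslation', [-10 10], 'RandYTranslation', [-10 10]);  % randomized translation (pixels)

augImds = augmentedImageDatastore([480 640], imds, ...
    'DataAugmentation', augmenter);    % transforms are applied on the fly during training
```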
Figure 2 illustrates 12 representative sample photos from the dataset, showcasing the diverse visual characteristics of the image dataset. The samples exhibit variations in weld types, encompassing comparable textures and edge-corner connections in some instances while differing in others. Furthermore, we have observed variations in the number of images across different subsets of the dataset. To mitigate the challenge posed by an imbalanced dataset, we randomly selected an equal number of images for training. This approach ensured a balanced representation of different classes within the training set, enabling more robust and unbiased model learning.
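Continuing the datastore sketch above, the class balancing described here can be expressed in a few lines of MATLAB; the per-class count below is illustrative, as the exact number used is not stated in the text.

```matlab
% Draw an equal number of images per class at random to balance the training set.
counts       = countEachLabel(imds);
nPerClass    = min(counts.Count);                        % size of the smallest class
balancedImds = splitEachLabel(imds, nPerClass, 'randomized');
```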
Feature extraction
Feature extraction transforms primary weld image data, such as surfaces, corners, edges, and blobs, into concise numerical characteristics for image recognition or classification. According to Nixon and Aguado (Reference Nixon and Aguado2012), feature extraction involves extracting information from the raw data that is most suitable for classification purposes while simultaneously minimizing within-class pattern variability and enhancing between-class pattern variability. In the context of weld joints, feature extraction plays a crucial role in accurately and individually determining the subset of features that constitute the structure of the joint. This approach reduces the dimensionality of the image data without significant loss, potentially leading to improved and faster outcomes than directly applying machine learning to unprocessed data. Additionally, feature extraction enables the description of critical parts of an image as a compact feature vector for image data. Therefore, this phase holds great importance as it directly influences the performance of the weld joint recognition system, as highlighted by Kumar and Bhatia (Reference Kumar and Bhatia2014).
Feature extraction using LBP
The feature extraction model set A relies on the LBP algorithm. It was first presented in 1994 (Ojala et al., Reference Ojala, Pietikainen and Harwood1994). However, the texture unit, spectrum, and analysis model were shown earlier in 1990 (He and Wang, Reference He and Wang1990). The LBP works on a 3-by-3-pixel block size. The central pixel acts as a cut-off for the pixels around it. This feature extractor has been thoroughly explored and presented in work by Ojala et al. (Reference Ojala, Pietikainen, Maenpaa, Pietikäinen, Mäenpää, Pietikainen, Maenpaa, Pietikäinen, Mäenpää, Pietikainen and Maenpaa2002).
Figure 3 shows black and white pixels representing relative differences in intensity from the central pixel. The following inferences can be drawn:
• All black or all white pixels represent a featureless (i.e., flat) surface.
• Successively appearing black or white pixels denote uniform patterns such as corners or edges.
• Pixels that follow no specific pattern and alternate between black and white represent a nonuniform surface.
Figure 4 shows that LBP can thus intuitively extract features for weld joint type identification, as corners, edges, and flat surfaces characterize weld joint types.
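A toy numerical example of the basic operator may help here; the grey levels below are made up purely for illustration.

```matlab
% Basic 3-by-3 LBP operator: neighbours at least as bright as the centre
% pixel become 1, darker ones become 0, and the eight bits are packed into
% a single LBP code.
block      = [52 60 63; 48 55 70; 40 42 66];                  % made-up grey levels
center     = block(2, 2);
neighbours = [block(1,1) block(1,2) block(1,3) block(2,3) ...
              block(3,3) block(3,2) block(3,1) block(2,1)];   % clockwise from top-left
bits       = neighbours >= center;                            % threshold against the centre
lbpCode    = sum(bits .* 2.^(0:7));                           % decimal code for this pixel
```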
The algorithm uses default values of 1 for the radius and 8 for the number of neighboring points. In addition, the rotation-invariance flag is set to “true” with L2 normalization (MATLAB R2022a implementation). In this case, only the cell size affected the classifier performance significantly.
To optimize the performance of the LBP feature extractor, we conducted experiments with various Cell Size values and identified those that yielded the highest accuracy. Table 5 shows the corresponding cell sizes employed for each classifier. It is worth noting that adjusting the algorithm parameters can enhance the accuracy of the LBP feature extractor. However, conducting these tests required considerable time and financial resources. Moreover, our investigations into modifying the radius and number of neighboring points did not yield significant improvements in the results obtained from the limited set of tests. The feature vectors acquired through LBP are represented by histograms, as described by Wang et al. (Reference Wang, Han and Yan2009).
Consequently, these feature vectors can be directly utilized for training classifiers, eliminating additional processing steps. By carefully selecting the optimal Cell Size values and utilizing the associated feature vectors, we have demonstrated the potential for improving the accuracy of the LBP feature extractor. These findings contribute to a better understanding of the factors influencing the performance of LBP and provide insights into the effective utilization of LBP feature vectors in training classifiers.
The technique of 3-fold cross-validation utilizes the “Classification Error” loss function, which quantifies the proportion of observations inaccurately classified by the classifier. Figure 5 presents a histogram representation that illustrates a sample image before and after applying the Local Binary Patterns (LBP) feature extraction technique.
Figure 6 shows a sample representation of LBP features around eight neighboring pixels (using the Least significant bit (LSB)) and the averaged representation at the center image.
The feature-length is determined by the number of cells and the number of bins, the latter depending on whether the rotation-invariance flag is set to true or false. To calculate the feature-length, we followed these steps:
The image size is 640 by 480, and the CellSize is per Table 5.
Number of cells = (image width/cell width) × (image height/cell height).
Number of bins = (P × (P − 1)) + 3, where P is the number of neighboring points.
So, for a cell size of 8 by 8 (SVM):
Number of cells = (640/8) × (480/8) = 4800.
Number of bins = (8 × 7) + 3 = 59.
Therefore, feature-length = 4800 × 59 = 283200.
We can demonstrate similar calculations for the cell sizes used with KNN and DT.
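The extraction and the feature-length check can be sketched as follows in MATLAB (Computer Vision Toolbox); the file name is hypothetical. Note that, in this implementation, rotation invariance is governed by the 'Upright' flag, and the 59-bins-per-cell figure derived above corresponds to the upright uniform variant (the rotation-invariant variant would give P + 2 = 10 bins per cell).

```matlab
% LBP extraction with radius 1, 8 neighbours, L2 normalization, and an
% 8 x 8 cell size, followed by a check of the feature-length computed above.
I = imresize(im2gray(imread('sample_butt_joint.png')), [480 640]);  % hypothetical file
lbpFeatures = extractLBPFeatures(I, ...
    'Radius', 1, 'NumNeighbors', 8, ...
    'CellSize', [8 8], 'Normalization', 'L2');

numCells = (640/8) * (480/8);                    % 4800 cells
assert(numel(lbpFeatures) == numCells * 59);     % 283200, as computed above
```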
Feature extraction using HOG
The feature extraction model set B employed the Histogram of Oriented Gradients (HOG) technique to extract image features. We chose HOG for its effectiveness in capturing the appearance and shape of objects in an image (Dalal and Triggs, Reference Dalal and Triggs2005). The HOG descriptor encompasses concatenated histograms of individual cells within the image. Contrast normalization becomes crucial considering the variation in illumination levels commonly encountered in weld joint-type images. Figure 7 presents an illustrative example output of the HOG feature extractor, where the image on the right clearly outlines the shape of the butt joint.
The algorithm uses 2 by 2 and 9 as default values for the number of cells per block and the number of orientation histogram bins, respectively (MATLAB R2022a implementation). Table 6 shows the cell size used for each classifier.
We also tested several cell size values here and selected those producing the highest accuracy. We then used the resulting histograms directly for training the classifiers. For HOG, the feature-length is a function of the cell size, the cells per block, the number of histogram bins, and the image size. With the default 2 × 2 cells per block and one-cell block overlap, we calculated the feature-length as follows:
So, for a cell size of 64 by 64 (SVM):
Blocks per image (width) = 640/64 − 1 = 9; blocks per image (height) = 480/64 − 1 = 6.5 ≈ 6 (rounded down).
Feature-length = 9 × 6 × 2 × 2 × 9 = 1944.
We can demonstrate similar calculations for KNN and DT with different cell sizes.
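A corresponding MATLAB sketch for the HOG configuration used with the SVM is given below (Computer Vision Toolbox; the file name is hypothetical).

```matlab
% HOG extraction with a 64 x 64 cell size; the defaults of 2 x 2 cells per
% block, one-cell block overlap, and 9 orientation bins give the
% 9 x 6 = 54 blocks and the 1944-element vector computed above.
I = imresize(im2gray(imread('sample_butt_joint.png')), [480 640]);
[hogFeatures, hogVis] = extractHOGFeatures(I, 'CellSize', [64 64]);

assert(numel(hogFeatures) == 9 * 6 * 2 * 2 * 9);   % 1944, as computed above
figure; imshow(I); hold on; plot(hogVis);          % HOG visualization (cf. Fig. 7)
```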
Deep feature extraction using the pre-trained ResNet18 deep network
The feature extraction model sets C, D, and E used the ResNet18 deep network. ResNet18 is an 18-layer convolutional neural network capable of classifying images into 1000 categories (Corke, Reference Corke2015). As a result, the network has learned comprehensive representations for a wide range of images. Within the ResNet18 network architecture, we extracted features from three distinct layers, namely Res3b_ReLU, Res4b_ReLU, and Pool5. We chose these layers because the pre-trained ResNet18 network, which underwent training on the ImageNet dataset, has proven its capability to capture significant features from diverse objects. Each layer within ResNet18 is responsible for extracting distinct characteristics from an input image. The earlier layers retain higher spatial resolution and a larger number of activations but capture lower-level characteristics, whereas the deeper layers capture more abstract, higher-level features. Specifically, we obtained 512 activations/features from the Pool5 layer, 256 features from the Res4b_ReLU layer, and 128 features from the Res3b_ReLU layer.
The network takes RGB images of size 224 by 224 pixels as input. It processes each input sample to calculate the output probability and generate a feature map. The transformation process converts this feature map into a single-column vector. This vector, comprising the extracted features, is then used as input for the classification algorithms employed in our study. After generating the respective features using each model, we employed the same classification algorithms to evaluate the classification performance.
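A sketch of this deep-feature extraction in MATLAB is given below (Deep Learning Toolbox with the ResNet-18 support package; lower-case layer names follow MATLAB's resnet18 naming, and the input file is hypothetical). The text does not specify how the spatial maps of the two intermediate layers are reduced to the reported 256- and 128-element vectors; spatial averaging is shown here only as one plausible choice.

```matlab
% Deep features from pretrained ResNet18 for a single 224 x 224 RGB image.
net  = resnet18;                                               % ImageNet-pretrained network
Irgb = imresize(imread('sample_butt_joint.png'), [224 224]);   % hypothetical RGB image

featPool5 = activations(net, Irgb, 'pool5', 'OutputAs', 'rows');   % 1 x 512 vector
mapRes4b  = activations(net, Irgb, 'res4b_relu');                  % 14 x 14 x 256 map
mapRes3b  = activations(net, Irgb, 'res3b_relu');                  % 28 x 28 x 128 map
featRes4b = squeeze(mean(mapRes4b, [1 2]))';    % assumed spatial averaging -> 1 x 256
featRes3b = squeeze(mean(mapRes3b, [1 2]))';    % assumed spatial averaging -> 1 x 128
```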
Experiments and results
All experiments in this study were conducted on a Workstation with AMD Ryzen Threadripper 2950X CPU @ 3.50GHz, 128 GB of DDR4 RAM, and NVIDIA Quadro RTX 6000 24GB GPU.
LBP results
The KNN classifier with LBP features emerged as the most accurate and fastest in this set, with an image recognition time of 30 ms. SVM performed with nearly identical accuracy but required more time for image recognition.
HOG results
HOG features fared the poorest, with a maximum accuracy of 84.32% for SVM and a minimum of 65.56% for DT.
Deep features results
The model using deep features from the ‘Pool5’ layer obtained the best results, achieving an accuracy of 98.74% with SVM. By setting both the Gamma and C values to 1, we ensured parameter settings that maximized the accuracy of the classifier. The C value is a regularization parameter that controls the trade-off between achieving a low training error and minimizing margin violations; it determines the penalty assigned to misclassifications in the training data. The Gamma value is the kernel coefficient for nonlinear SVM models: it determines the influence of each training example on the decision boundary, defining the reach of individual training samples and thereby the shape of the boundary (Cortes and Vapnik, Reference Cortes and Vapnik1992). The choice of the linear kernel and the use of the Sequential Minimal Optimization (SMO) solver further enhanced the efficacy of our approach.
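In MATLAB's interface, this configuration can be expressed as below; the mapping of C to 'BoxConstraint' and of the Gamma-like scaling to 'KernelScale' is an assumption about the implementation, and the variable names are illustrative.

```matlab
% Linear-kernel ECOC-SVM with C = 1, unit kernel scale, and the SMO solver,
% trained on the Pool5 deep features (variable names are illustrative).
tSVM     = templateSVM('KernelFunction', 'linear', ...
                       'BoxConstraint', 1, 'KernelScale', 1, 'Solver', 'SMO');
svmPool5 = fitcecoc(pool5Features, labels, 'Learners', tSVM);
```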
The findings reveal a marginal difference in performance between the features extracted from layer Pool5 and Res4b_ReLU. However, it is noteworthy that the processing time for image analysis was longer in the case of Res4b_ReLU. Regarding classifier and feature extractor selection, SVM emerges as the top-performing classifier, with a pooling layer of ResNet18 as the feature extractor. Additionally, comparative analysis shows that SVM and KNN demonstrate comparable performance when utilizing features from Res4b_ReLU and Pool5. However, DT exhibits the lowest accuracy, reaching a maximum of 85.89% when employing features from the Pool5 layer.
Furthermore, when examining the utilization of features from the three layers of ResNet18, slight variations in accuracy are observed between the SVM and KNN classifiers. The specific accuracy values and the classifiers employed with ResNet18 as the feature extractor are presented in the relevant rows of Table 7. Overall, these results provide valuable insights into the performance of different classifiers and feature extractors, aiding in the selection of optimal approaches for image analysis tasks.
Note: Boxes highlight the highest values for the respective model.
Table 8 summarizes the best values of accuracy for each set (ascending order), along with the Classification method, respective values of feature vector length, and image recognition time (ms).
Table 9 shows the image recognition time performance of each classifier with different feature extractors.
Boxes indicate best values of ‘Image recognition time’ for a particular classifier and feature extractor method combination.
Table 10 presents comparative image recognition results of different methods, including our proposed approach. Our strategy, which involves extracting features from the Pool5 layer of ResNet18 and passing them through an SVM classifier, achieved an accuracy of 98.74%.
Comparing our method to the other approaches, we can observe that it outperforms them in accuracy. The method proposed by Zeng et al. (Reference Zeng, Cao, Peng and Huang2020), which utilizes custom features supplied to an SVM classifier, achieved an accuracy of 98.4%, slightly lower than our approach. The Vision Sensor with the Silhouette-mapping technique introduced by Tian et al. (Reference Tian, Liu, Li, Yuan, Feng, Chen and Wang2021) reported an accuracy of 97.67%, while the use of custom features with structured light vision by Wang et al. (Reference Wang, Hu, Sun and Freiheit2020) achieved an accuracy of 96.81%. It is worth noting that the accuracy achieved by our method is significantly higher than the approach presented by Fan et al. (Reference Fan, Jing, Fang and Tan2017), where custom features were supplied to an SVM classifier, resulting in an accuracy of 89.2%. Our approach utilizing features from the Pool5 layer of ResNet18 combined with an SVM classifier demonstrates superior accuracy, surpassing the custom feature-based methods and other vision sensor-based approaches. The higher accuracy achieved by our strategy is indicative of its ability to effectively capture discriminative features and generalize well to diverse image datasets. These results support the claim that our proposed method offers improved performance and accuracy in image recognition tasks compared to the other methods listed in the table.
Table 11 shows average accuracies obtained across three classifiers.
Bold text indicates highest value of average accuracy obtained across three classifiers.
Conclusion
This article evaluated various classifiers on image feature vectors from several commonly used feature extractors and sought the best combination for a custom dataset of weld joint images. The experimental results lead to the following conclusions:
1. For ResNet18 features, the SVM and KNN classifiers performed almost equally on the mentioned dataset (except for the Res3b_ReLU feature set), with a slight variation in image recognition time.
2. HOG and Res3b_ReLU as feature extractors resulted in the lowest average accuracies across classifiers (78.19% and 79.57%, respectively), whereas the Pool5 layer of ResNet18 resulted in the best average accuracy (94.44%).
3. All the classifiers performed their individual best with features from the Pool5 layer.
4. Average accuracy across the three classifiers was similar for HOG and Res3b_ReLU features (78.19% and 79.57%, respectively) and for LBP and Res4b_ReLU features (90.91% and 90.97%, respectively).
5. Table 8 shows that classifiers work better and faster with sparser feature information.
6. We obtained the highest average accuracy from the features of the Pool5 layer.
7. In the future, achieving online detection of weld joint types may require improved noise-handling capability.
Data availability statement
The Kaggle weld joint dataset used for this study is available at https://www.kaggle.com/datasets/derikmunoz/weld-joint-segments. The rest of the data is available from the authors and can be shared upon reasonable request.
Author contribution
S.S.: Writing – Original draft, Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Experiment. S.C.: Writing – review & editing, Supervision, Methodology, Project administration, Resources.
Funding statement
This work received no specific grant from any funding agency, commercial, or not-for-profit sectors.
Competing interest
The authors declare none.
Satish Sonwane is a research scholar at Visvesvaraya National Institute of Technology, Nagpur, India, pursuing a Ph.D. in Mechanical Engineering. He received his master’s degree (August 2014) in CAD CAM from Nagpur University, India. His research focuses on studying the application of AI and ML technologies to facilitate intelligent welding.
Dr. Shital S. Chiddarwar is a Professor of Mechanical Engineering at the Visvesvaraya National Institute of Technology (VNIT), Nagpur, India. She holds a B.E. in Mechanical Engineering from VNIT, an M.Tech. from Nagpur University, and a Ph.D. in Robot Motion Planning from IIT Madras (2009). She is a member of the Association for Machines and Mechanisms, IEEE, the Robotics Society of India, IIIE, and ISTE. She mentors IvLabs, a unique robotics and AI lab, and heads the Siemens Center of Excellence at VNIT. She works in the domains of Robot Motion Planning, Machine Vision, AI, and Adaptive Control.