1 Introduction
With the rise of new technologies, industry has moved a step forward into a new era of manufacturing. This complex transformation, which includes the integration of emerging paradigms and solutions such as Artificial Intelligence (AI), Human-Computer Interaction, Cloud Computing, the Industrial Internet of Things (IIoT), and Blockchain, is referred to as Industry 4.0. The impact of the field is witnessed by the effort to promote its development within several national economic policies. For example, the Italian Ministry of Development (nowadays called the Ministry of Enterprise and Made in Italy, identified by the acronym MIMIT) is funding the application of AI to manufacturing processes to improve efficiency and push the development and modernization of Italian SMEs. In this evolving scenario, Quality Control (QC) is greatly benefiting from the adoption of advanced AI tools and techniques, which can speed up or automate the assessment of the integrity, working capability, and durability of products (Javaid et al. Reference Javaid, Haleem, Singh and Suman2022). In particular, the automation of the compliance verification process for products is among the most promising applications of AI for QC, and it poses a significant challenge for all manufacturing-related businesses, because it can make a necessary but costly and time-consuming operation in the supply chain more efficient.
Among the projects funded by MIMIT is the one titled “Multipurpose Analytics Platform 4 Industrial Data” (MAP4ID), where one of the main use cases is precisely the development of an AI capable of automating the compliance checking of Electrical Control Panels (ECPs).
Basically, an ECP is an enclosure, typically a metal or plastic box, which contains electrical components to control and monitor various mechanical processes, motors, sensors, and actuators. ECPs are employed to regulate a wide variety of components used in industry: for example, they are used to control mechanical equipment, electrical devices, manufacturing machinery, and so on.
One of the basic QC tasks in the manufacturing of ECPs is checking the compliance of the produced control panels with their schematics. Automating this task is particularly relevant since it is currently performed manually by human experts, which makes the whole process inefficient, expensive, and prone to errors. The release of defective ECPs (due to poor quality control) can expose the company to penalties from customers and compromise its reputation. The adoption of AI-based tools can greatly mitigate these risks by enabling continuous monitoring of the whole production chain and early detection of issues at each stage of the production process.
Main Problem. In this work, we devise an innovative approach combining Deep Learning (DL) (Goodfellow et al. Reference Goodfellow, Bengio and Courville2016) and Answer Set Programming (ASP) (Brewka et al. Reference Brewka, Eiter and Truszczynski2011; Gelfond and Lifschitz Reference Gelfond and Lifschitz1991) to support QC in the production of electrical control panels. Here, the main task consists in identifying anomalies in the final product, such as missing, misplaced, or wrongly connected electrical components in the cabinet of the ECP, by analyzing just an image of the assembled product. Important requirements are that the AI must be capable of producing the results of the compliance-checking task in a very short time (in the order of seconds) and with high accuracy ($>$90%), so that it can be integrated into a tool that assists human inspectors and delivers real-time, robust performance.
This problem is made very challenging for standard DL approaches by the following main issues:
1. Data scarcity. Although companies can produce sufficient amounts of data, semantics and labels are often missing from the images. In particular, in our scenario, this problem affects both data representations, that is, the pictures depicting the ECPs and the corresponding schematics. Indeed, supervised information about the position, dimensions, and typology of the installed components is missing from the pictures. As regards the schematics, although they seem to provide a more detailed representation, the possibility of easily translating them into actionable constraints (in the form of grammar rules) strongly depends on the underlying software used to produce them.
2. Custom Designs. Although ECPs are made of standard components, there is no standard set of schematics for ECPs. Usually, the design of a solution is customized and tailored to the needs of a specific customer. Thus, the AI must be able to work with different schematics without requiring any additional training.
Contribution. In this work, we define a solution approach composed of two main phases:
1. First, a Deep Learning-based solution recognizes the electrical components (object detection) in the images of the panels and reconstructs the scheme. In this phase, a number of data augmentation strategies are also exploited to cope with the lack of labeled data.
2. Then, an Answer Set Programming-based system is used to compare the scheme reconstructed from the picture with its original schematic in order to discover possible mismatches/errors.
The contribution of this paper lies in the challenge of providing a suitable combination of learning and reasoning through the development of integrated components, which, nowadays, is identified by the buzzword neuro-symbolic AI (d’Avila Garcez et al. Reference d’Avila Garcez, Besold, Raedt, Földiák, Hitzler, Icard, Kühnberger, Lamb, Miikkulainen and Silver2015). Indeed, our system can be classified as a Neural-Symbolic architecture (or architecture of Type 3) according to Henry Kautz’s taxonomy (Kautz Reference Kautz2022), where DL is used for sensing (detecting components) and a reasoner (ASP-based) is used for checking conformance and detecting issues.
Although based on a conceptually straightforward combination of DL and ASP, an experiment conducted on (scarce) data provided by an Italian SME, a leader in the production of ECPs, confirms that our neuro-symbolic system can deliver the expected performance, which is the main acceptability criterion to be fulfilled by a successful real-world application.
2 Framework overview
In this section, we illustrate the solution approach devised to address the main problem of automating the compliance verification process of control panels. As highlighted in Section 1, an effective solution for this problem has to cope with the challenges of understanding image contents and extracting the constraints encoded in the schematics, while dealing with the lack of labeled data and the distribution of the available unlabeled data. To this aim, we defined the framework shown in Figure 1, which includes two main macro-modules, named Component Detection and Quality Assessment, respectively.
The former is devoted to recognizing the electrical components assembled in the cabinet. Basically, it includes the modules characterizing the adopted machine learning methodology, whose main objective is to identify the components of the panel from a picture. Specifically, a set of data augmentation and generation techniques builds a suitable dataset (large and varied enough to counter overfitting), which feeds a Convolutional Neural Network (CNN) based model trained to perform the component detection.
The latter exploits ASP to tackle the task of compliance checking. It automatically compares the control panel scheme built starting from the neural network output and the corresponding schematic to highlight any anomalies.
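To make the overall data flow concrete, the following minimal sketch (in Python) shows how the two macro-modules could be composed. The class and function names are illustrative assumptions made only for this example and do not come from the MAP4ID codebase; the two stages correspond to the techniques detailed in Sections 3 and 4.

```python
# Minimal sketch of the two-stage pipeline; names are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    label: str                      # component type, as defined in the catalog
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in image coordinates

def detect_components(image_path: str) -> List[Detection]:
    """Component Detection: run the trained Mask R-CNN on a cabinet picture."""
    raise NotImplementedError  # see Section 3

def check_compliance(detections: List[Detection], schematic_facts: str) -> List[str]:
    """Quality Assessment: compare the detections against the schematic via ASP."""
    raise NotImplementedError  # see Section 4

def verify_panel(image_path: str, schematic_facts: str) -> List[str]:
    """End-to-end check: returns a (possibly empty) list of detected anomalies."""
    return check_compliance(detect_components(image_path), schematic_facts)
```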
3 Component detection via deep learning
The component detection step is meant to recognize, given a picture representing a panel, the type and geometric position of each component within the panel. This is a preliminary and fundamental step since, in order to check the compliance of the cabinets with their schematics, we first need to understand their composition. The main problem in this step is the scarcity of data, as well as the lack of labeling annotations. This is a typical scenario in industrial processes: the quality of a machine learning model relies on the data used to train it; however, such data require an accurate labeling process that is time- and resource-consuming and hence difficult to obtain. In our framework, we address these issues by exploiting a synthetic data generation process that allows us to enrich the initial training set.
3.1 Data augmentation and generation
Basically, our synthetic data generation method is fed with three inputs: (i) a picture showing an empty cabinet, (ii) a catalog including all the available components that can be installed in a cabinet, and (iii) a limited number of real pictures that are manipulated in order to add noisy elements to the generated data, following a strategy described below.
The core idea is to enrich the ground truth (consisting of a limited number of images) with synthetic images, in which the area of the empty cabinet is filled with random components picked from the catalog. Notice that we are not interested in generating compliant panels, since our only objective at this stage is to build an object detector capable of recognizing both a component and its geometrical position and extent. The size of the catalog and the randomness of the composition allow us to generate a suitable number of images in which each component appears with a suitable frequency, thus making the resulting dataset well suited to the object detection and segmentation learning tasks.
This simple approach can be further combined with other image augmentation strategies (the Image Processing module in Figure 1) with the aim of yielding a training set that includes a sufficient and diversified number of examples for learning the model. In particular, our framework also includes traditional data augmentation strategies, namely Gaussian blur and perspective transformation. The former introduces imperfections into the data by averaging contiguous pixel values, so as to make component detection more resilient to changes in the input; the latter applies random four-point perspective transformations to the images.
The dataset resulting from this process includes all the features necessary for the training phase: (i) a large number of different pictures, (ii) the position of each component, and (iii) the type of each component. Notably, since each component depicted in the synthetic pictures is randomly placed, the detection model is forced to learn the intrinsic features of each component, instead of relying on positional features, which may vary across different schematics. Within a cabinet, there are also other “auxiliary” elements that are simplified in a schematic, mainly separation boxes, metal runners, and cables. For simplicity, we call them noise, and we randomly add them to the generated images in order to make the object detection model able to distinguish and ignore these elements.
Figure 2 shows some examples of the data generation process. We can observe the empty cabinet (Figure 2a) and two instances where it is filled with random components (Figures 2b and 2c). Notice that the synthetic data do not necessarily represent a realistic situation. As already mentioned, this is not an issue, since our purpose here is to strengthen the object detection and segmentation phase, which is discussed below.
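As an illustration of the generation procedure just described, the following sketch (in Python, using Pillow) pastes randomly chosen catalog crops onto the empty-cabinet picture, records their bounding boxes and labels, and applies a Gaussian blur. The file layout, annotation format, and function name are assumptions made only for this example.

```python
# Sketch of the synthetic data generation step; it assumes the catalog is a
# folder of component crops (one PNG per component type, named by its label)
# and that each crop is smaller than the cabinet picture.
import json
import random
from pathlib import Path
from PIL import Image, ImageFilter

def generate_sample(cabinet_path: str, catalog_dir: str, out_stem: str,
                    n_components: int = 8) -> None:
    cabinet = Image.open(cabinet_path).convert("RGB")
    crops = list(Path(catalog_dir).glob("*.png"))
    annotations = []
    for _ in range(n_components):
        crop_path = random.choice(crops)
        crop = Image.open(crop_path).convert("RGB")
        if crop.width >= cabinet.width or crop.height >= cabinet.height:
            continue  # skip crops that do not fit in the cabinet picture
        # Random placement: no compliance with any schematic is intended.
        x = random.randint(0, cabinet.width - crop.width)
        y = random.randint(0, cabinet.height - crop.height)
        cabinet.paste(crop, (x, y))
        annotations.append({"label": crop_path.stem,
                            "bbox": [x, y, x + crop.width, y + crop.height]})
    # Gaussian blur augmentation; a random four-point perspective transform
    # (which must also warp the recorded boxes) can be stacked on top.
    augmented = cabinet.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.0, 1.5)))
    augmented.save(f"{out_stem}.png")
    Path(f"{out_stem}.json").write_text(json.dumps(annotations))
```

Calling such a procedure repeatedly, with different random placements, yields a synthetic training set like the one described in Section 5.1.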
3.2 Component detection
The Model Building module in Figure 1 trains the deep architecture used to perform the component detection. For this, we adopted the Mask R-CNN convolutional neural architecture proposed by He et al. (Reference He, Gkioxari, Dollár and Girshick2017). In general, R-CNN (Region-based CNN) refers to a family of neural architectures adopting a multi-shot approach. The underlying idea is a two-step process: first, different bounding boxes across possible regions of interest (RoIs) are extracted; then, such regions are independently evaluated through a CNN architecture in order to map them to one of the proposed classes (see Figure 3).
Mask R-CNN extends a specific architecture named Faster R-CNN (Ren et al. Reference Ren, He, Girshick and Sun2015) that includes two main components: (i) a Region Proposal Network (RPN), a deep neural network aimed at extracting RoIs from the picture, and (ii) Fast R-CNN, a neural architecture that performs classification by scaling a region to a predefined size, thus enabling the computation of a set of CNN feature maps. The main advantage of the Faster R-CNN architecture is a suitable trade-off between competitive accuracy in object recognition and relative speed in the recognition phase. By contrast, other approaches based on single-shot architectures, such as YOLO (Redmon et al. Reference Redmon, Divvala, Girshick and Farhadi2016) or SSD (Liu et al. Reference Liu, Anguelov, Erhan, Szegedy, Reed, Fu and Berg2016), focus on fast recognition at the cost of recognition accuracy. This is clearly not acceptable in our scenario, where we aim at checking compliance, and missing a component in the picture would result in a failure of the check. Mask R-CNN further improves Faster R-CNN by introducing an additional branch for predicting segmentation masks on each Region of Interest. The recognition of the mask is crucial in our scenario since it allows us to precisely identify the geometrical position of the component within the panel. Technically, Mask R-CNN rebuilds the mask by resorting to an alignment component and a mask head, composed of two convolutional layers and capable of generating a mask for each RoI, in order to segment the picture in a pixel-to-pixel fashion.
Mask R-CNN relies on a backbone convolutional architecture. In our framework, we used ResNet (Residual Network) (He et al. Reference He, Zhang, Ren and Sun2016), a very deep CNN architecture characterized by residual blocks and skip connections. These two features guarantee both fast convergence in the training stage and expressiveness/accuracy in the recognition phase. We further strengthened the training phase by exploiting Transfer Learning: in particular, we used a ResNet pre-trained on the COCO dataset, which was then fine-tuned for our specific scenario by exploiting the generated dataset with the artificially generated labeled components.
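Our prototype is built with TensorFlow and a ResNet-101 backbone (see Section 5.1). Purely as an illustration of the transfer-learning setup described above, the following sketch fine-tunes the off-the-shelf torchvision Mask R-CNN (which ships with a COCO-pre-trained ResNet-50-FPN backbone, used here as a stand-in) on the synthetic dataset, replacing its box and mask heads to match the component catalog; the data loader format is an assumption for the example.

```python
# Illustrative fine-tuning of a COCO-pre-trained Mask R-CNN; torchvision's
# ResNet-50-FPN variant is used as a stand-in for the TensorFlow / ResNet-101
# prototype described in the paper.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_model(num_component_types: int):
    num_classes = num_component_types + 1  # +1 for the background class
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    # Replace the box-classification and mask heads to fit our component catalog.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, num_classes)
    return model

def train(model, data_loader, epochs: int = 200, lr: float = 1e-4):
    # data_loader yields (images, targets): lists of image tensors and of dicts
    # with "boxes", "labels", and "masks" built from the synthetic annotations.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, targets in data_loader:
            losses = model(images, targets)   # dict of detection and mask losses
            loss = sum(losses.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```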
Figure 2d shows the output of the recognition phase on a real picture representing an actual panel. We can see that the model successfully recognizes all the installed components and additionally draws a contour of their geometric extension. These contours represent one of the inputs to the reasoning component.
4 Compliance checking in answer set programming
In this section, we describe the reasoning component of our architecture for compliance checking. This component has been implemented by resorting to Answer Set Programming (ASP), a well-established paradigm for declarative programming and non-monotonic reasoning developed in the area of Knowledge Representation and Reasoning (Baral Reference Baral2003; Brewka et al. Reference Brewka, Eiter and Truszczynski2011; Bonatti et al. Reference Bonatti, Calimeri, Leone and Ricca2010; Gelfond and Lifschitz Reference Gelfond and Lifschitz1991). ASP has been employed to develop many academic and industrial applications of AI (Erdem et al. Reference Erdem, Gelfond and Leone2016; Gebser et al. Reference Gebser, Maratea and Ricca2020; Calimeri et al. Reference Calimeri, Gebser, Maratea and Ricca2016; Grasso et al. 2009; Reference Grasso, Leone, Manna and Ricca2011; Dodaro et al. Reference Dodaro, Gasteiger, Leone, Musitsch, Ricca and Schekotihin2016). ASP allows for flexible declarative modeling of search problems by means of logic programs (collections of rules), whose intended models (answer sets) encode the solutions (Baral Reference Baral2003; Brewka et al. Reference Brewka, Eiter and Truszczynski2011). The specification (logic program) described in the following can be fed to an ASP system to actually compute the solutions of the modeled problem (Lierler et al. Reference Lierler, Maratea and Ricca2016).
The reasoning module is fed by two handlers, named ASP File Generator and CAD Parser. The former is devoted to translating the objects recognized by the neural model into ASP facts (a file containing a list of facts concerning the coordinates and membership of each electrical component); similarly, the latter yields a list of facts from the input CAD image.
In the following, we focus on the core parts of our solution and simplify some technical aspects that do not impact the comprehension of its working principle. This is done with the aim of making the presentation more accessible and meeting space requirements. Hereafter, we assume the reader is familiar with ASP; for details, please refer to Brewka et al. (Reference Brewka, Eiter and Truszczynski2011), Baral (Reference Baral2003), and Gelfond and Lifschitz (Reference Gelfond and Lifschitz1991).
4.1 Input specification
In ASP, the input specification is given by a set of “facts,” that is, assertions that model true sentences. Thus, the labeled schematic of the circuit (which we informally refer to as cad) and the output of the Mask R-CNN network (exemplified in Figure 2d) are converted into a set of ASP facts describing the recognized objects.
These facts provide information about each component: its label, its identifier, its top-left and bottom-right coordinates, and its membership. In particular, the membership is valued “cad” if the modeled object is part of the schematic of the panel, and “net” if it has been recognized by the neural network in the actual picture we are comparing against the schematic. Moreover, we also compute a graph of topological relations among objects, providing information on their relative positions and distances. The relative position and the distance among components are actually calculated by our ASP program, but for simplicity we assume here that they are given as input facts as well.
The between predicate denotes the neighbors of the component ID along the direction DIR, with MEM as membership; additionally, the manhattan predicate specifies the Manhattan distance between two components ID1 and ID2, where the terms MEM1 and MEM2 stand for their memberships.
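For concreteness, the sketch below shows how the ASP File Generator could serialize the detections into facts of this kind. The predicate names and argument orders used here are assumptions consistent with the description above, not the exact schema of our implementation, and the distances are emitted only for illustration since, as noted, the program can compute them itself.

```python
# Illustrative "ASP File Generator": turns detected (or CAD) objects into ASP
# facts.  Predicate names and argument orders are assumptions made for this
# example; labels are quoted so that they are valid ASP string constants.
from itertools import combinations
from typing import List, Tuple

Box = Tuple[int, int, int, int]           # (x1, y1, x2, y2)

def center(box: Box) -> Tuple[int, int]:
    x1, y1, x2, y2 = box
    return (x1 + x2) // 2, (y1 + y2) // 2

def to_facts(objects: List[Tuple[str, Box]], membership: str) -> str:
    """objects: (label, box) pairs; membership: 'cad' or 'net'."""
    facts = []
    for oid, (label, (x1, y1, x2, y2)) in enumerate(objects):
        facts.append(f'component({oid},"{label}",{x1},{y1},{x2},{y2},{membership}).')
    for (i, (_, box_i)), (j, (_, box_j)) in combinations(enumerate(objects), 2):
        (xi, yi), (xj, yj) = center(box_i), center(box_j)
        d = abs(xi - xj) + abs(yi - yj)   # Manhattan distance between centres
        facts.append(f"manhattan({i},{membership},{j},{membership},{d}).")
    return "\n".join(facts)
```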
4.2 ASP program
We now present the ASP program (see Encoding 1) that encodes the compliance problem in a uniform way (w.r.t. the input instance provided as a set of facts). First, the graph is preprocessed (lines 2-3) by calculating useful information about the relative positions of the objects. Next, according to the “guess-and-check” programming methodology, a disjunctive rule guesses the mapping between the “cad” components of the schematic and the “net” components predicted by the neural network (lines 6-7).
The disjunctive rule can be read as follows: “Given a cad component and a net component of the same type, the two can be mapped, or not.” The candidate solutions are filtered out by the constraints in lines 9-13, ensuring that the same element of the cad is not mapped twice, and the same element of the net is not mapped twice.
The optimal mapping is obtained by weak constraints in lines 15-35. In detail, the program first minimizes the cad elements without a mapping (lines 15-16), then (also in order of priority) the weak constraints in lines 18-31 ensure that “If a cad component ID1 is mapped to a net component ID2, ID1 neighbors should be mapped to ID2 neighbors.”
The mapping is further optimized by considering the distance (lines 33-35) between cad components and net components. The distance is optimal when the elements are in the same position in “net” and “cad”. Finally, the program identifies components that are absent or in excess w.r.t. the schematic by means of the rules in lines 37-40.
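As an illustration of how the reasoning module can be driven programmatically, the sketch below grounds and solves the encoding together with the generated facts. Our system uses DLV2; this example relies on the clingo Python API instead, purely because it is convenient to embed, and it assumes that the rules identifying absent and exceeding components derive atoms over predicates named missing and excess, which is an assumption made only for the example.

```python
# Illustrative driver for the compliance check.  The paper's system invokes
# DLV2; here the clingo Python API is used as an embeddable stand-in.
# "missing" and "excess" are assumed names for the output predicates derived
# by the rules that identify absent and exceeding components.
from clingo import Control

def run_compliance_check(encoding_path: str, facts: str):
    ctl = Control()
    ctl.load(encoding_path)          # the compliance encoding (cf. Encoding 1)
    ctl.add("base", [], facts)       # facts from the ASP File Generator / CAD Parser
    ctl.ground([("base", [])])
    anomalies = []
    with ctl.solve(yield_=True) as handle:
        for model in handle:         # with weak constraints, later models improve the cost
            anomalies = [str(atom) for atom in model.symbols(shown=True)
                         if atom.name in ("missing", "excess")]
    return anomalies
```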
5 Evaluation
This section describes a suite of experiments devoted to demonstrating the effectiveness of our approach and its suitability for the industrial scenario. Specifically, we are interested in evaluating the capability of the DL-based approach to recognize the cabinet components when no training data are available, and the scalability of the ASP-based technique in verifying the conformance of the image with the schematics.
5.1 Experimental setup
We set up the experimentation by considering the extreme scenario where no labeled examples are available. Therefore, the training set includes only the synthetically generated images (obtained with the data augmentation techniques described in Section 3), while the real pictures of the ECPs are used to evaluate the predictive performance. The final result of this process is a training set composed of $\sim$ 10,000 synthetically generated color images of size $320 \times 320$, and a test set of 32 images depicting real control panels of the same size as the training ones.
The model discussed in Section 3 has been implemented as a Python prototype based on the TensorFlow library. The experiments were performed on an NVIDIA DGX Station equipped with four V100 32 GB GPUs. As described in Section 3, a ResNet instance (with 101 layers) is used as the backbone of the component detection model, the Mask R-CNN, which is trained over 200 epochs with $batch\_size = 2$, while Adam is adopted as the optimizer with learning rate $\mathit{lr}= 10^{-4}$.
To assess the capability of the proposed approach in detecting the components installed in the ECPs, a number of traditional and well-known metrics for Object Recognition tasks have been used; we briefly introduce them below. The first measures we consider are the standard Precision and Recall metrics, defined as $p = \frac{TP}{TP + FP}$ and $r = \frac{TP}{TP + FN}$. Here, TP, FP, FN, and TN denote, respectively, the number of cases predicted as positive and correctly classified, predicted as positive but incorrect, predicted as negative but incorrect, and predicted as negative and correctly classified. A Precision-Recall curve can then be obtained by computing and plotting the precision against the recall for different threshold values (i.e. the detection probabilities of the model).
In an object detection scenario, precision and recall represent the capability of the prediction model to identify the boxes that contain the target objects. In particular, for a given object the focus is on comparing the true bounding box with the predicted bounding box, and the TP, FP, FN, and TN values depend on the degree of overlap between these two boxes. Given two boxes, the Intersection over Union (IoU) is defined as the ratio between the overlapping area of the ground truth $b$ and the predicted bounding box $\hat{b}$ and the area of their union: $\mathit{IoU}(b,\hat{b}) = \frac{\mathit{area}(b \cap \hat{b})}{\mathit{area}(b \cup \hat{b})}$.
Then, given a threshold $\theta$ , an object with a true bounding box b and a predicted bounding box $\hat{b}$ is positive if $\mathit{IoU}(b,\hat{b})>\theta$ , and negative otherwise. For a given $\theta$ , it is possible to devise a precision-recall curve by plotting all p/r values relative to all objects and interpolating the resulting curve (He et al. Reference He, Gkioxari, Dollár and Girshick2020; Ren et al. Reference Ren, He, Girshick and Sun2015).
Since the values of precision and recall depend on the chosen threshold $\theta$, we can define (He et al. Reference He, Gkioxari, Dollár and Girshick2020; Ren et al. Reference Ren, He, Girshick and Sun2015) the Average Precision ($\mathit{AP}$) as the area under the precision-recall curve, $\mathit{AP} = \int_0^1 p(r)\,dr$, and the Average Recall ($\mathit{AR}$) analogously, by integrating (averaging) the recall over all possible threshold values.
Finally, by averaging $\mathit{AP}$ (resp. $\mathit{AR}$) over all component classes, we obtain the mean average precision ($\mathit{mAP}$) and mean average recall ($\mathit{mAR}$) measures.
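A minimal reference implementation of these measures is sketched below; the input format (a confidence score and a boolean match flag per detection, obtained by matching predictions to ground-truth boxes at a chosen IoU threshold $\theta$) is a simplification made for illustration.

```python
# Reference implementations of IoU and interpolated average precision,
# matching the definitions above; the input format is a simplification.
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def average_precision(scores, matches, n_ground_truth):
    """Area under the interpolated precision-recall curve.

    scores:  confidence of each detection (higher means more confident);
    matches: True if the detection overlaps an unmatched ground-truth box
             with IoU above the chosen threshold theta.
    """
    order = np.argsort(scores)[::-1]
    matched = np.asarray(matches, dtype=bool)[order]
    tp = np.cumsum(matched)
    fp = np.cumsum(~matched)
    recall = tp / n_ground_truth
    precision = tp / (tp + fp)
    # Interpolation: precision at recall r is the best precision at recall >= r.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Area under the curve: sum of precision times recall increments.
    recall_prev = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum((recall - recall_prev) * precision))
```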
5.2 Evaluation results
Here, we discuss the results in terms of the effectiveness of the DL-based detection model and the scalability of the ASP module. Concerning the first aspect, the detection model delivers excellent performance on both quality measures, with values of $\mathit{mAP}=0.954$ and $\mathit{mAR}=0.935$. In order to evaluate the operational applicability of our approach in a real scenario, we conducted a further analysis by considering the values of precision and recall for a fixed $\mathit{IoU}$ threshold $\theta=0.5$. Basically, in this test case, precision and recall represent, respectively, the capability of the model to correctly recognize the components depicted in the picture and the percentage of recognized components w.r.t. the ground truth.
Figure 4a reports the resulting precision-recall curve. The area under the curve is $0.947$, which denotes a good performance of the detection model also in the operational case. Figure 4b gives a more detailed picture of the model performance by plotting the precision-recall curve for each instance. As expected, for almost all instances the resulting curves highlight the good predictive accuracy of the model in recognizing the different types of components, except for one case in which the quality is slightly lower. The above evaluation shows that the component detection module is effective in recognizing the components of a panel: prediction errors can occur in rare cases of inaccurate image acquisition (e.g. non-frontal framing or inclusion of elements external to the cabinet), since the catalog provides only a limited number of component perspectives. An example of such behavior is depicted in Figure 5, where accuracy is affected by the wrong perspective of the image. Since the ASP program performs the compliance task with optimal accuracy on our benchmark images, the accuracy of the whole system coincides with that of the neural model.
One might wonder whether the ASP component is efficient; thus, in a further experiment, we measured its execution time. We generated compliance-checking instances ranging from 6 to 50 labels (types of components) and from 12 to 75 components, and averaged over 500 samples the execution time needed by our ASP engine DLV2 (Alviano et al. Reference Alviano, Calimeri, Dodaro, Fuscà, Leone, Perri, Ricca, Veltri and Zangari2017) to solve them. The results reported in Figure 6 show that our system provides answers in a short time, in the order of seconds for instances sized as real-world ones, and performance remains acceptable (avg. 1.93 s, max about 18 s) even for instances with 75 components.
6 Related works
In this section, we survey some relevant works that address the product quality assurance problem by leveraging AI-based strategies; then, we review some preliminary works proposing solutions to integrate ML techniques with logic programming.
Compliance Checking through Machine Learning. To the best of our knowledge, the problem of assessing the compliance of a product with its schematic through Artificial Intelligence techniques is new and quite unexplored; however, some recent works have tackled similar tasks, in particular within the Predictive Maintenance field. For instance, Tanuska et al. (Reference Tanuska, Spendla, Kebisek, Duris and Stremy2021) propose a comprehensive framework integrating Industrial Internet of Things (IIoT) devices, neural networks, and sound analysis for detecting anomalies in the production chain. Schmitt et al. (Reference Schmitt, Bönig, Borggräfe, Beitinger and Deuse2020) define a holistic solution for quality inspection based on merging Machine Learning techniques and Edge Cloud Computing technology. A Deep Learning based approach for monitoring the process of sealing and closure of matrix-shaped thermoforming food packages is proposed by Banus Paradell et al. (Reference Banus Paradell, Boada, Xiberta, Toldra and Bustins2021). Specifically, Computer Vision techniques are exploited to process the images and perform quality checking, and a comparative analysis across different Convolutional Neural Network architectures (e.g. ResNet50, VGG19, ImageNet, etc.) highlights the best solutions for this task. Villalba-Diez et al. (Reference Villalba-Diez, Schmidt, Gevers, Meré, Buchwitz and Wellbrock2019) propose a deep neural network (DNN) soft sensor enabling fast quality control in the Printing Industry. Basically, the solution compares the scanned surface of the print with the corresponding source file and performs an automatic quality control process by learning features through exposure to training data. Subakti and Jiang (Reference Subakti and Jiang2018) define and develop a deep learning-based framework to detect and recognize different machines, and portions of machines, for smart factories. MobileNets is used as the backbone of the machine recognition model, and it is deployed on mobile devices to support operators in performing machine classification through an augmented reality system. Experimental results on a real scenario show the capability of the approach to recognize different machines and provide intuitive visualizations.
In Table 1, we compare the main approaches proposed in the literature and highlight the differences w.r.t. our solution. The main advantage of our approach (the only one based on a neuro-symbolic architecture) lies in the nature of the symbolic component, which does not require additional training to deal with new (unseen) schematics. Another distinguishing feature is the ability to cope with data scarcity (i.e. small training sets).
ML and ASP integration. The integration of inductive and deductive reasoning is an emerging problem in Artificial Intelligence (AI). Several proposals have been made to implement the reasoning process within complex deep neural network (DNN) architectures (Kathryn and Mazaitis Reference Kathryn and Mazaitis2018; Rocktäschel and Riedel 2017; Yang et al. Reference Yang, Ishay and Lee2020; Lin et al. Reference Lin, Chen, Chen and Ren2019; Donadello et al. Reference Donadello, Serafini and Garcez2017). The integration of deductive logical reasoning with the Deep Learning paradigm is a novel and quite unexplored research topic, although some recent works have introduced interesting preliminary solutions (Ebrahimi et al. Reference Ebrahimi, Eberhart, Bianchi and Hitzler2021). Concerning ASP, we recall that it is a declarative rule-based programming paradigm for knowledge representation and declarative problem-solving that is known to be appropriate for executing complex knowledge-based applications (Erdem et al. Reference Erdem, Gelfond and Leone2016). One of its main issues is the difficulty of incorporating the high-dimensional vector spaces and pre-trained models used for perception tasks in deep learning, which limits the applicability of ASP in many practical applications involving data and uncertainty. Nonetheless, to overcome this issue, the blending of ASP with DL has recently been studied (Yang et al. Reference Yang, Ishay and Lee2020).
7 Conclusions and future works
Quality Control is a crucial task for every company; it is typically performed manually and is prone to errors. Indeed, the release of defective products can damage the company’s reputation and lead to the payment of penalties to the customer.
This paper describes a Neuro-symbolic approach to checking the compliance of electrical control panels with their schematics. A picture of a control panel is fed as input to a neural network that recognizes the installed components and their locations; then, an ASP-based module compares the scheme reconstructed from the picture with its original blueprint to detect possible mismatches/errors.
The system can handle the lack of labeled data and is resilient to noise and variety in the specifications of schematics (no additional training is required, just an updated logical representation of the schematic). The overall system has been exploited in a practical use case provided by an Italian SME leader in the production of ECPs, where it has been shown to fulfill the requirements both in terms of accuracy and evaluation time.
Despite its practical utility, there is still room for improving the proposed framework. In fact, we plan to extend it along two research directions. Concerning the model, we can improve the learning phase by adopting a Triplet Loss (Kaya and Bilge Reference Kaya and Bilge2019) architecture and by changing the model backbone (e.g. by resorting to Vision Transformers (Dosovitskiy et al. Reference Dosovitskiy, Beyer, Kolesnikov, Weissenborn, Zhai, Unterthiner, Dehghani, Minderer, Heigold, Gelly, Uszkoreit and Houlsby2020)). Another potential issue is that the proposed model disregards the depth of the cabinet. In practice, we only consider a two-dimensional model where each component is placed on a plane. There are situations, however, where components partially overlap frontally but occupy different positions in depth. For these situations, a more accurate model that also addresses depth estimation should be considered.
The second line of possible improvement concerns the reasoning modules, where the logic programs can be calibrated to compute suggestions for the user, as well as to propose alternative schematic plans. Finally, one could study whether a tighter integration of the neural and logic-based components can enhance the results provided by the vision procedure.