
Developing a data analytics toolbox for data-driven product planning: a review and survey methodology

Published online by Cambridge University Press:  18 November 2024

Melina Panzner*
Affiliation:
Digital Engineering, Fraunhofer Institute for Mechatronic Systems Design, Paderborn, Germany
Sebastian von Enzberg
Affiliation:
IWID, Hochschule Magdeburg-Stendal, Magdeburg, Germany
Roman Dumitrescu
Affiliation:
Heinz Nixdorf Institute, University of Paderborn, Paderborn, Germany
*
Corresponding author: Melina Panzner; Email: [email protected]

Abstract

The application of data analytics to product usage data has the potential to enhance engineering and decision-making in product planning. To achieve this effectively for cyber-physical systems (CPS), it is necessary to possess specialized expertise in technical products, innovation processes, and data analytics. An understanding of the process from domain knowledge to data analysis is of critical importance for the successful completion of projects, even for those without expertise in these areas. In this paper, we set out the foundation for a toolbox for data analytics, which will enable the creation of domain-specific pipelines for product planning. The toolbox includes a morphological box that covers the necessary pipeline components, based on a thorough analysis of literature and practitioner surveys. This comprehensive overview is unique. The toolbox based on it promises to support and enable domain experts and citizen data scientists, enhancing efficiency in product design, speeding up time to market, and shortening innovation cycles.

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Introduction

Recent technical developments have enabled the collection and analysis of vast quantities of data from cyber-physical systems (CPS) during their operational phase. Manufacturers can leverage product usage data to gain a deeper understanding of how products are used and how they perform. Insights can then be fed back into product planning, where, for example, new product requirements or new product ideas can be derived. Product planning, the initial stage of product development, identifies future potential, discovers product ideas, and plans business strategies (Gausemeier et al. 2019). Data analytics, drawing from various disciplines such as statistics and machine learning (ML), plays a crucial role in optimizing existing and future products by integrating analytical insights into decision-making processes.

The research area of data-driven product planning encompasses CPS, product planning, and data analytics (Meyer et al. 2021). Although the utilization of data analytics in product planning is not a novel concept, new and previously unidentified options are being introduced by the emergence of novel data sources generated by smart and connected products (Kusiak and Smith 2007; Wilberg et al. 2017). The availability of use phase data, including failure messages, system status data, and customer feedback, has led to the emergence of several new strategic options and use cases. In this context, the term ‘use phase data’ refers to all data generated and collected by the product itself (through sensors and actuators), an associated service, or its users during the use phase of the product. This use phase data is particularly valuable for strategic product planning, as it is characterized by good availability and enables the systematic assessment of product characteristics that were essentially defined much earlier in the planning phase (Bosch-Sijtsema and Bosch 2015; Ehrlenspiel and Meerkamm 2013).

The implementation of data analytics in product planning presents significant challenges for companies (Hou and Jiao 2020; Wilberg et al. 2017), particularly in terms of organizational hurdles such as a lack of expertise, a shortage of qualified staff, a dominance of domain specialists, and a limited awareness of the benefits of data analytics and AI, especially among small and medium-sized enterprises (SMEs; Coleman et al. 2016; Hopkins and Booth 2021). The successful implementation of data analytics processes necessitates a comprehensive understanding of the entire process, including all stages. In particular, the design of the data analytics workflow or pipeline involves the assembly of appropriate components for tasks such as data cleaning, preprocessing, feature extraction, modeling, and post-processing (Reinhart et al. 2017; Shabestari et al. 2019). There is no universal method or algorithm that can be applied to all problems and domains (Hilario et al. 2009; Shabestari et al. 2019); the selection of components depends on the target application and the available data (Brodley and Smyth 1995; Nalchigar et al. 2019). It is essential to consider the entirety of the process, from the definition of use cases to the holistic evaluation of models.

It is currently not possible for implementers to benefit from many best practices, as companies rarely implement data-driven product planning projects. This is due to the fact that products are still insufficiently equipped with sensors (Meyer et al. 2022). In particular, in the context of engineering, where innovation cycles are lengthy and costly, the reliability and trustworthiness of analytics pipelines must be established (Saremi and Bayrak 2021). This can be achieved, for instance, through the traceability of the analytics process. The aforementioned factors render the design of a comprehensive data analytics pipeline for data-driven product planning a challenging and hardly automatable task that necessitates expert knowledge.

In the context of skills shortages (Bauer et al. 2018), the democratization of data science through automated ML, no-code tools, and training initiatives represents a promising approach to empowering both non-experts and domain experts to engage in analytics tasks. Training and learning are crucial to prevent failures, with the provision of guidelines, best practices, and templates aiding continuous learning for citizen data scientists (Blackman and Sipes 2022). The structuring of analytics knowledge in a toolbox streamlines the design and implementation of pipelines for data-driven product planning, offering insights into components such as use cases, data types, preprocessing methods, models, and evaluation metrics. This systematic approach reduces the vast solution space of possible pipeline components. In order to address research questions about potential applications, preprocessing methods, models, and evaluation metrics for data-driven product planning, a systematic literature review (SLR) and a practitioner survey were conducted. Our contributions include the collation of results in a morphological box and an investigation of the potential of the toolbox based on this to generate pipelines for data-driven product planning.

The paper is structured as follows: Section Foundations provides an overview of the data analytics process, outlining the steps of a data analytics pipeline and the challenges associated with data-driven product planning. At the end of the section, a generic pipeline is derived, which is to be populated with concrete components. Section Related work presents a brief review of related work. Section Research methodology then describes the research method, while Section Results presents the results. Section Data analytics toolbox for data-driven product planning translates the results into a methodological tool and provides an outlook on a potential application. Finally, the last section summarises the limitations of the study and outlines future work.

Foundations

Foundations in product planning

The activities that precede product development are critical to the success of new products (Cooper 1986). These activities are referred to as strategic product planning (Gausemeier et al. 2019) or phase zero of product development (Ulrich and Eppinger 2016). Strategic product planning covers the process from determining the potential for future success to the creation of development orders (Gausemeier et al. 2019). It addresses the following areas of responsibility: potential identification, product identification, and business planning. The aim of potential identification is to find future success potential and the corresponding business options. The aim of product identification is to find new product ideas that take advantage of the recognized potential for success. Business planning starts with the business strategy, that is, the question of which market segments should be covered. Based on this, the product strategy and the business plan are developed. The use of data analytics offers great added value, particularly in the context of potential and product identification. For example, by uncovering weaknesses, patterns, and trends in use phase data or extracting information from it through analyses such as defect detection or clustering, potential improvements to existing products can be uncovered and new ideas for product features developed.

Foundations in data science

The data analytics process typically comprises six iterative phases outlined in the Cross Industry Standard Process for Data Mining (CRISP-DM; Shearer 2000): business understanding, data understanding, data preparation, modeling, evaluation, and deployment. Other similar processes like the Knowledge Discovery in Databases (KDD) process (Fayyad et al. 1996) or data mining methodology for engineering applications (DMME; Huber et al. 2019) share core tasks such as domain understanding, data understanding, preprocessing, model building, and evaluation (Kurgan and Musilek 2006).

One task in domain understanding is to transform the business goal into a data analytics goal (Chapman et al. 2000). While the business objective describes a business or economic perspective, the data analytics goal contains the tasks to be fulfilled from a data analytics view. For example, for the business goal of product improvement, a data analytics goal can be fault diagnosis. Data analytics goals can further be described as data analytics problems (e.g., classification or clustering). These problem types can further specify the goal (e.g., fault diagnosis) by determining the analytics solution (e.g., dependency analysis).

The main goal of data understanding is to gain general insights about the data that will potentially be helpful for further steps in the data analysis process (Berthold et al. 2010). Before the properties of the data can be analyzed, the relevant data must first be determined and collected. The data generated during the use phase of a product are very diverse and emerge at different locations throughout the company. Examples of use phase data or data in the middle of the product life cycle (MOL) are user manuals, product information, product status information, and usage environment information (Li et al. 2015). The multiplicity of different data sources is accompanied by the heterogeneity of their properties, such as structured and unstructured data or signal and text data.

Data preprocessing makes real-world data more suitable as quality input for the data analytics process (Shobanadevi and Maragatham 2017). Real-world data, and usage data in particular, vary widely in their characteristics and data quality. There are different techniques to make the data suitable for analysis purposes (Jane 2021). They include operations such as data cleaning (e.g., missing values handling, outlier detection), data transformation (numeralization, discretization, normalization, and numerical transformations), dimensionality reduction, and feature extraction (FE; Li 2019).
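
As a minimal sketch of three of the operations named above (missing values handling, outlier detection, and normalization), the following Python example uses only the standard library. The sensor readings, the median imputation, and the interquartile-range rule are illustrative assumptions, not procedures taken from the reviewed literature.

```python
from statistics import median, quantiles

def impute_missing(values):
    """Data cleaning: replace missing readings (None) with the median."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def flag_outliers(values, k=1.5):
    """Outlier detection: flag values outside the k * IQR fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < low or v > high for v in values]

def min_max_normalize(values):
    """Data transformation: rescale values linearly to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical temperature readings from a product in the field.
readings = [20.0, None, 22.0, 21.0, 95.0, 19.0]
cleaned = impute_missing(readings)
flags = flag_outliers(cleaned)
kept = [v for v, is_outlier in zip(cleaned, flags) if not is_outlier]
normalized = min_max_normalize(kept)
```

The order of the operations matters: here the outlier is removed before normalization, because a single extreme reading would otherwise compress all remaining values toward zero.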

Selecting the appropriate model during model building that provides the desired output is a well-known problem in ML. For use cases in data-driven product planning such as analysis of errors, many algorithms can be considered, since both unsupervised and supervised methods are useful, and the data basis can be so diverse.

After modeling, the topic of evaluation is central to any data analysis process. Evaluation serves the purposes of performance measurement, model selection, and comparison of models and algorithms (Raschka 2018). Since the space of possible models in data-driven product planning is currently very large, the potential metrics are also numerous, which also complicates the selection of evaluation metrics.

This process, with its end-to-end sequence of steps to achieve the goal, is also referred to as a data analytics pipeline or workflow (Braschler 2019). Each data analytics task is unique and requires a tailored pipeline. Pipelines vary in granularity and need to be designed with specific components for specific projects or use cases. Selecting pipeline components to build the best-performing pipeline is an important task for data scientists, involving critical design choices and consideration of individual requirements and dependencies (Nalchigar and Yu 2018; Zschech 2022). Figure 1 summarizes the key steps in the data analytics pipeline. This pipeline forms the structuring basis for the toolbox to be developed.

Figure 1. Generic data analytics pipeline for data-driven product planning.
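
To illustrate the idea of a pipeline as an ordered assembly of exchangeable components, the following Python sketch chains three named steps and records which steps were executed. The step names, the toy components, and the threshold are hypothetical and serve only as an example; they are not the components of the toolbox.

```python
from typing import Any, Callable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]

def run_pipeline(steps: List[Step], data: Any) -> Tuple[Any, List[str]]:
    """Apply each component in order and record which steps ran."""
    trace: List[str] = []
    for name, component in steps:
        data = component(data)
        trace.append(name)
    return data, trace

# Hypothetical components for a toy fault-detection use case.
def drop_missing(readings):
    """Preprocessing: discard missing readings."""
    return [r for r in readings if r is not None]

def threshold_model(readings, limit=0.8):
    """Modeling: label a reading as a fault if it exceeds the limit."""
    return ["fault" if r > limit else "ok" for r in readings]

def accuracy_against(reference):
    """Evaluation: build a function that scores labels against a reference."""
    def score(labels):
        hits = sum(a == b for a, b in zip(labels, reference))
        return hits / len(reference)
    return score

steps = [
    ("preprocessing", drop_missing),
    ("modeling", threshold_model),
    ("evaluation", accuracy_against(["ok", "fault", "fault"])),
]
result, trace = run_pipeline(steps, [0.2, None, 0.9, 0.4])
```

Because each step is only a name and a function, a component can be swapped without touching the rest of the pipeline, and the recorded trace gives a simple form of traceability.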

Foundations of democratization and enablement for data science

The democratization of data science and ML is becoming increasingly important due to the shortage of skilled professionals mentioned above. One approach to democratization is the use of efficient tools that automate parts of the analytics pipeline. These tools are often based on automated ML (Masood and Sherif 2021; Prasad et al. 2021). However, these approaches reach their limits in some scenarios, such as those where trust and transparency are critical, and in exploratory situations where the problem is not well defined. The lack of human control and interpretability can be problematic here (Lee et al. 2019). Furthermore, such automated tools carry the risk that data science novices or untrained citizen data scientists in particular will use them without any contextual understanding, increasing the likelihood of errors. This is because AutoML tools do not compensate for gaps in expertise, training, and experience (Blackman and Sipes 2022). For this reason, more educational approaches may be used in some places to build the necessary background knowledge and thus reduce the risk of error. While offerings such as online courses, training, and meet-ups promise to provide the necessary in-depth understanding, such one-off measures are not usually able to provide the necessary ongoing practical experience; guidelines, best practices, and templates can also help here.

Specification of objectives

In summary, the challenge is that the process of determining suitable pipeline components for use cases in product planning is particularly difficult for domain experts and so-called citizen data scientists due to the huge possible solution space of methods and algorithms and the strong interdependence of the steps and components. However, this process should not be fully automated for didactic reasons and due to the frequent iterations required in the context of explorative potential identification. Therefore, an overview of relevant analytics components is required first, followed by selection support in the next step. The following objectives should therefore be considered:

  • Provision of the relevant components for the possible data analytics pipelines of data-driven product planning: goals and problems, data characteristics, preprocessing methods, algorithms, and evaluation metrics

  • An approach for the transparent and independent selection of individual components relevant to the use case

  • A transparent, explanatory, and methodical tool that can be further developed into a digital assistant.
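
The objectives above can be made concrete with a small data structure. The following Python sketch represents a morphological box as a mapping from pipeline dimensions to options and treats a pipeline configuration as one option per dimension. The dimensions follow the components named in the first objective; the listed options are illustrative placeholders, not the contents of the box developed in this paper, and restricting each dimension to a single choice is a simplifying assumption.

```python
from itertools import product
from math import prod

# Illustrative options only; the real box is derived from the SLR and survey.
MORPHOLOGICAL_BOX = {
    "problem": ["classification", "clustering"],
    "data type": ["signal data", "text data"],
    "preprocessing": ["missing values handling", "outlier detection", "normalization"],
    "algorithm": ["decision tree", "support vector machine", "k-nearest neighbor"],
    "evaluation metric": ["accuracy", "silhouette score"],
}

def count_configurations(box):
    """Size of the solution space when one option is chosen per dimension."""
    return prod(len(options) for options in box.values())

def is_valid_selection(box, selection):
    """A selection is valid if it picks a known option for every dimension."""
    return set(selection) == set(box) and all(selection[d] in box[d] for d in box)

def enumerate_configurations(box):
    """Yield every combination of options as a dimension-to-option mapping."""
    dimensions = list(box)
    for combo in product(*(box[d] for d in dimensions)):
        yield dict(zip(dimensions, combo))
```

Even this small example yields 72 combinations, which hints at why an unstructured solution space is hard to navigate and why selection support is listed as a separate objective.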

Related work

In this section, we introduce existing approaches with similar overall objectives, namely providing an overview of relevant analytics components in product planning and guiding the analytics process and the design of analytics pipelines.

With regard to the first objective, it should be noted that, to the best of our knowledge, no study compiles relevant components specifically for product planning and the early identification of product potentials. However, several SLRs have already been conducted in the context of product development (Fournier-Viger et al. 2022; Quan et al. 2023; Shabestari et al. 2019; Souza et al. 2022).

In her dissertation, Eddahab presents demonstrative functional elements of a smart data analytics toolbox, which is intended to support product designers in product improvement based on product usage data (middle-of-life data [MoLD]). Using various proprietary and existing algorithms, she implemented three functions: (1) the merging of different MoL data streams from multiple sensor sources and recommendations for the designer in the form of an action plan on what to do with the product, (2) the recommendation of task-relevant data analysis tools such as Support Vector Machines, and (3) intelligent user identification to create a secure analysis environment. This data analytics toolbox is a knowledge-based system that recommends relevant data analysis algorithms, among other things. However, it focuses on the recommendation of very specific actions based on anomalies in sensor data and not on the exploratory detection of trends and initial tendencies in any operating data. Furthermore, it does not claim to empower its users for the data analytics process and individual use cases by enabling a transparent and independent selection of components, from application definition and data identification to preprocessing and modeling.

Flath and Stein introduce a data science toolbox for industrial analytics applications and highlight key data preparation and analysis steps to address challenges such as the acquisition of relevant data, data preprocessing, model selection, and result interpretation (Flath and Stein 2018). The toolbox comprises five steps: data collection phase, exploratory data analysis, selection of evaluation metric, algorithm selection, and derivation of features. The toolbox therefore provides more of a framework; no specific procedures for the respective steps are provided.

Ziegenbein et al. present a systematic algorithm selection method for production contexts, utilizing the CRISP-DM process. Their approach involves integrating pre-selected ML algorithms and relevant data sources using quality function deployment (QFD) during the data understanding phase. The evaluation considers criteria such as data source availability, implementation effort, and ML procedure suitability for objectives. The method comprises two steps: entering data sources with weights based on implementation effort and representability, and selecting ML methods based on strengths and weaknesses, considering factors like model accuracy and computing effort. The final step prioritizes ML methods based on how well they match the properties of selected data sources, identifying the most suitable method for the application. Because the method uses decision criteria to select suitable algorithms, the procedure is transparent for the user. However, the selection support is limited to the algorithm; the other steps of the pipeline are not taken into account. In addition, the method does not focus on product planning.

Riesener et al. introduce an evaluation method designed to assist users lacking in-depth knowledge of ML algorithms in identifying strengths and weaknesses of selected algorithms for potential use in the product development process (Riesener et al. 2020). Through an SLR, they identified 11 dominant ML algorithms in product development, such as the k-nearest neighbor algorithm, support vector machine, and decision tree. The authors established nine evaluation criteria crucial for pre-selecting ML algorithms, covering aspects like learning tasks, accuracy, training duration, and computational effort. The identified algorithms were then evaluated using these criteria through literature findings and an expert survey. The authors propose selecting an algorithm based on a problem description, which outlines the task (e.g., clustering and classification) and solution requirements. The best-fitting algorithms are determined by assessing the minimal distance between the problem description and the algorithm evaluations. The approach provides relevant algorithms for product development and offers a fairly transparent tool thanks to criteria-based selection. However, the approach is limited to algorithms for modeling.
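
The idea of minimal-distance selection can be sketched in a few lines of Python. The example below is an illustration under stated assumptions and not the method of Riesener et al.: the Euclidean distance, the three criteria, the 1 to 5 scale, and all scores are hypothetical.

```python
import math

def rank_algorithms(problem, evaluations):
    """Rank algorithms by Euclidean distance between their criteria scores
    and the scores requested in the problem description (closest first)."""
    def distance(scores):
        return math.sqrt(sum((scores[c] - problem[c]) ** 2 for c in problem))
    return sorted(evaluations, key=lambda name: distance(evaluations[name]))

# Hypothetical scores on a 1-5 scale (5 = criterion is satisfied best).
evaluations = {
    "decision tree": {"accuracy": 3, "training duration": 5, "computational effort": 5},
    "support vector machine": {"accuracy": 4, "training duration": 2, "computational effort": 2},
    "k-nearest neighbor": {"accuracy": 3, "training duration": 5, "computational effort": 2},
}

# Hypothetical problem description expressed on the same criteria.
problem = {"accuracy": 3, "training duration": 4, "computational effort": 5}

ranking = rank_algorithms(problem, evaluations)
```

With these made-up scores, the decision tree is closest to the problem description. The ranking is only as meaningful as the criteria and scores behind it, which is why the cited work grounds them in literature findings and an expert survey.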

Nalchigar and Yu developed a modeling framework for analyzing requirements and designing data analysis systems. The framework combines three views (business, analytics design, and data preparation) to support the design and implementation of holistic analytics solutions (Nalchigar and Yu 2018). Their solution patterns for ML are built on this (Nalchigar et al. 2019). These represent generic ML designs for commonly known and recurring business analytics problems such as fraud detection. Their meta-model describes the different elements of a solution pattern and shows their semantic relationships. The solution patterns offer a solution approach for the conception of data analytics pipelines for generally known business analytics problems. However, they only provide for the use of the patterns with defined components rather than the independent and learning-promoting selection of the relevant components for an individual use case.

Tianxing and Zhukova propose a domain-oriented multi-level ontology (DoMO) by merging and improving existing data mining ontologies (Tianxing and Zhukova 2021). It includes four layers: (1) restrictions described by data characteristics, (2) definition of domain data characteristics, (3) core ontology for a specific domain, and (4) user queries and the generation of a data mining process. As an intelligent assistant, the purpose of DoMO is to help especially non-experts in describing data in the form of ontology entities, choosing suitable solutions based on the data characteristics and task requirements, and obtaining the data processing processes of the selected solutions. Non-experts can benefit from the ontology in the form of assistants by querying suitable solutions based on specific task requirements and data characteristics. However, defined knowledge is not made accessible.

All in all, none of these approaches provides the necessary overview of possible pipeline components along the entire data analytics process for product planning. Furthermore, not all these approaches function as transparent learning tools for non-experts.

Research methodology

The objective of the presented research is to support the design of data analytics pipelines in data-driven product planning. To this end, we aim to identify typical data analytics pipeline components within this domain that can be arranged in a toolbox. To achieve this goal, an SLR was carried out. To obtain the whole picture and not only the academic view, a survey with data scientists was conducted afterward.

SLR

The review follows the guidelines by Kitchenham et al. and Kuhrmann et al. In accordance with these guidelines, the review passes through three main phases (Kitchenham et al. 2009; Kuhrmann et al. 2017): (1) planning and preparation of the review, (2) conducting the review including data collection and study selection, and (3) analysis. In the following subsections, we will describe each of these phases in more detail.

Planning the review

In this phase, the research objectives and the way the review will be executed were defined. We formulated the following research questions structured according to the general pipeline steps and the identified challenges (see Section Foundations):

RQ1: For which applications in product planning is data analytics used?

The use of data analytics in product planning offers a lot of potential, but one major challenge is the definition of suitable use cases that can be realized with data analytics. Even if the business goals are clear, the data perspective and cases that can be implemented through analytics may be missing or unclear. Therefore, we investigate in the literature which specific goals and problems in product planning are solved using data analytics techniques.

RQ2: Which preprocessing methods (cleaning, transformation, and FE) are typically used?

Since the data generated in the operational phase of products and used for product planning is very heterogeneous, and sometimes of poor quality, pre-processing is essential. Which techniques are important for the relevant use cases and associated algorithms will be investigated based on this research question.

RQ3: What algorithms/models are typically used for modeling?

At the center of the pipeline is the algorithm, which takes the pre-processed data as input and generates, for example, cluster assignments or classes, depending on the type of problem. As mentioned before, the choice of the appropriate algorithm is not trivial. There are a large number of different possibilities, and factors such as the objective and the data need to be considered. To narrow down the solution space and simplify the selection, we analyze which models are frequently used in the literature to solve product planning problems.

RQ4: What evaluation metrics are used?

The evaluation of the models is important to assess the results. Since both supervised and unsupervised methods are used in product planning, the number of possible evaluation metrics is large. Which ones are relevant to product planning models and problems will be answered by this question.

To gain a comprehensive picture of the research area, we created a concept map to identify key concepts and their synonyms (Rowley and Slack 2004). We used the concept map to iteratively develop and evaluate search strings by combining the concepts represented in the map. After several rounds of refinement, we finalized the two-part search string that addressed (1) the applications and (2) the analytics components (Figure 2). To identify the use cases and applications of data analytics in product planning, and to narrow the search results, we focused on review articles and case studies. Our search string was applied only to titles, abstracts, and keywords. To further reduce the number of hits, we defined inclusion and exclusion criteria (Table 1).
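
A common way to turn a concept map into a search string is to join the synonyms of one concept with OR and to join the concepts with AND. The following Python sketch shows this construction. The terms are illustrative assumptions; the search string actually used is the one given in Figure 2.

```python
def build_search_string(concepts):
    """Join synonyms of each concept with OR and the concepts with AND."""
    groups = []
    for synonyms in concepts:
        quoted = " OR ".join(f'"{term}"' for term in synonyms)
        groups.append(f"({quoted})")
    return " AND ".join(groups)

# Hypothetical concept map: each inner list holds one concept and its synonyms.
concept_map = [
    ["product planning", "product development"],
    ["data analytics", "machine learning"],
]
query = build_search_string(concept_map)
```

Adding a synonym widens the result set, while adding a further concept narrows it, which is the lever used when refining a search string over several rounds.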

Figure 2. Procedure of the systematic literature review.

Table 1. Inclusion and exclusion criteria

Conducting the review

In the execution phase, the search string was used to search the databases: we performed an automated search in the online libraries IEEE Xplore, Scopus, SpringerLink, and ScienceDirect in October 2023. These are often mentioned among the best standard libraries (e.g., Kuhrmann et al. 2017). The detailed procedure is shown in Figure 2.

Our queries returned a total of 1,603 papers. The dataset was reviewed and analyzed for publications relevant to our research questions. This step in the selection of studies was carried out using the principle of double-checking within the team of authors. Our first step was to check the title, abstract, and keywords of each paper. If these sections did not provide relevant information, we looked at the conclusion for possible relevance. Using the criteria outlined in Table 1, we then filtered out publications that did not meet our inclusion criteria and retained those that did for further analysis. After this review, 78 publications were included in this step. In addition, we used a snowballing search. This involved two main steps: backward snowballing and forward snowballing. Backward snowballing involved searching for additional studies cited in the papers we had selected. Similarly, forward snowballing aimed to find articles that cited the selected papers. Using these snowballing search methods and our study selection criteria, we identified 15 additional papers through backward snowballing and 12 through forward snowballing. After careful assessment of these papers against our inclusion and exclusion criteria, we determined that 22 of them were relevant to our study. After further quality control to ensure the quality of the selected set and duplicate removal, we obtained a final dataset of 82 relevant papers.

Analysis

The publications that passed the check were carefully studied and the applications, pre-processing techniques, models, and evaluation metrics were extracted. In the second step, unambiguous labels were derived from the extracted content. Two researchers independently assigned the extracted applications, techniques, and metrics to the defined categories. If the assignments did not match, this was discussed and, if necessary, a different assignment was made or a more appropriate category was defined. A quantitative evaluation was then performed to determine how often the different categories of applications and algorithms could be assigned to the papers. It is assumed that the importance of algorithms is higher when the frequency of references and details in the literature is high. Categories with very few matches were reviewed to see if they could be integrated into another category. Such an evaluation was not done for preprocessing methods, as only a part of the papers dealt with them and therefore rarely more than one mention could be found. However, all mentioned methods and metrics were clustered to define larger categories, such as scaling and normalization. These were then used to structure the preprocessing methods. Evaluation metrics were also not counted at this point due to the low number of mentions but were used as input for the survey.
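
The quantitative evaluation and the double coding described above can be sketched as follows. The Python example counts in how many papers each category occurs and lists the items on which two coders disagree. The paper identifiers, category labels, and coder assignments are hypothetical, and counting each category at most once per paper is an assumption about the counting rule.

```python
from collections import Counter

def category_frequencies(paper_labels):
    """Count in how many papers each category appears (once per paper)."""
    counts = Counter()
    for labels in paper_labels.values():
        counts.update(set(labels))
    return counts

def disagreements(coder_a, coder_b):
    """Return the items whose category differs between the two coders."""
    return sorted(item for item in coder_a if coder_a[item] != coder_b.get(item))

# Hypothetical extraction results.
paper_labels = {
    "P1": ["fault diagnosis", "fault diagnosis", "user segmentation"],
    "P2": ["fault diagnosis"],
}
coder_a = {"min-max scaling": "scaling and normalization", "k-means": "clustering"}
coder_b = {"min-max scaling": "scaling and normalization", "k-means": "classification"}

frequencies = category_frequencies(paper_labels)
to_discuss = disagreements(coder_a, coder_b)
```

The items returned by the second function are the ones that would be discussed and, if necessary, reassigned or placed in a newly defined category.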

Survey

To evaluate the results of the literature research from a practical perspective and to enrich them with practically relevant procedures, a survey was then set up and carried out following the guidelines by Linåker et al. (2015). These guidelines divide the conduct of a survey into a number of sequential steps, tailored for software engineering research:

Defining the objective: The objective of the survey was (1) to evaluate if the identified preprocessing and modeling techniques and metrics from the scientific literature are also applied in practice and (2) to identify other relevant techniques used in practice.

Defining the target audience/population: The target audience was data science professionals in Germany who had recently worked on industry projects with use-phase data or operational data such as sensor and log data. This ensures that procedures relevant to this kind of data are mentioned. Experience with implementing product planning use cases was not required, as these have only rarely been implemented in practice to date.

Formulating a sampling design: We chose accidental sampling and we recruited participants through our network, as the willingness to participate is increased by the personal connection. This resulted in the recruitment of 35 subjects. The survey was executed in November 2022. From the 35 subjects recruited, 20 participants answered all questions, resulting in a participation rate of 57.14%.

Designing a questionnaire: The survey consisted of 19 questions in total, two of which were introductory questions about the company and the respondent's experience with the analysis of operational data. These were followed by several questions of the form “Which of the following do you use regularly?” about (1) models, (2) preprocessing procedures, and (3) metrics, structured by (1) analysis problem, (2) data quality issues and data type, and (3) learning mode (unsupervised vs. supervised), respectively.

The questionnaire was designed as a self-administered web-based variant in order to facilitate ease of administration and to prevent any potential influence on the part of the researcher. In order to minimize the issue of lower response rates, appropriate introductions were provided and a test run was conducted with external colleagues.

Analyzing survey data: Since the responses are nominal data, we performed a frequency analysis of the mentions.
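
A frequency analysis of nominal multiple-choice answers can be sketched in a few lines. The answers below are invented for illustration and the function names are hypothetical; only the figures passed to `completion_rate` (20 complete responses out of 35 recruited subjects) are taken from the sampling step above.

```python
from collections import Counter

# Hypothetical multi-select answers; each inner list is one respondent.
answers = [
    ["k-means", "random forest"],
    ["k-means", "SVM"],
    ["decision tree"],
    ["k-means", "SVM", "decision tree"],
]


def mention_frequencies(responses):
    """Return {option: (absolute mentions, share of respondents)}."""
    n = len(responses)
    # set(r) guards against an option being ticked twice by one respondent.
    counts = Counter(option for r in responses for option in set(r))
    return {option: (c, c / n) for option, c in counts.most_common()}


def completion_rate(completed, recruited):
    """Share of recruited subjects who answered all questions, in percent."""
    return round(100 * completed / recruited, 2)


freq = mention_frequencies(answers)
rate = completion_rate(20, 35)  # figures reported in the sampling step
```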

Results

In this section, we present the results of the SLR for the research questions RQ1 and RQ3 and those of the survey, which take the practical view on RQ2, RQ3, and RQ4. The literature references for the applications and algorithms are listed in a table (see Table 2 and Table 3).

Table 2. Data analytics applications for data-driven product planning: literature overview

Table 3. Algorithms in literature used in data-driven product planning: literature overview

SLR results

RQ1: For which applications (analytics goals and problems) in product planning is data analytics used?

Figure 3 illustrates the distribution of the applications addressed in the dataset. The most prevalent theme was that of user needs. The focus was on three key areas: (1) extracting satisfaction/sentiment about the product and concrete product attributes to identify needs for action on the development side, (2) clarifying customer needs in order to obtain clues for product adaptations and new requirements, and (3) directly determining customer requirements that can be passed on to product development. The most common approach to achieving these goals was classification, although text mining was also a popular choice in this context, particularly for processing review data. Failure diagnosis was also used to obtain information about the product, with factors influencing defects uncovered to identify potential weaknesses and their causes. Here, too, classification approaches were the most frequently used, with 10 mentions. Dependency analysis was used eight times for diagnosis. The third most frequently employed approach was association mining, which was used to derive dependencies in the form of association rules. This was followed by the detection of errors, which preceded diagnosis and detected errors and problems. Classification approaches were predominantly employed in this context. Clustering was also identified as a potential solution for error detection, particularly in the absence of labels.

Figure 3. Data analytics applications for data-driven product planning in literature.

Other applications in product planning that are relevant to research include user behavior analysis, trend analysis, and user segmentation analysis. User behavior analysis and user segmentation analysis examine users in order to identify their behavior and group them in meaningful ways. This can be accompanied by further indications for product adjustments. Clustering appears to be a popular approach in this context. Trend analysis focuses on new market trends and changes, which can result in new requirements for a product.

RQ3: What algorithms/models are typically used for modeling?

The majority of the papers present supervised classification approaches. The support vector machine (SVM) is particularly prevalent, with 26 mentions. Other techniques, such as neural networks, convolutional neural networks, and decision trees, follow at a considerable distance. Overall, regression methods are employed to a lesser extent than other techniques.

However, unsupervised methods are also frequently employed, as they facilitate the discovery of previously unknown patterns and relationships. The majority of models presented in this area focus on dependency and association analysis. Bayesian networks and association rules (Apriori) are two popular techniques. K-means is the most frequently employed clustering algorithm. All frequencies are shown in Figure 4; a number of individual mentions were excluded from the diagram for reasons of clarity.

Figure 4. Algorithms in literature used in data-driven product planning.

Survey results

Regarding RQ3, the survey shows that SVM is also a highly popular choice for classification in practice. However, its usage is almost equal to that of random forests and decision trees. Data scientists also employ statistical and regression methods with similar frequency. The k-means algorithm is the most frequently used algorithm overall, with 17 mentions. In contrast, dependency analysis and its models are employed regularly by only a minority of data scientists. Figure 5 illustrates the distribution of all the algorithms mentioned in the survey.

Figure 5. Algorithms mentioned in the survey.

RQ2: Which pre-processing methods (cleaning, transformation, FE) are typically used?

Figure 6 summarizes how popular various pre-processing methods are in practice. Outliers are typically addressed with statistical techniques based on means and standard deviations, as well as with limits (boxplots). Scaling and normalization are also common in preprocessing, particularly when dealing with operational data; the min-max scaler appears to be a popular choice here. Systematic errors, however, appear to be addressed less frequently. To transform the data into the desired format, one-hot encoders play a crucial role. To extract features from time series, the most commonly employed techniques are the fast Fourier transform and time windows.
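
To make these preprocessing steps concrete, the following sketch implements simplified versions of them using only the Python standard library. All function names, thresholds, and default parameters are assumptions chosen for illustration. In place of a fast Fourier transform, a naive discrete Fourier transform is used; it yields the same coefficients but is slower, so a library FFT would normally be preferred in practice.

```python
import cmath
import statistics


def zscore_outliers(values, k=3.0):
    """Indices of values more than k standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) > k * std]


def boxplot_limits(values, whisker=1.5):
    """Boxplot fences: quartiles -/+ whisker * IQR (default quantile method)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr


def min_max_scale(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Return the sorted categories and one indicator vector per label."""
    categories = sorted(set(labels))
    return categories, [[1 if lab == c else 0 for c in categories] for lab in labels]


def sliding_windows(values, size, step):
    """Cut a time series into windows of fixed size."""
    return [values[i:i + size] for i in range(0, len(values) - size + 1, step)]


def dominant_frequency_bin(window):
    """Index of the strongest non-constant frequency component (naive DFT)."""
    n = len(window)
    magnitudes = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(window))
        magnitudes.append(abs(coeff))
    return 1 + magnitudes.index(max(magnitudes))
```

A typical use would be to cut a sensor signal into windows with `sliding_windows` and to compute one feature, such as `dominant_frequency_bin`, per window.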

Figure 6. Preprocessing techniques mentioned in the survey.

RQ4: What evaluation metrics are used?

The majority of evaluation metrics can be assigned to supervised learning (Figure 7). The most commonly used metrics in this context are precision and recall. In unsupervised learning, external validation is the dominant approach, whereby the output is compared with that of other experts. In this context, classification metrics can be employed once more.
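
The following sketch shows how precision and recall are computed and how classification metrics can be reused for external validation of a clustering. Mapping each cluster to the majority expert label is one simple option assumed here for illustration; it is not a procedure reported by the survey participants, and the example data are invented.

```python
from collections import Counter


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def clusters_to_labels(cluster_ids, expert_labels):
    """Replace each cluster id by the majority expert label in that cluster."""
    majority = {}
    for cid in set(cluster_ids):
        members = [lab for c, lab in zip(cluster_ids, expert_labels) if c == cid]
        majority[cid] = Counter(members).most_common(1)[0][0]
    return [majority[c] for c in cluster_ids]


# Hypothetical expert labels and cluster assignments for six observations.
expert = ["fault", "fault", "ok", "ok", "ok", "fault"]
clusters = [0, 0, 1, 1, 1, 1]

mapped = clusters_to_labels(clusters, expert)
precision, recall = precision_recall(expert, mapped, positive="fault")
```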

Figure 7. Evaluation metrics mentioned in the survey.

For detailed descriptions of all the methods, we recommend standard literature and papers (e.g., Alloghani et al. 2020; Kubat 2017; Nalini Durga and Usha Rani 2020).

Data analytics toolbox for data-driven product planning

To prepare the results of the SLR and survey for further application as a transparent and methodical tool, we have summarized the most important components, namely the applications and techniques that appeared at least three times during the SLR and the survey. This has been done together with the results of a previous research study defining typical data sources and combinations of data characteristics (Panzner et al. 2022). This information has been presented in a morphological box. The toolbox comprises five dimensions, which correspond to the five stages of the generic data analytics pipeline (cf. Figure 1). These are domain understanding, data understanding, pre-processing, modeling, and evaluation (Figure 8).

Figure 8. Toolbox of data analytics components for pipelines in data-driven product planning.

At the top, the application level is shown with the two categories analysis goal and analysis problem, which form the intersection with the business use case. This is followed by the data understanding dimension, which includes data sources for data collection and data characteristics for description. The preprocessing dimension follows with the tasks of data cleaning, data transformation, and feature engineering. Subsequently, models for the key analysis (e.g., clustering, classification, or regression) are represented in a layer before the evaluation components are depicted. The dimensions of preprocessing and modeling were further subdivided according to individual data characteristics, such as quality and data type (Panzner et al. 2022), and the problem types resulting from the SLR. The respective methods and procedures are listed below.
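
A morphological box can be represented as a mapping from dimensions to options, from which candidate pipelines are enumerated. The excerpt below is heavily shortened and hypothetical: the selected options and the compatibility rule `MODEL_SOLVES` are assumptions for illustration, and the actual content of the box is the one shown in Figure 8.

```python
from itertools import product

# Hypothetical, heavily shortened excerpt of a morphological box.
MORPHOLOGICAL_BOX = {
    "analysis problem": ["classification", "clustering"],
    "data cleaning": ["linear interpolation", "outlier removal"],
    "model": ["decision tree", "SVM", "k-means"],
    "evaluation": ["precision/recall", "external validation"],
}

# Hypothetical compatibility rule: which model addresses which problem type.
MODEL_SOLVES = {
    "decision tree": "classification",
    "SVM": "classification",
    "k-means": "clustering",
}


def candidate_pipelines(box, model_solves):
    """Yield every combination of options whose model fits the analysis problem."""
    dimensions = list(box)
    for combo in product(*(box[d] for d in dimensions)):
        pipeline = dict(zip(dimensions, combo))
        if model_solves[pipeline["model"]] == pipeline["analysis problem"]:
            yield pipeline


pipelines = list(candidate_pipelines(MORPHOLOGICAL_BOX, MODEL_SOLVES))
```

Of the 24 raw combinations in this excerpt, 12 remain after the compatibility check. This illustrates how dependencies between components narrow the solution space before any individual selection is made.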

The toolbox can be employed for the design of the data analytics pipeline by mapping out the solution space of potential and relevant applications and processes, thereby facilitating an initial pre-selection. The selection of individual components and the determination of a highly specific pipeline can be further supported by illustrating the dependencies and contexts of the respective models. To this end, we propose profiles, as illustrated in Figure 9, which summarize the most crucial knowledge for the application of each component. In addition to a brief description, the profiles provide a reference to the other pipeline steps of application, data, and evaluation. Furthermore, possible user requirements are also considered in the context of the human factor. A compilation of relevant criteria in this context is presented, for example, by Nalchigar and Yu (2018) and Ziegenbein et al. (2019).

Figure 9. Example algorithm profile (based on details by, e.g., Kotsiantis 2013).

The requisite information can be extracted from standard literature, with the assistance of experts and/or automatically. By utilizing tags, which are components from the toolbox, a concrete connection to the other levels and the dependencies there can be established. This simplifies the selection of suitable pipeline elements based on a model. All of these elements provide the user with transparent explanations and demonstrate the relationships between the various pipeline components, thus enabling even users with limited prior knowledge of analytics to design a pipeline for the focused problem. Other tools, such as AutoML tools, can provide support during implementation.
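
One possible way to encode such profiles and their tags is sketched below. The class `AlgorithmProfile`, its fields, and the tag values are hypothetical and only loosely follow the idea of Figure 9; they are not the structure of the proposed profiles themselves.

```python
from dataclasses import dataclass, field


@dataclass
class AlgorithmProfile:
    """Hypothetical profile: short description plus tags from the toolbox."""
    name: str
    description: str
    tags: set = field(default_factory=set)


def matching_profiles(profiles, required_tags):
    """Names of all profiles that carry every required tag."""
    required = set(required_tags)
    return [p.name for p in profiles if required <= p.tags]


# Illustrative tag assignments, not taken from the toolbox itself.
profiles = [
    AlgorithmProfile("decision tree", "Rule-based, easy to inspect.",
                     {"classification", "interpretable", "continuous values"}),
    AlgorithmProfile("SVM", "Margin-based classifier.",
                     {"classification", "continuous values"}),
    AlgorithmProfile("k-means", "Centroid-based clustering.",
                     {"clustering", "continuous values"}),
]

suitable = matching_profiles(profiles, {"classification", "interpretable"})
```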

A potential application of the data analytics toolbox may be as follows: A citizen data scientist is interested in analyzing usage data for the first time with the objective of generating ideas for product improvements. To this end, they decide to first perform a fault diagnosis of the product in order to identify potential weaknesses. In the literature, classification or dependency analysis is the most commonly employed approach for this objective. The data scientist, basing the decision on personal preferences, opts for a classification approach and sets out the requirements for it, including the greatest possible transparency. A consultation with domain experts reveals that status and product behavior data are of particular interest for the use case.

Subsequently, the data that has already been acquired is subjected to further analysis and its characteristics are recorded. The data scientist notes that the data are time series with continuous values that have some missing values and are high dimensional. With this information about the analytics problem and the data characteristics, the selection of modeling methods can begin. The data scientist can then proceed to examine the relevant models from the toolbox in more detail, using the profiles to identify those that are most suitable for their needs. They can then combine these models with preprocessing methods and evaluation metrics that match the model and the data.

The data scientist’s objective is to develop an easily understandable and interpretable model that can be monitored to identify the factors associated with the error messages that occur intermittently. The input variables are various machine parameters, while the target variable is the error category. The decision tree is a suitable approach for this task. In the profile, the data scientist is able to identify the dependencies that still require consideration. Given the minimal preprocessing requirements of the model, the data scientist opts for a basic approach to addressing the missing data, namely linear interpolation. For feature engineering, the data scientist tests a range of techniques from the toolbox. One resulting pipeline is depicted in Figure 10.
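
The core of such a pipeline can be sketched with the standard library alone. The sketch assumes invented machine parameters (temperature and pressure) and error categories, and it uses a deliberately small CART-style tree with Gini impurity as a stand-in for a full decision tree implementation; it is not the pipeline of Figure 10.

```python
from collections import Counter


def interpolate_linear(series):
    """Fill None gaps linearly; edge gaps take the nearest observed value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = series[left] + w * (series[right] - series[left])
    return filled


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, depth=0, max_depth=2):
    """Return a leaf label or a node (feature, threshold, left, right)."""
    if depth == max_depth or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]
    best = None
    for f in range(len(rows[0])):
        # Skipping the largest value keeps both sides of every split non-empty.
        for threshold in sorted({r[f] for r in rows})[:-1]:
            left = [i for i, r in enumerate(rows) if r[f] <= threshold]
            right = [i for i, r in enumerate(rows) if r[f] > threshold]
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:  # all rows identical, no split possible
        return Counter(labels).most_common(1)[0][0]
    _, f, threshold, left, right = best
    return (f, threshold,
            build_tree([rows[i] for i in left], [labels[i] for i in left],
                       depth + 1, max_depth),
            build_tree([rows[i] for i in right], [labels[i] for i in right],
                       depth + 1, max_depth))


def predict(tree, row):
    """Follow the splits until a leaf label (a string) is reached."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree


# Hypothetical machine parameters with gaps and the observed error category.
temperature = interpolate_linear([60.0, None, 64.0, 90.0, None, 94.0])
pressure = [1.0, 1.1, 1.0, 1.2, 1.1, 1.0]
errors = ["none", "none", "none", "overheat", "overheat", "overheat"]

tree = build_tree(list(zip(temperature, pressure)), errors)
```

The learned tree is a nested tuple that can be read directly as a rule on a machine parameter and a threshold, which reflects the transparency requirement stated in the example.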

Figure 10. Example of specific data analytics pipeline.

The degree of support can be adjusted freely, for example, by treating the procedures in the toolbox only as a preliminary indication of the possible solution space and combining them at one’s own discretion.

Conclusion, limitations, and outlook

This paper proposes a methodological toolbox for enabling data-driven product planning in the form of a morphological box based on the results of a comprehensive SLR. The box illustrates the different components that are suitable for designing pipelines in product planning to explore new product potentials, across the different dimensions of the data analytics pipeline, namely application, pre-processing, modeling, and evaluation. A toolbox based on this is intended to support data scientists and citizen data scientists in the design of tailored pipelines that take into account dependencies and different contexts. The resulting pipelines can be used as a good starting point for implementation. In addition to serving as a structuring guide and knowledge base, the toolbox is also a basis for further automation of the implementation of data analytics pipelines for data-driven product planning. In further user-centered studies with experts from product innovation, product engineering, and data analytics, repetitive steps can be identified in the usage of the toolbox. Further methods as well as tools can be identified and developed, or the usability or plug-and-play capabilities of existing tools can be improved.

One limitation of the work presented is that only applications from industry, particularly product development, were considered. Another limitation is that the relevance of analytics techniques is determined based on the number of publications found in relation to these methods. Therefore, relatively new algorithms may be disadvantaged and might not be considered within the evaluated algorithms. However, integrating new algorithms into the general framework of the toolbox is simple. In addition, since data-driven product planning is a relatively new research field, interesting new use cases might be missing in the literature and might be unknown in practice so far. Moreover, the survey was only able to fully consider the responses of 18 participants, which precludes any representative numbers. However, the toolbox is intended as an initial knowledge base that will be expanded.

In the future, the toolbox will be transformed into a software-based expert system that enables non-experts to understand the conception of data analytics pipelines and the resulting results. Depending on the user’s level of knowledge, such a system can also provide additional explanations of terms and important background knowledge in order to train users as comprehensively as possible. In addition, further extensions are conceivable, such as the recommendation of suitable tools for the respective pipelines.

In the future, we intend to assess the usability and benefits of the data analytics toolbox via a user study.

Funding support

This work is funded by the German Federal Ministry of Education and Research (BMBF). There are no relevant financial or non-financial competing interests to report.

Competing interest

None declared.

References

Abdelrahman, O and Keikhosrokiani, P (2020) Assembly line anomaly detection and root cause analysis using machine learning. IEEE Access 8, 189661–189672. doi: 10.1109/ACCESS.2020.3029826.
Abramovici, M, Gebus, P, Göbel, JC and Savarino, P (2017) Utilizing unstructured feedback data from MRO reports for the continuous improvement of standard products. In: DS 87–6 Proceedings of the 21st International Conference on Engineering Design (ICED 17) Vol 6: Design Information and Knowledge, Vancouver, Canada, 21–25.08, pp. 327–336.
Abramovici, M and Lindner, A (2011) Providing product use knowledge for the design of improved product generations. CIRP Annals 60, 211–214. doi: 10.1016/j.cirp.2011.03.103.
Ademujimi, T, Brundage, M and Prabhu, V (2017) A review of current machine learning techniques used in manufacturing diagnosis. In: APMS 2017 International Conference Advances in Production Management Systems, pp. 407–415.
Alloghani, M, Al-Jumeily, D, Mustafina, J, Hussain, A and Aljaaf, AJ (2020) A systematic review on supervised and unsupervised machine learning algorithms for data science. In: Berry, M, Mohamed, A and Yap, B (eds) Supervised and Unsupervised Learning for Data Science. Unsupervised and Semi-Supervised Learning. Springer, Cham.
Amna, AR and Hermanto, A (2017) Implementation of BCBimax algorithm to determine customer segmentation based on customer market and behavior. In Amien, M (ed.) Proceedings of the 2017 4th International Conference on Computer Applications and Information Processing Technology (CAIPT). August 8–10, 2017, Kuta, Bali, Indonesia. Institute of Electrical and Electronics Engineers; Korea Information Processing Society. Piscataway, NJ: IEEE, pp. 1–5.
Amruthnath, N and Gupta, T (2018) A research study on unsupervised machine learning algorithms for fault detection in predictive maintenance. 2018 5th International Conference on Industrial Engineering and Applications (ICIEA), Singapore, 2018, pp. 355–361. doi: 10.1109/IEA.2018.8387124.
Angelopoulos, A, Michailidis, ET, Nomikos, N, Trakadas, P, Hatziefremidis, A, Voliotis, S and Zahariadis, T (2019) Tackling faults in the industry 4.0 era – a survey of machine-learning solutions and key aspects. Sensors (Basel, Switzerland) 20(1). doi: 10.3390/s20010109.
Antomarioni, A, Pisacane, O, Potena, D, Bevilacqua, M, Ciarapica, FE and Diamantini, C (2019) A predictive association rule-based maintenance policy to minimize the probability of breakages: application to an oil refinery. International Journal of Advanced Manufacturing Technology 105, pp. 1–15. Available online at https://api.semanticscholar.org/CorpusID:164444126.
Ashton, T, Evangelopoulos, N and Prybutok, VR (2015) Quantitative quality control from qualitative data: control charts with latent semantic analysis. Qual Quant 49, 1081–1099. doi: 10.1007/s11135-014-0036-5.
Balahur, A and Montoyo, A (2008) A feature dependent method for opinion mining and classification. In: International Conference on Natural Language Processing and Knowledge Engineering, 2008. NLP-KE ’08; Beijing, China, 19–22 October 2008. Institute of Electrical and Electronics Engineers. Piscataway, NJ: IEEE, pp. 1–7.
Bandari, D, Xiang, S, Martin, J and Leskovec, J (2019) Categorizing user sessions at pinterest. In: 2019 IEEE International Conference on Big Data and Smart Computing (BigComp). Kyoto, Japan. Institute of Electrical and Electronics Engineers; Han’guk-Chŏngbo-Kwahakhoe. Piscataway, NJ: IEEE, pp. 1–8.
Bártová, B and Bína, V (2019) Early defect detection using clustering algorithms. AOP 27, 3–20. doi: 10.18267/j.aop.613.
Bauer, N, Stankiewicz, L, Jastrow, M, Horn, D, Teubner, J, Kersting, K et al. (2018) Industrial data science: developing a qualification concept for machine learning in industrial production. doi: 10.5445/KSP/1000087327/27.
Bentlage, A and Ullmann, G (2014) Data mining of life cycle information. In: Proceedings of the Symposium on Automated Systems and Technologies. Garbsen, Leibniz Universität Hannover.
Berthold, MR, Borgelt, C, Höppner, F and Klawonn, F (2010) Data understanding. In: Guide to Intelligent Data Analysis: How to Intelligently Make Sense of Real Data. London: Springer London, pp. 33–79.
Blackman, R and Sipes, T (2022) The risks of empowering “Citizen Data Scientists”. Edited by Harvard Business Review. Available online at https://hbr.org/2022/12/the-risks-of-empowering-citizen-data-scientists.
Bosch-Sijtsema, P and Bosch, J (2015) User involvement throughout the innovation process in high-tech industries. Journal of Product Innovation Management 32, 793–807. doi: 10.1111/jpim.12233.
Braschler, M (2019) Applied Data Science. Lessons Learned for the Data-Driven Business. With assistance of Thilo Stadelmann, Kurt Stockinger. Cham: Springer International Publishing AG. Available online at https://ebookcentral.proquest.com/lib/kxp/detail.action?docID=5789413.
Brodley, C and Smyth, P (1995) The process of applying machine learning algorithms. In: Working Notes for Applying Machine Learning in Practice: A Workshop at the Twelfth International Conference on Machine Learning. NRL, Navy Center for Applied Research in AI Washington, DC, pp. 7–13.
Carmona, CJ, Ramírez-Gallego, S, Torres, F, Bernal, E, del Jesus, MJ and García, S (2012) Web usage mining to improve the design of an e-commerce website: OrOliveSur.com. Expert Systems with Applications 39, 11243–11249. doi: 10.1016/j.eswa.2012.03.046.
Chalapathy, R and Chawla, S (2019) Deep learning for anomaly detection: a survey. ArXiv abs/1901.03407. Available online at https://api.semanticscholar.org/CorpusID:57825713.
Chan, KY, Kwong, CK, Wongthongtham, P, Jiang, H, Fung, CKY, Abu-Salih, B et al. (2020) Affective design using machine learning: a survey and its prospect of conjoining big data. International Journal of Computer Integrated Manufacturing 33, 645–669. doi: 10.1080/0951192X.2018.1526412.
Chapman, P, Clinton, J, Kerber, R, Khabaza, T, Reinartz, T, Shearer, C and Wirth, R (2000) CRISP-DM 1.0: Step-by-step data mining guide. CRISP-DM consortium, http://www.crisp-dm.org.
Chen, X, Chun-Hsien, C, Leong, KF and Jiang, X (2013) An ontology learning system for customer needs representation in product development. The International Journal of Advanced Manufacturing Technology 67, 441–453. doi: 10.1007/s00170-012-4496-2.
Chi, X, Siew, TP and Cambria, E (2017) Adaptive two-stage feature selection for sentiment classification. In: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC). Banff Center, Banff, Canada, October 5–8, 2017. Piscataway, NJ: IEEE, pp. 1238–1243.
Cintaqia, P and Inoue, M (2023) New Product Development (NPD) Through Social Media-Based Analysis by Comparing Word2Vec and BERT Word Embeddings. Available online at http://arxiv.org/pdf/2304.08369v1.
Coleman, S, Goeb, R, Manco, G, Pievatolo, A, Tort-Martorell, X and Reis, M (2016) How can SMEs benefit from big data? Challenges and a path forward. Quality and Reliability Engineering International 32. doi: 10.1002/qre.2008.
Cooper, R (1986) An investigation into the new product process: Steps, deficiencies, and impact. Journal of Product Innovation Management 3, 71–85. doi: 10.1016/0737-6782(86)90030-5.
Dienst, S (2014a) Analyse von Maschinendaten zur Entscheidungsunterstützung bei der Produktverbesserung durch die Anwendung eines Feedback Assistenz Systems: Universitätsbibliothek der Universität Siegen. Available online at https://books.google.de/books?id=R0gcrgEACAAJ.
Dienst, S (2014b) Analyse von Maschinendaten zur Entscheidungsunterstützung bei der Produktverbesserung durch die Anwendung eines Feedback Assistenz Systems. Available online at https://dspace.ub.uni-siegen.de/bitstream/ubsi/817/1/Dissertation_Susanne_Dienst_bearbeitet.pdf.
Djelloul, I, Sari, Z and dit Bouran Sidibe, I (2018) Fault diagnosis of manufacturing systems using data mining techniques. In: 2018 5th International Conference on Control, Decision and Information Technologies (CoDIT). April 10–13, 2018, The Grand Hotel Palace, Thessaloniki, Greece: Conference Digest IEEE Systems, Man, and Cybernetics Society; IEEE Control Systems Society. Piscataway, NJ: IEEE, pp. 198–203.
Duan, P, He, Y, Zhang, A, Cui, J and Liu, F (2018) Big data oriented root cause heuristic identification approach based on FWARM for quality accident. In: 12th International Conference on Reliability, Maintainability, and Safety. ICRMS 2018: 17–19 October 2018, Shanghai, China. With assistance of Way Kuo. Shanghai, China, Piscataway, NJ: IEEE, pp. 7–12.
Oliveira, E, Miguéis, VL and Borges, JL (2023) Automatic root cause analysis in manufacturing: an overview and conceptualization. Journal of Intelligent Manufacturing 34, 2061–2078. doi: 10.1007/s10845-022-01914-3.
Ehrlenspiel, K and Meerkamm, H (2013) Integrierte Produktentwicklung: Denkabläufe, Methodeneinsatz, Zusammenarbeit. Carl Hanser Verlag GmbH Co KG.
Ezhilarasan, M, Govindasamy, V, Akila, V and Vadivelan, K (2019) Sentiment analysis on product review: a survey. In: 8th IEEE Sponsored International Conference on Computation of Power, Energy, Information and Communication. ICCPEIC’19. Melmaruvathur, Chennai, India. Institute of Electrical and Electronics Engineers. Piscataway, NJ: IEEE, pp. 180–192.
Fayyad, U, Piatetsky-Shapiro, G and Smyth, P (1996) The KDD Process for Extracting Useful Knowledge from Volumes of Data. Communications of the ACM 39, 27–34. doi: 10.1145/240455.240464.
Feng, Z, Liang, M and Chu, F (2013) Recent advances in time–frequency analysis methods for machinery fault diagnosis: a review with application examples. Mechanical Systems and Signal Processing 38, 165–205. doi: 10.1016/j.ymssp.2013.01.017.
Flath, CM and Stein, N (2018) Towards a data science toolbox for industrial analytics applications. Computers in Industry 94, 16–25.
Fournier-Viger, P, Nawaz, MS, Song, W, Gan, W (2022) Machine learning for intelligent industrial design. In Michael, K, Irena, K, Adrien, B, Tassadit, B, Benoît, F, Luis, G et al. (Eds.), Machine Learning and Principles and Practice of Knowledge Discovery, vol. 1525. [S.l.]: Springer Nature (Communications in Computer and Information Science), pp. 158172.Google Scholar
Gausemeier, J, Dumitrescu, R, Echterfeld, J, Pfänder, T, Steffen, D and Thielemann, F (2019) Produktinnovation. Strategische Planung von Produkten, Dienstleistungen und Geschäftsmodellen. München: Hanser.Google Scholar
Ge, Z, Song, Z, Ding, SX and Huang, B (2017) Data mining and analytics in the process industry: the role of machine learning. IEEE Access 5, 2059020616. doi: 10.1109/ACCESS.2017.2756872.CrossRefGoogle Scholar
Giannakis, M, Dubey, R, Yan, S, Spanaki, K and Papadopoulos, T (2022) Social media and sensemaking patterns in new product development: demystifying the customer sentiment. Annals of Operational Research 308, 145175. doi: 10.1007/s10479-020-03775-6.CrossRefGoogle Scholar
Han, Y, Nanda, G, Moghaddam, M (2023) Attribute-sentiment-guided summarization of user opinions from online reviews. Journal of Mechanical Design 145, article 041402. doi: 10.1115/1.4055736.Google Scholar
He, L, Zhang, N, Yin, L (2017) Research on the evaluation of product quality perceived value based on text mining and fuzzy comprehensive evaluation. In: 2016 International Conference on Identification, Information and Knowledge in the Internet of Things—IIKI 2016. Beijing, China 2016. Beijing shi fan da xue. Piscataway, NJ: IEEE, pp. 563566.Google Scholar
Hilario, Melanie, Kalousis, Alexandros, Nguyen, Phong, Woznica, A. (2009) A data mining ontology for algorithm selection and meta-mining, pp. 7687.Google Scholar
Hopkins, Aspen, Booth, Serena (2021). Machine learning practices outside big tech: how resource constraints challenge responsible development. In Marion, F (Ed.), Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society. AIES ’21: AAAI/ACM Conference on AI, Ethics, and Society. Virtual Event USA, 19 05 2021 21 05 2021. ACM Special Interest Group on Artificial Intelligence. New York, NY, United States: Association for Computing Machinery (ACM Digital Library), pp. 134145.CrossRefGoogle Scholar
Hou, L and Jiao, RJ (2020) Data-informed inverse design by product usage information: a review, framework and outlook. Journal of Intelligent Manufacturing 31, 529552. doi: 10.1007/s10845-019-01463-2.CrossRefGoogle Scholar
Huber, S, Wiemer, H, Schneider, D and Ihlenfeldt, S (2019) DMME: data mining methodology for engineering applications—a holistic extension to the CRISP-DM model. Procedia CIRP 79, 403408. doi: 10.1016/j.procir.2019.02.106.CrossRefGoogle Scholar
Jane, VA (2021) Survey on IoT data preprocessing. Turkish Journal of Computer and Mathematics Education (TURCOMAT), 12(9), 238244.Google Scholar
Jiang, H, Sabetzadeh, F and Kwong, CK (2021) Dynamic analysis of customer needs using opinion mining and fuzzy time series approaches. In: IEEE CIS International Conference on Fuzzy Systems 2021. Virtual conference, July 11th–14th 2021, Conference Proceedings. Luxembourg, Luxembourg,. Institute of Electrical and Electronics Engineers. Piscataway, NJ, USA: IEEE, pp. 16.Google Scholar
Joung, J and Kim, H (2023) Interpretable machine learning-based approach for customer segmentation for new product development from online product reviews. International Journal of Information Management 70, 102641. doi: 10.1016/j.ijinfomgt.2023.102641.
Kilroy, D, Healy, G and Caton, S (2022) Using machine learning to improve lead times in the identification of emerging customer needs. IEEE Access 10, 37774–37795. doi: 10.1109/ACCESS.2022.3165043.
Kim, H-S and Noh, Y (2019) Elicitation of design factors through big data analysis of online customer reviews for washing machines. Journal of Mechanical Science and Technology 33, 2785–2795. doi: 10.1007/s12206-019-0525-5.
Kim, MJ, Lim, CH, Lee, CH, Kim, KJ, Park, Y and Choi, S (2018) Approach to service design based on customer behavior data: a case study on eco-driving service design using bus drivers’ behavior data. Service Business 12, 203–227. doi: 10.1007/s11628-017-0343-8.
Kitchenham, B, Brereton, OP, Budgen, D, Turner, M, Bailey, J and Linkman, S (2009) Systematic literature reviews in software engineering – a systematic literature review. Information and Software Technology 51, 7–15.
Klein, P, van der Vegte, WF, Hribernik, K and Thoben, K-D (2019) Towards an approach integrating various levels of data analytics to exploit product-usage information in product development. Proceedings of the Design Society: International Conference on Engineering Design 1(1), 2627–2636. doi: 10.1017/dsi.2019.269.
Koli, S, Singh, R, Mishra, R and Badhani, P (2023) Imperative role of customer segmentation technique for customer retention using machine learning techniques. In Yadav, SP (Ed.), 2023 International Conference on Artificial Intelligence and Smart Communication (AISC). G.L. Bajaj Institute of Technology and Management, Greater Noida, India, 27–29 January 2023. G.L. Bajaj Institute of Technology & Management; Institute of Electrical and Electronics Engineers. Piscataway, NJ: IEEE, pp. 243–248.
Kotsiantis, SB (2013) Decision trees: a recent overview. Artificial Intelligence Review 39, 261–283. doi: 10.1007/s10462-011-9272-4.
Kubat, M (2017) An Introduction to Machine Learning. Springer. doi: 10.1007/978-3-319-63913-0.
Kuhrmann, M, Fernández, DM and Daneva, M (2017) On the pragmatic design of literature studies in software engineering: an experience-based guideline. Empirical Software Engineering 22, 2852–2891.
Kurgan, LA and Musilek, P (2006) A survey of knowledge discovery and data mining process models. Knowledge Engineering Review 21, 1–24.
Kusiak, A, Burns, A, Shah, S and Novotny, N (2005) Detection of events causing pluggage of a coal-fired boiler: a data mining approach. Combustion Science and Technology 177, 2327–2348. doi: 10.1080/00102200500241115.
Kusiak, A and Smith, M (2007) Data mining in design of products and production systems. Annual Reviews in Control 31(1), 147–156. doi: 10.1016/j.arcontrol.2007.03.003.
Lee, DJL, Macke, S, Xin, D, Lee, A, Huang, S and Parameswaran, AG (2019) A human-in-the-loop perspective on autoML: milestones and the road ahead. IEEE Data Engineering Bulletin 42, 59–70.
Lee, H (2017) Framework and development of fault detection classification using IoT device and cloud environment. Journal of Manufacturing Systems 43, 257–270. doi: 10.1016/j.jmsy.2017.02.007.
Li, C (2019) Preprocessing methods and pipelines of data mining: an overview.
Li, J, Tao, F, Cheng, Y and Zhao, L (2015) Big data in product lifecycle management. The International Journal of Advanced Manufacturing Technology 81(1), 667–684. doi: 10.1007/s00170-015-7151-x.
Li, L, Ota, K and Dong, M (2018) Deep learning for smart industry: efficient manufacture inspection system with fog computing. IEEE Transactions on Industrial Informatics 14, 4665–4673. doi: 10.1109/TII.2018.2842821.
Li, Y, Zhang, S and Zhang, J (2020) Research on innovative clustering method of product design optimized by ant colony algorithm. In Chen, G (Ed.), Proceedings of 2020 3rd International Conference on Safety Produce Informatization (IICSPI 2020). Chongqing, China, 28–30 November 2020. Institute of Electrical and Electronics Engineers. Piscataway, NJ: IEEE Press, pp. 28–32.
Lim, S and Tucker, CS (2016) A Bayesian sampling method for product feature extraction from large-scale textual data. Journal of Mechanical Design 138, Article 061403. doi: 10.1115/1.4033238.
Linåker, J, Sulaman, S, Host, M and de Mello, R (2015) Guidelines for Conducting Surveys in Software Engineering.
Lindemann, B, Fesenmayr, F, Jazdi, N and Weyrich, M (2019) Anomaly detection in discrete manufacturing using self-learning approaches. Procedia CIRP 79, 313–318. doi: 10.1016/j.procir.2019.02.073.
Lo, NG, Flaus, JM and Adrot, O (2019) Review of machine learning approaches in fault diagnosis applied to IoT systems. In: 2019 International Conference on Control, Automation and Diagnosis (ICCAD). Proceedings: 2–4 July 2019, Grenoble, France. With assistance of Zineb Simeu-Abazi. Institute of Electrical and Electronics Engineers. Piscataway, NJ: IEEE, pp. 1–6.
Ma, H, Chu, X, Lyu, G and Xue, D (2017) An integrated approach for design improvement based on analysis of time-dependent product usage data. Journal of Mechanical Design 139. doi: 10.1115/1.4037246.
Masood, A and Sherif, A (2021) Automated Machine Learning, 1st Edn. Boston, MA: Packt Publishing; Safari. Available online at https://learning.oreilly.com/library/view/-/9781800567689/?ar.
Meyer, M, Fichtler, T, Koldewey, C and Dumitrescu, R (2022) Potentials and challenges of analyzing use phase data in product planning of manufacturing companies. Artificial Intelligence for Engineering Design, Analysis and Manufacturing 36. doi: 10.1017/S0890060421000408.
Meyer, M, Wiederkehr, I, Koldewey, C and Dumitrescu, R (2021) Understanding usage data-driven product planning: a systematic literature review. Proceedings of the Design Society 1, 3289–3298. doi: 10.1017/pds.2021.590.
Michael, R, Christian, D, Michael, M and Niclas, K (2020) Identification of evaluation criteria for algorithms used within the context of product development. Procedia CIRP 91, 508–515. doi: 10.1016/j.procir.2020.02.207.
Munger, T, Desa, S and Wong, C (2015) The use of domain knowledge models for effective data mining of unstructured customer service data in engineering applications. In: 2015 IEEE First International Conference on Big Data Computing Service and Applications (BigDataService 2015). Redwood City, California, USA, 30 March–2 April 2015. Institute of Electrical and Electronics Engineers. Piscataway, NJ: IEEE, pp. 427–438.
Nagaraj, K, Killian, C and Neville, J (2012) Structured comparative analysis of systems logs to diagnose performance problems. In: 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12), pp. 353–366.
Nalchigar, S and Yu, E (2018) Business-driven data analytics: a conceptual modeling framework. Data & Knowledge Engineering 117. doi: 10.1016/j.datak.2018.04.006.
Nalchigar, S, Yu, E, Obeidi, Y, Carbajales, S, Green, J and Chan, A (2019) Solution patterns for machine learning. In Paolo, G and Barbara, W (Eds.), Advanced Information Systems Engineering. Cham: Springer International Publishing, pp. 627–642.
Nalini Durga, S and Usha Rani, K (2020) A perspective overview on machine learning algorithms. In: Advances in Computational and Bio-Engineering: Proceeding of the International Conference on Computational and Bio Engineering, 2019, Volume 1. Springer, pp. 353–364.
Panzner, M, von Enzberg, S, Meyer, M and Dumitrescu, R (2022) Characterization of usage data with the help of data classifications. Journal of the Knowledge Economy. doi: 10.1007/s13132-022-01081-z.
Park, J, Yang, D and Kim, HY (2023) Text mining-based four-step framework for smart speaker product improvement and sales planning. Journal of Retailing and Consumer Services 71, 103186. doi: 10.1016/j.jretconser.2022.103186.
Park, S, Joung, J and Kim, H (2023) Spec guidance for engineering design based on data mining and neural networks. Computers in Industry 144, 103790. doi: 10.1016/j.compind.2022.103790.
Phua, SJ, Ng, WK, Liu, H, Li, X and Song, B (2007) Customer information system for product and service management: towards knowledge extraction from textual and mixed-format data. In Chen, J (Ed.), 2007 International Conference on Service Systems and Service Management (ICSSSM ’07). Chengdu, China, 9–11 June 2007. IEEE Systems, Man, and Cybernetics Society. Piscataway, NJ: IEEE Service Center, pp. 1–6.
Prasad, D, Venkata, V, Senthil Kumar, P, Venkataramana, L, Prasannamedha, G, Harshana, S, Jahnavi Srividya, S et al. (2021) Automating water quality analysis using ML and auto ML techniques. Environmental Research 202, 111720. doi: 10.1016/j.envres.2021.111720.
Qian, ZF, Li, LY, Tao, ZQ and Kun, LL (2020) Research on sentiment analysis of two-way long and short memory network based on multi-channel data. In: 2020 IEEE 6th International Conference on Computer and Communications (ICCC). Chengdu, China, 11–14 December 2020. Institute of Electrical and Electronics Engineers; Sichuan Institute of Electronics. Piscataway, NJ: IEEE, pp. 1728–1732.
Qin, B, Li, Z and Qin, Y (2020) A transient feature learning-based intelligent fault diagnosis method for planetary gearboxes. Journal of Mechanical Engineering/Strojniški Vestnik 66.
Quan, H, Li, S, Zeng, C, Wei, H and Hu, J (2023) Big data and AI-driven product design: a survey. Applied Sciences 13, 9433. doi: 10.3390/app13169433.
Rangu, C, Chatterjee, S and Valluru, SR (2017) Text mining approach for product quality enhancement (improving product quality through machine learning). In Padma Sai, Y and Deepak, G (Eds.), 7th IEEE International Advanced Computing Conference (IACC 2017). VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, Telangana, India, 5–7 January 2017. Institute of Electrical and Electronics Engineers. Piscataway, NJ: IEEE, pp. 456–460.
Raschka, S (2018) Model evaluation, model selection, and algorithm selection in machine learning. Available online at http://arxiv.org/pdf/1811.12808v3.
Reinhart, F, Kühn, A and Dumitrescu, R (2017) Schichtenmodell für die Entwicklung von Data Science Anwendungen im Maschinen- und Anlagenbau. In: Wissenschaftsforum Intelligente Technische Systeme (WInTeSys): Heinz Nixdorf MuseumsForum, pp. 321–334.
Rowley, J and Slack, F (2004) Conducting a literature review. Management Research News 27, 31–39. doi: 10.1108/01409170410784185.
Saremi, ML and Bayrak, AE (2021) A survey of important factors in human-artificial intelligence trust for engineering system design. In: ASME 2021 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Vol. 6: 33rd International Conference on Design Theory and Methodology (DTM). Virtual, Online, 17–19 August 2021. American Society of Mechanical Engineers (ASME).
Shabestari, SS, Herzog, M and Bender, B (2019) A survey on the applications of machine learning in the early phases of product development. Proceedings of the Design Society: International Conference on Engineering Design 1, 2437–2446. doi: 10.1017/dsi.2019.250.
Shahbaz, M, Srinivas, M, Hardin, JA and Turner, M (2006) Product design and manufacturing process improvement using association rules. Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture 220, 243–254. doi: 10.1243/095440506X78183.
Shearer, C (2000) The CRISP-DM model: the new blueprint for data mining. Journal of Data Warehousing 5(4), 13–22.
Shekar, KC, Chandra, P and Rao, KV (2014) Fault diagnostics in industrial application domains using data mining and artificial intelligence technologies and frameworks. In Batra, U (Ed.), 2014 IEEE International Advance Computing Conference (IACC 2014). Gurgaon, New Delhi, India, 21–22 February 2014. Institute of Electrical and Electronics Engineers. Piscataway, NJ: IEEE, pp. 538–543.
Shimomura, Y, Nemoto, Y, Ishii, T and Nakamura, T (2018) A method for identifying customer orientations and requirements for product–service systems design. International Journal of Production Research 56, 2585–2595. doi: 10.1080/00207543.2017.1384581.
Shobanadevi, A and Maragatham, G (2017) Data mining techniques for IoT and big data – a survey. In: Proceedings of the International Conference on Intelligent Sustainable Systems (ICISS 2017). 7–8 December 2017. Institute of Electrical and Electronics Engineers. Piscataway, NJ: IEEE, pp. 607–610.
Singal, H, Kohli, S and Sharma, AK (2014) Web analytics: state-of-art & literature assessment. In: 5th International Conference - Confluence, the Next Generation Information Technology Summit (Confluence), 2014. Noida, India, 25–26 September 2014. Institute of Electrical and Electronics Engineers. Piscataway, NJ: IEEE, pp. 24–29.
Solé, M, Muntés-Mulero, V, Rana, AI and Estrada, G (2017) Survey on models and techniques for root-cause analysis. Available online at http://arxiv.org/pdf/1701.08546v2.
Son, Y and Kim, W (2023) Development of methodology for classification of user experience (UX) in online customer review. Journal of Retailing and Consumer Services 71, 103210. doi: 10.1016/j.jretconser.2022.103210.
Song, H and Cao, Z (2017) Research on product quality evaluation based on big data analysis. In: 2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA 2017). Beijing, China, 10–12 March 2017. Institute of Electrical and Electronics Engineers. Piscataway, NJ: IEEE, pp. 173–177.
Souza, JT de, Jesus, RHG, Ferreira, MB, Chiroli, DM de G, Piekarski, CM and Francisco, AC de (2022) How is the product development process supported by data mining and machine learning techniques? Technology Analysis & Strategic Management, 1–13. doi: 10.1080/09537325.2022.2099262.
Tan, J and Bu, YY (2010) Association rules mining in manufacturing. Applied Mechanics and Materials 34–35, 651–654. doi: 10.4028/www.scientific.net/AMM.34-35.651.
Tianxing, M and Zhukova, N (2021) The data mining dataset characterization ontology. In: Intelligent Systems and Applications, Proceedings of the 2021 Intelligent Systems Conference (IntelliSys) Volume 2, pp. 231–238.
Ulrich, KT and Eppinger, SD (2016) Product Design and Development, 6th Edn. New York, NY: McGraw-Hill.
van Eck, ML, Sidorova, N and van der Aalst, WMP (2016) Enabling process mining on sensor data from smart products. In España, S, Ralyté, J and Souveyet, C (Eds.), IEEE RCIS 2016. IEEE 10th International Conference on Research Challenges in Information Science. Grenoble, France, 1–3 May 2016. Piscataway, NJ: IEEE, pp. 1–12.
Vukovic, M and Thalmann, S (2022) Causal discovery in manufacturing: a structured literature review. Journal of Manufacturing and Materials Processing 6, 10. doi: 10.3390/jmmp6010010.
Wang, J, Ma, Y, Zhang, L, Gao, RX and Wu, D (2018) Deep learning for smart manufacturing: methods and applications. Journal of Manufacturing Systems 48, 144–156. doi: 10.1016/j.jmsy.2018.01.003.
Wang, K, Tong, S, Eynard, B, Roucoules, L and Matta, N (2007) Review on application of data mining in product design and manufacturing. In: Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007) 4, pp. 613–618. Available online at https://api.semanticscholar.org/CorpusID:24731153.
Wang, L, Liu, Z, Liu, A and Tao, F (2021) Artificial intelligence in product lifecycle management. The International Journal of Advanced Manufacturing Technology 114, 771–796. doi: 10.1007/s00170-021-06882-1.
Wang, Y and Jiang, M (2020) Topic mining based on online shopping users’ reviews. In: 2020 International Conference on Computer Information and Big Data Applications (CIBDA 2020). Guiyang, Guizhou, China, 17–19 April 2020. With assistance of Jizhong Zhu. Piscataway, NJ: IEEE, pp. 11–14.
Wang, Y, Luo, L and Liu, H (2022) Bridging the semantic gap between customer needs and design specifications using user-generated content. IEEE Transactions on Engineering Management 69, 1622–1634. doi: 10.1109/TEM.2020.3021698.
Weichert, D, Link, P, Stoll, A, Rüping, S, Ihlenfeldt, S and Wrobel, S (2019) A review of machine learning for the optimization of production processes. The International Journal of Advanced Manufacturing Technology 104, 1889–1902. Available online at https://api.semanticscholar.org/CorpusID:197432866.
Wilberg, J, Triep, I, Hollauer, C and Omer, M (2017) Big data in product development: need for a data strategy. In: 2017 Portland International Conference on Management of Engineering and Technology (PICMET). IEEE, pp. 1–10.
Yuan, J and Tian, Y (2019) A multiscale feature learning scheme based on deep learning for industrial process monitoring and fault diagnosis. IEEE Access 7, 151189–151202. doi: 10.1109/ACCESS.2019.2947714.
Zakaria, AF and Lim, SCJ (2014) A preliminary survey on modeling customer requirements from product reviews under preference uncertainty. In: 2014 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM 2014). Petaling Jaya, Selangor Darul Ehsan, Malaysia, 9–12 December 2014. Institute of Electrical and Electronics Engineers. Piscataway, NJ: IEEE, pp. 1096–1100.
Zang, D, Liu, J and Wang, H (2018) Markov chain-based feature extraction for anomaly detection in time series and its industrial application. In: Proceedings of the 30th Chinese Control and Decision Conference (2018 CCDC). Shenyang, China, 9–11 June 2018. Piscataway, NJ: IEEE, pp. 1059–1063.
Zhang, F, Liu, Y, Chen, C, Li, YF and Huang, HZ (2014a) Fault diagnosis of rotating machinery based on kernel density estimation and Kullback–Leibler divergence. Journal of Mechanical Science and Technology 28, 4441–4454. doi: 10.1007/s12206-014-1012-7.
Zhang, S, Wang, B and Habetler, TG (2020) Deep learning algorithms for bearing fault diagnostics – a comprehensive review. IEEE Access 8, 29857–29881. doi: 10.1109/ACCESS.2020.2972859.
Zhang, Z, Qi, J and Zhu, G (2014b) Mining customer requirement from helpful online reviews. In Da Xu, L (Ed.), 2014 Enterprise Systems Conference (ES 2014). Shanghai, China, 2–3 August 2014. Institute of Electrical and Electronics Engineers. Piscataway, NJ: IEEE, pp. 249–254.
Zhao, J, Zhang, W and Liu, Y (2010) Improved K-Means cluster algorithm in telecommunications enterprises customer segmentation. In Yang, Y (Ed.), 2010 IEEE International Conference on Information Theory and Information Security (ICITIS 2010). Beijing, China, 17–19 December 2010. Institute of Electrical and Electronics Engineers; Beijing University of Posts and Telecommunications. Piscataway, NJ: IEEE, pp. 167–169.
Zhao, K, Liu, B, Tirpak, TM and Schaller, A (2003) Detecting patterns of change using enhanced parallel coordinates visualization. In Wu, X (Ed.), Proceedings/Third IEEE International Conference on Data Mining, ICDM 2003. Melbourne, Florida, 19–22 November 2003. Los Alamitos, Calif.: IEEE Computer Society, pp. 747–750.
Zhou, F, Ayoub, J, Xu, Q and Jessie Yang, X (2020) A machine learning approach to customer needs analysis for product ecosystems. Journal of Mechanical Design 142, Article 011101. doi: 10.1115/1.4044435.
Zhu, J, He, S, Liu, J, He, P, Xie, Q, Zheng, Z and Lyu, MR (2019) Tools and benchmarks for automated log parsing. In: 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP 2019). Montréal, QC, Canada, 25–31 May 2019. Institute of Electrical and Electronics Engineers; IEEE Computer Society; Association for Computing Machinery. Piscataway, NJ: IEEE, pp. 121–130.
Ziegenbein, A, Stanula, P, Metternich, J and Abele, E (2019) Machine learning algorithms in machining: a guideline for efficient algorithm selection. In Schmitt, R and Schuh, G (Eds.), Advances in Production Research. Proceedings of the 8th Congress of the German Academic Association for Production Technology (WGP), Aachen, 19–20 November 2018. Cham: Springer International Publishing, pp. 288–299.
Zope, K, Singh, K, Nistala, SH, Basak, A, Rathore, P and Runkana, V (2019) Anomaly detection and diagnosis in manufacturing systems. Annual Conference of the PHM Society 11(1). doi: 10.36001/phmconf.2019.v11i1.815.
Zschech, P (2022) Beyond descriptive taxonomies in data analytics: a systematic evaluation approach for data-driven method pipelines. Information Systems and e-Business Management. doi: 10.1007/s10257-022-00577-0.
Figure 1. Generic data analytics pipeline for data-driven product planning.

Figure 2. Procedure of the systematic literature review.

Table 1. Inclusion and exclusion criteria

Table 2. Data analytics applications for data-driven product planning: literature overview

Table 3. Algorithms in literature used in data-driven product planning: literature overview

Figure 3. Data analytics applications for data-driven product planning in literature.

Figure 4. Algorithms in literature used in data-driven product planning.

Figure 5. Algorithms mentioned in the survey.

Figure 6. Preprocessing techniques mentioned in the survey.

Figure 7. Evaluation metrics mentioned in the survey.

Figure 8. Toolbox of data analytics components for pipelines in data-driven product planning.

Figure 9. Example algorithm profile (based on details by e.g., Kotsiantis 2013).

Figure 10. Example of specific data analytics pipeline.