1. Introduction
Conceptual design, which translates design requirements into preliminary design solutions, is a crucial phase in the product design process (French et al., 1985). However, exploring problems and generating design solutions place high demands on designers’ knowledge and reasoning abilities (Myrup Andreasen et al., 2015), reflecting the complex nature of conceptual design. Various design theories and methodologies, such as TRIZ theory (Al’tshuller, 1999), the FBS model (Gero and Kannengiesser, 2014) and C-K theory (Hatchuel and Weil, 2009), have been proposed to help designers gain a more comprehensive understanding of the conceptual design process and to assist them in developing creative ideas and solutions. These theories have made the representation of the conceptual design process more structured. However, the effective application of these methodologies still depends on designers’ own knowledge and experience, which poses significant challenges for novice designers. With ongoing technological progress, computational methods and tools have been proposed to alleviate novice designers’ cognitive burden (Sarica and Luo, 2024; Cantamessa et al., 2020). For example, semantic networks (Luo et al., 2019; Shi et al., 2017) and case databases (Robles et al., 2009; Deldin and Schuknecht, 2013) have been established to support designers during the conceptual design stages. Although these tools provide inspiring stimuli, they do not offer solution suggestions tailored to the specific design situation at hand. This means that designers still need to reason from the case domain to the problem domain to generate concrete solutions.
Driven by advances in machine learning, various Generative AI models, including transformers (Vaswani et al., 2017), diffusion models (Ho et al., 2020) and generative adversarial networks (GANs) (Karras et al., 2019), have demonstrated significant potential and strong performance. Building on these technologies, applications such as ChatGPT, Stable Diffusion and Midjourney are making Generative AI more accessible and easier to use for consumers. Within the realm of design, text-to-text and text-to-image models have attracted particular attention because they integrate seamlessly with the creative process and make design iterations more efficient. These models have become the most widely used generative techniques in research that combines Generative AI with design processes (Wu et al., 2024). Specifically, researchers have explored the use of text-to-text models for guiding the design process (Chen et al., 2024b), assisting with divergent and convergent thinking (Wang et al., 2023) and generating innovative solutions (Zhu and Luo, 2023), and these models hold great potential for enhancing creativity in the innovation process (Sarica and Luo, 2024). Text-to-image technologies, in turn, help designers visualize their design ideas quickly and reduce the time and skill that manual sketching demands of human designers (Choi et al., 2024). They can also generate visual stimuli for design ideation from user-input text prompts (Liu et al., 2023; Wadinambiarachchi et al., 2024).
Although researchers have recognized the importance of Generative AI in the conceptual design process, empirical evidence on its effect in the different stages of conceptual design is still lacking. This gap makes it difficult for researchers to reflect on and improve the collaborative tools they develop. To fill this gap, this study explores how Generative AI assists humans in the conceptual design process. Specifically, we recruited four groups of participants to complete two design tasks with or without the assistance of Generative AI (ChatGPT, Midjourney or both). We assessed human–AI collaboration in the conceptual design process along multiple dimensions: the stages in which Generative AI helped designers, the stages led by humans, participants’ assessments of the Generative AI tools’ performance, expert ratings of the design outputs and an analysis of the prompting strategies human designers used during the four stages of conceptual design. We found that Generative AI primarily assists humans in the problem definition and idea generation stages, while the idea selection and evaluation stage remains predominantly human-led. Additionally, with the assistance of Generative AI, the idea selection and evaluation stage was further enhanced.
Our study makes an empirical contribution to research on Generative AI-powered creativity support by illustrating, at the stage level, how Generative AI supports humans in conceptual design. It further elaborates on Generative AI’s role in the different stages of conceptual design. Finally, we discuss implications for future Generative AI-supported conceptual design.
2. Literature review
2.1. Conceptual design
According to previous research, the product design process can be divided into four phases: analysis of the problem, conceptual design, embodiment of schemes and detailing (French et al., 1985). Among these, conceptual design, which encompasses preliminary decision-making and design concept generation, is regarded as the key part of the design process (Eppinger and Ulrich, 1995). Several conceptual design models have been proposed to describe its stages. For example, Goodman-Deane et al. (2016) outlined four stages of the conceptual design process: manage (deciding what actions to take next), explore (identifying needs), create (generating ideas) and evaluate (judging and testing the design concepts). Jasmine (2020) divided the design process into several distinct phases: establishing design requirements, assessing technology availability, sketching concepts and layouts, performing analysis and making trade-offs, optimizing revisions and developing a preliminary design. Other researchers have applied conventional design process models to conceptual design, such as the Double Diamond design process (Design Council, 2019), which comprises discover, define, develop and deliver. Building on these frameworks and considering the integral role of Generative AI, this study defines the conceptual design process as four stages: (1) problem definition, (2) idea generation, (3) idea selection and evaluation, and (4) idea evolution. This definition serves as the foundation for our experiment and underpins the findings and conclusions presented in this study.
Although conceptual design is essential to the design process, it is challenging for designers to arrive at highly original and novel ideas through their own efforts alone. Many computer-aided conceptual design methods and tools have been proposed to support designers’ creativity. For example, knowledge- or heuristics-based stimulation approaches can retrieve source knowledge and map it into the target design domain (Jiang et al., 2022). Some studies have used information from patents, research papers or encyclopedia data to construct semantic networks (Luo et al., 2019; Sarica et al., 2020). By computing the semantic distances between design goals and knowledge in a database, these methods can offer design stimuli or knowledge to human designers. However, these stimuli-based methods still require designers to reason across domains to adapt the stimuli into a final design concept that fits the current problem scenario.
2.2. Generative AI in conceptual design
Driven by advances in machine learning, such as Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), Variational Autoencoders (VAEs) (Kingma and Welling, 2014) and transformers (Vaswani et al., 2017), various Generative AI models, including GPTs (Radford et al., 2018), BERT (Kenton and Toutanova, 2019) and StyleGAN (Karras et al., 2019), have demonstrated significant potential and strong performance. Among these, text-to-text and text-to-image models have garnered considerable attention in the field of conceptual design (Wu et al., 2024) and have sparked a series of studies on how to integrate these two types of models smoothly into existing workflows (Mahdavi Goloujeh et al., 2024; Guo et al., 2024).
Text-to-text tools powered by Large Language Models (LLMs), such as ChatGPT, Llama and BERT, can comprehend user input and provide fluent, contextual solutions in natural language. In the conceptual design domain, Generative AI-based text generation has been applied to requirement extraction (Shahin et al., 2024), creative ideation (Suh et al., 2024), solution generation (Chen et al., 2024c) and other tasks. Because our experiment was conducted between May and June 2023, we chose GPT-3.5 for its robust capability to generate coherent and contextually relevant text; it has also been widely applied in Generative AI-assisted design research (Chen et al., 2024b; Chen et al., 2024a; Chen et al., 2024d).
Text-to-image models have also been used to aid designers in early concept development by providing visual references and multimodal stimuli (Kwon et al., 2023). These models, such as Midjourney, DALL-E and Stable Diffusion, promote rapid exploration and iteration through visualization, enabling designers to better express their design concepts. Among them, Midjourney stands out both commercially and in terms of model performance; owing to its impressive image quality and user-friendly features, it has been widely adopted in human–AI collaboration research (Tan and Luhrs, 2024; Wadinambiarachchi et al., 2024; Mahdavi Goloujeh et al., 2024).
Although considerable work has been done to develop Generative AI-based design tools and methodologies, empirical evidence on the effects of Generative AI in conceptual design processes is still lacking. In this study, we therefore adapt experimental methods from traditional design research to explore the influence of two representative types of Generative AI models (i.e., text-to-text and text-to-image models) on different conceptual design stages, contributing new empirical evidence to the design community.
During the human–AI collaboration, Generative AI can assist human designers by generating concepts for selection, evaluation and iteration. Additionally, the output from Generative AI can inspire designers to develop more innovative ideas as the information can often expand designers’ knowledge and exploration scope. These capabilities create unprecedented opportunities, particularly for novice designers, by significantly lowering the barriers to cross-disciplinary design and rapid visualization. Therefore, this study focuses on novice designers as research subjects to ensure more targeted research conclusions.
3. Experimental study
To gain deeper insights into the human–AI collaboration paradigm, we conducted a human–AI co-design study. Midjourney (a text-to-image Generative AI model) and GPT-3.5 (a text-to-text Generative AI model) were selected as representative Generative AI tools. Choosing these general-purpose tools rather than design-specific alternatives aligns with our research objective: to empirically investigate the role of Generative AI in conceptual design, not to evaluate existing design-specific tools. Furthermore, as identified in the literature review, few design-specific tools are designed to support the entire conceptual design process (Lee et al., 2024). General-purpose AI is therefore more suitable for our research aim. The overall study procedure is shown in Figure 1. Specifically, we aimed to address three research questions through this experiment:
RQ1: In which stages is Generative AI involved?
RQ2: What are the performances of Generative AI?
RQ3: What are the characteristics of prompt content?

Figure 1. Representation of the experimental study procedure.
3.1. Participants
We recruited participants through a university social networking site. Participants were required to be novice designers with less than 4 years of design learning experience and to have prior experience with ChatGPT and Midjourney, which allowed them to focus on the design tasks rather than on adapting to new Generative AI tools during the experiment. There were no restrictions on academic major. After screening, 20 participants who met these criteria were selected (13 females, 7 males, aged 18–26, SD = 2.7). Each participant was paid $10 per hour, and the average completion time was about 70 minutes.
3.2. Procedure
Participants were randomly assigned to four groups: the ChatGPT Group, the Midjourney Group, the Combined Group and the Human Group. In the Combined Group, participants could use both ChatGPT and Midjourney freely in both tasks, in any order. Before the experiment started, we conducted a 20-minute training session that covered the basics of conceptual design procedures and how to use GPT-3.5 and Midjourney to generate conceptual designs. The experimental procedure and the two conceptual design tasks were then introduced to each participant. Two distinct design tasks were used to mitigate potential biases, such as a participant’s prior expertise in a single task-related area (Hu and Reid, 2018). The first task required participants to design a baby chair and the second to design tangible music bricks, each within 20 minutes. The two tasks were selected based on two considerations. First, the design ideas for each task could be conveyed through shape and external structure rather than intricate details of internal structure. Second, each task should involve objects that participants are familiar with but that are not commonplace, so that participants could complete the task while still having room for creative divergence. After the tasks were introduced and participants’ questions about the procedure were addressed, the formal experiment began.
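The random assignment described above can be illustrated with a short sketch. It assumes equal group sizes of five, which follows from 20 participants across four groups and from the split of three plus two participants per group described in Section 3.3.4; the study does not report its exact randomization procedure, and the participant IDs, the seed and the helper name `assign_groups` are hypothetical.

```python
import random

GROUPS = ("ChatGPT Group", "Midjourney Group", "Combined Group", "Human Group")


def assign_groups(participant_ids, groups=GROUPS, seed=None):
    """Randomly split participants into equally sized groups (hypothetical helper)."""
    if len(participant_ids) % len(groups) != 0:
        raise ValueError("number of participants must be a multiple of the number of groups")
    rng = random.Random(seed)  # seeded generator so the assignment can be reproduced
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(groups)
    # Consecutive slices of the shuffled list become the groups
    return {g: shuffled[i * size:(i + 1) * size] for i, g in enumerate(groups)}


if __name__ == "__main__":
    participants = [f"P{i:02d}" for i in range(1, 21)]  # hypothetical IDs
    for group, members in assign_groups(participants, seed=2023).items():
        print(group, members)
```

Shuffling once and slicing guarantees equal group sizes, which simple per-participant coin flips would not.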
During the experiment, each group had specific requirements:
• ChatGPT Group: Participants were asked to use ChatGPT to complete both design tasks.
• Midjourney Group: Participants were asked to use Midjourney to complete both design tasks.
• Combined Group: Participants were asked to use ChatGPT and Midjourney to complete both design tasks. The sequence of tool usage was not predetermined, allowing participants to choose the order based on their preferences.
• Human Group: Participants were asked to complete both tasks on their own, without assistance from Generative AI.
Finally, each participant was required to create an image for each task that illustrates the product’s design features, accompanied by essential text descriptions to clarify the design.
After the two tasks, each participant was invited to fill out a questionnaire. The questionnaire used a 7-point Likert scale to rate Generative AI’s performance on six criteria (participants in the Human Group rated their own performance, which served as a baseline for comparison). The questionnaire also asked participants which stages Generative AI had helped with and which stages were human-led (see Section 3.3.3 for details). In the subsequent semi-structured interview, we discussed the questionnaire results with participants, along with their attitudes, evaluations and suggestions regarding Generative AI-assisted human–AI collaboration. Each interview lasted around 30 minutes. All study procedures conformed to the Institutional Review Board (IRB) guidelines on human subjects research.
3.3. Data collection and analysis
3.3.1. Participants’ assessment of Generative AI tool performance
After completing the experimental tasks, each participant received a performance assessment questionnaire, in which they rated the Generative AI tools they had used on six criteria: speed, subject, diversity, novelty, triggering more ideas and requirement satisfaction. The ratings concerned the overall design process. These criteria were chosen because they reflect the impact of Generative AI on conceptual design. Specifically, the diversity and novelty criteria were inspired by traditional conceptual design evaluations (Shah et al., 2003), while the other criteria (triggering more ideas, requirement satisfaction, speed and subject) were formulated based on key factors in Generative AI-assisted conceptual design and the objectives of this study. The six criteria are explained in Table 1. For the Combined Group, ChatGPT and Midjourney were evaluated separately. For the Human Group, participants assessed their own performance. This approach allowed us to collect firsthand feedback from human designers and gain insights into how they collaborate with AI.
Table 1. Participants’ evaluation criteria

Notes: Participants in the Human Group were invited to evaluate their own performance to serve as a baseline.
3.3.2. Expert ratings
Five professional designers (three males, two females, aged 25–29), each with more than 5 years of design experience, were recruited as experts to evaluate the conceptual design solutions created by the participants in the four groups. The 40 design solutions were presented in random order. For each solution, the assessors were first told which task (Task 1 or Task 2) it came from and were then asked to rate it on a 7-point Likert scale (1: The performance is really poor; 2: The performance is poor; 3: The performance is below average; 4: The performance is average; 5: The performance is above average; 6: The performance is very good; 7: The performance is perfect). The experts assessed the design solutions on five primary criteria: (1) Novelty: whether the design introduces new ideas or approaches that are not commonly found in similar products; (2) Feasibility: whether the design can be realistically implemented; (3) Usability: whether potential users can easily and effectively use the product to achieve their goals; (4) Functional diversity: the range of functions that the design can perform; and (5) Cost: the overall expenses involved in manufacturing, distributing and maintaining the product over its lifespan (a high cost score means poor performance). The assessment process lasted around 30 minutes.
3.3.3. Generative AI’s helpful stages in conceptual design
As this study aims to characterize human–AI collaboration in conceptual design at the stage level, we defined “actual-helping stages” as the stages in which participants reported completing tasks with the assistance of Generative AI. In the post-experiment questionnaire, participants reported the stages in which Generative AI actually helped them, as well as the stages that were human-led, that is, stages in which they completed most of the work independently. Both questions were multiple-choice, and participants could select all the stages they felt applied. The questionnaire for the Combined Group is provided in Appendix A as an example.
3.3.4. Participants’ prompt
All text inputs that participants used to communicate with Generative AI during the conceptual design process were collected. In total, we gathered 114 prompts from the ChatGPT Group, Midjourney Group and Combined Group across Tasks 1 and 2, averaging 3.8 prompts per participant per task. In the analysis, we first categorized each prompt into one of the four stages of conceptual design. Two researchers independently categorized the prompts of three randomly selected participants from each group to develop a preliminary understanding. After discussing their classifications and rationales, they reached a consensus and finalized a codebook, detailed in Appendix B. Using this codebook, the two researchers then independently coded the prompts of the remaining two participants in each group, achieving an inter-rater reliability of $\kappa = 0.74$, indicating strong agreement between the coders. In total, across the four stages (problem definition, idea generation, idea selection and evaluation, and idea evolution), 31, 14, 10 and 9 prompts were directed to ChatGPT, and 13, 20, 5 and 12 prompts to Midjourney, respectively.
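For readers unfamiliar with the agreement statistic, the following minimal sketch computes unweighted Cohen’s kappa for two coders, $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ is the agreement expected by chance from each coder’s label frequencies. The study does not report which tool it used, and the stage codes in the example are hypothetical. The helper `mean_pairwise_kappa` shows one common way (averaging Cohen’s kappa over all rater pairs) that a multi-rater average such as the experts’ 0.66 in Section 4.2 could be obtained; this averaging scheme is an assumption, since the paper does not state how the average was formed.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement p_o: share of items given the same label by both raters
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement p_e: from each rater's marginal label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n ** 2
    if p_e == 1:
        # Both raters used one identical label for every item: kappa is undefined,
        # and this sketch treats it as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(ratings_by_rater):
    """Average Cohen's kappa over all pairs of raters (assumed averaging scheme)."""
    pairs = list(combinations(ratings_by_rater, 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    coder_1 = ["PD", "PD", "IG", "IG"]  # hypothetical stage codes
    coder_2 = ["PD", "IG", "IG", "IG"]
    print(cohen_kappa(coder_1, coder_2))  # 0.5
```

With five raters, `mean_pairwise_kappa` averages the ten pairwise kappas.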
We then systematically summarized the strategies through which ChatGPT and Midjourney assisted human designers in each stage of the conceptual design process. The same two researchers independently reviewed the categorized prompts to identify the assistance strategies provided by ChatGPT and Midjourney in each stage, resolving discrepancies through frequent discussion. Specifically, we applied affinity diagramming to aggregate and analyze the topics reflected in participants’ prompts (Holtzblatt and Beyer, 1997): the two researchers transferred the original prompts onto sticky notes, grouped them and iteratively labeled each group with descriptors that captured their shared themes. The summarized strategies and corresponding examples are presented in Section 4.3.
3.3.5. Post-experiment interview
We conducted one-to-one interviews after the participants finished the two design tasks and questionnaire to gain deeper insights into how novice designers collaborate with Generative AI during the conceptual design process. The interview questions were tailored based on the participants’ questionnaire responses and the design solutions they completed, focusing on the following aspects:
(1) Why and how did you use ChatGPT/Midjourney during the [specific design stage]?
(2) In the questionnaire, you rated [specific criterion] with [specific score]. Why did you give this rating?
(3) Regarding human–AI collaboration in conceptual design, you mentioned that the [specific design stage] should be human-led. Why do you think so? Furthermore, you indicated that [specific design stage] requires collaboration between humans and Generative AI. Could you please explain this in detail?
(4) What other thoughts and views do you have about the collaboration between humans and Generative AI in conceptual design?
4. Results
In this section, we answer the three research questions based on the collected data. First, for RQ1, we identify the stages in which Generative AI is involved. Second, for RQ2, we examine the performance of Generative AI in the conceptual design process from both participants’ perspectives and expert ratings. Third, for RQ3, we present the results of the prompt analysis.
4.1. RQ1: In which stages is Generative AI involved?
We first identified the stages in which Generative AI assisted designers and those perceived as human-led, using data from the ChatGPT Group, Midjourney Group and Combined Group. Figure 2 illustrates Generative AI’s helping stages and the human-led stages in the conceptual design process with two Sankey diagrams; the percentages represent the proportion of the 15 participants who selected each stage. Figure 2 (a) shows that Generative AI predominantly supported humans in the idea generation, problem definition and idea evolution stages, where 86.7%, 73.3% and 60% of participants, respectively, reported its assistance. This suggests that text-to-text and text-to-image Generative AI tools are particularly effective in initiating and nurturing early-stage design thinking, where conceptual blending and broad brainstorming are crucial (Wang et al., 2023). Figure 2 (b) reveals that the idea selection and evaluation stage (86.7%) and the idea evolution stage (60%) were predominantly perceived as human-led. This implies that human judgment remains essential for evaluating ideas and making final decisions.
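Because each participant could tick several stages, these percentages are shares of respondents rather than of selections, so they need not sum to 100%. With 15 respondents, 86.7%, 73.3% and 60% correspond to 13, 11 and 9 participants. The sketch below computes such shares from multi-select responses; the function name is hypothetical, and the responses used to check it are synthetic, not the study’s data.

```python
from collections import Counter

STAGES = ("problem definition", "idea generation",
          "idea selection and evaluation", "idea evolution")


def stage_percentages(responses, stages=STAGES):
    """Percentage of respondents who selected each stage in a multi-select question."""
    if not responses:
        raise ValueError("no responses")
    n = len(responses)
    # set() ensures a participant is counted at most once per stage
    counts = Counter(stage for selected in responses for stage in set(selected))
    return {stage: round(100 * counts[stage] / n, 1) for stage in stages}


if __name__ == "__main__":
    demo = [["idea generation", "problem definition"], ["idea generation"], []]
    print(stage_percentages(demo))
```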

Figure 2. Horizontal Sankey diagrams representing (a) the comparison of group types in relation to Generative AI’s helping stages and (b) the comparison of group types in relation to human-led stages. (Percentages in the figure represent the proportion of responses among the 15 participants in Generative AI-assisted groups.)
Overall, while Generative AI primarily supports the early stages of conceptual design, such as problem definition and idea generation, the evaluation phase remains human-led, highlighting the complementary roles that Generative AI and human expertise play in the design process.
4.2. RQ2: What are the performances of Generative AI?
This subsection combines participants’ assessments of Generative AI’s performance with expert evaluations of the final design solutions. By integrating these perspectives, we aim to provide a multidimensional understanding of how AI tools contribute to and influence the conceptual design process. For data analysis, ANOVA was used when the data followed a normal distribution; otherwise, the non-parametric Kruskal–Wallis H test was used to detect significant differences among the four groups.
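As an illustration of the non-parametric branch, the following sketch implements the Kruskal–Wallis H test with average ranks for tied values (common with Likert data) and the standard tie correction, and approximates the p-value with a chi-square distribution on $k-1$ degrees of freedom. It is a self-contained sketch, not the authors’ analysis code; in practice, `scipy.stats.kruskal` (and `scipy.stats.f_oneway` for the ANOVA branch) provide these tests, and the normality check that decides between the two branches is not shown.

```python
import math
from collections import Counter
from itertools import chain


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Chi-square upper-tail probability via the series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def kruskal_wallis(*groups):
    """Kruskal-Wallis H test with tie correction; returns (H, p) on k - 1 df."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    # Sum over groups of (rank total)^2 / group size
    rank_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over runs of tied values
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)
```

For the four groups in this study, a call such as `kruskal_wallis(g1, g2, g3, g4)` would use 3 degrees of freedom.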
Table 2 presents participants’ assessments on the six evaluation criteria. The statistical results offer several insights into model characteristics and human–AI collaboration. Notably, the ChatGPT Group scored highest in speed and requirement satisfaction. The superior speed likely reflects the nature of ChatGPT’s text-to-text model, which generates output quickly and lets users track progress in real time. In contrast, Midjourney applies an iterative refinement process, starting with an initial visual pattern that progressively evolves into cleaner outputs through multiple enhancement steps. In addition, requirement satisfaction scores were lower when participants used Midjourney, indicating that their instructions were not always reflected in the final images. This reflects the difficulty of controlling Midjourney: participants frequently reported that while they could manipulate the overall shape, the details did not match their intentions (P2-Midjourney: “When I input some commands, the generated images could generally shape the overall appearance, but the finer details did not align with my intended design specifications.”). On the other hand, the Midjourney Group achieved the highest scores in subject, diversity, novelty and triggering more ideas, likely benefiting from its visual output, which offers more direct stimuli. Compared with the Human Group, the ChatGPT Group scored lower in subject, likely because Generative AI’s outputs are probabilistic rather than fact-based (Brown et al., 2020).
Table 2. Average scores and standard deviations of participants’ evaluations of the Generative AI tools in each group

Notes:
1. * indicates our baseline condition.
2. bold indicates the best performance among four groups.
3. underline indicates performance worse than Human Group.
The statistical analysis of expert ratings, detailed in Table 3, shows that the Midjourney Group achieved the highest overall score (mean = 4.34, SD = 0.75). Significant differences among the groups were observed in novelty ($p<0.01^{**}$), cost ($p<0.01^{**}$) and overall performance ($p<0.01^{**}$). Subsequent post hoc tests with Bonferroni correction revealed significant differences in novelty between the Human Group and the Midjourney Group ($p<0.01^{**}$) and between the Human Group and the Combined Group ($p<0.01^{**}$). The ChatGPT Group’s mean novelty score of 3.98 also exceeded the Human Group’s 3.20, suggesting that Generative AI tools could broaden the range of design options and introduce unique visual examples that enhance creativity. Significant differences were also found in cost between the Human Group and both the ChatGPT Group ($p<0.01^{**}$) and the Midjourney Group ($p<0.01^{**}$). Because a higher cost score implies poorer performance (see the definition of cost in Section 3.3.2), this suggests that using Generative AI may increase the complexity of design ideas. A similar pattern emerged for overall scores, where both the ChatGPT Group and the Midjourney Group outperformed the Human Group ($p=0.02^{*}$ and $p=0.01^{*}$, respectively). The Combined Group’s mean overall score of 4.10 also exceeded the Human Group’s 3.59. These results show that, compared with the Human Group, designers using Generative AI tools consistently received higher overall expert ratings. The average Cohen’s kappa among the five assessors was 0.66 (detailed results in Appendix C), indicating an acceptable level of consistency.
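The Bonferroni correction used in the post hoc tests multiplies each raw pairwise p-value by the number of comparisons $m$ (capping the result at 1), which is equivalent to testing each comparison at $\alpha/m$. With four groups there are $\binom{4}{2} = 6$ pairwise comparisons. The sketch below assumes that all six pairs were compared; the paper does not state which pairwise test or how many comparisons were used, and the raw p-values in the check are hypothetical.

```python
from itertools import combinations

GROUPS = ("ChatGPT", "Midjourney", "Combined", "Human")

# All pairwise comparisons among the four groups (assumed to be the tested set)
PAIRS = list(combinations(GROUPS, 2))


def bonferroni(raw_p, alpha=0.05):
    """Return {pair: (adjusted_p, significant)} using the Bonferroni correction."""
    m = len(raw_p)  # number of comparisons actually performed
    adjusted = {pair: min(1.0, p * m) for pair, p in raw_p.items()}
    return {pair: (p_adj, p_adj < alpha) for pair, p_adj in adjusted.items()}


if __name__ == "__main__":
    raw = {pair: 0.5 for pair in PAIRS}  # hypothetical raw p-values
    raw[("Midjourney", "Human")] = 0.001
    for pair, (p_adj, sig) in bonferroni(raw).items():
        print(pair, round(p_adj, 3), sig)
```

Bonferroni is conservative; with six comparisons, a raw p-value must fall below about 0.0083 to remain significant at $\alpha = 0.05$.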
Table 3. Expert rating results across both tasks

Notes:
1. ** denotes $p<0.01$ and * denotes $p<0.05$.
2. Bolded scores indicate the best performance among the four groups.
4.3. RQ3: What are the characteristics of prompt content?
Following the coding process described in Section 3.3.4, we summarized the strategies human designers employed, together with representative examples; due to space constraints, this summary is presented in Appendix D. The following sections discuss these strategies across the four stages of conceptual design: problem definition, idea generation, idea selection and evaluation, and idea evolution.
4.3.1. Problem definition stage
For ChatGPT, this stage accounted for the largest share of prompts (31/64 = 48.4%), and five strategies were identified. First, ChatGPT helped identify the target audience by providing demographic data, as illustrated by a response detailing the age range for baby chair use (P5-ChatGPT Group). Second, it assisted in user needs analysis, as shown by an inquiry about parents’ needs for baby seats (P1-ChatGPT Group), gathering insights that would traditionally require extensive research (French et al., Reference French, Gravdahl and French1985). Third, participants used ChatGPT to obtain insights into existing products and their market status, which streamlined the research phase through comprehensive and well-organized answers, as exemplified by a prompt about current music visualization tools (P1-Combined Group). Fourth, functionality considerations were explored through inquiries about the features a baby seat requires (P5-ChatGPT Group). Finally, participants also employed ChatGPT to investigate suitable materials that meet both the functional and aesthetic needs of the product (P4-ChatGPT Group).
Midjourney supported the problem definition stage by facilitating the exploration of various design intents. It allowed designers to experiment visually with different aesthetic and functional styles, helping them formulate their own design concepts. A representative example is the P4-Midjourney Group’s prompt “baby chair, cute, bright colors,” which served as an initial input for exploring a variety of design elements starting from a vague design direction.
4.3.2. Idea generation stage
ChatGPT facilitated idea generation in two distinct ways, depending on whether designers already had design elements in mind; we term these “key design points synthesis” and “intuitive idea generation.” For instance, with specific design elements in mind, the P4-ChatGPT Group formulated the prompt “Design a baby chair that combines growth adaptability, non-toxic materials and music.” Conversely, the tool could also generate original designs without specific direction from designers, as in the P1-Combined Group’s prompt: “Design an innovative baby chair.”
For Midjourney, this stage accounted for the largest share of prompts (20/50 = 40.0%). Midjourney supported idea generation in a relatively straightforward way, efficiently transforming design ideas into visual representations. This rapid visualization saved considerable time and effort compared with manual sketching and also facilitated the subsequent idea selection and evaluation process.
4.3.3. Idea selection and evaluation stage
In the idea selection and evaluation stage, ChatGPT provided two essential types of support: creativity evaluation and feasibility assessment. Creativity evaluation focused primarily on the novelty of design concepts, with ChatGPT highlighting innovative elements and suggesting areas for improvement (P4-Combined Group). Feasibility assessment, in contrast, concentrated on evaluating the practicality of the proposed concepts (P1-ChatGPT Group).
Meanwhile, Midjourney contributed by enabling the creation of multiple design variants, which facilitated comparison and selection among different design concepts. For example, P3-Midjourney Group re-entered a prompt from an earlier step to generate more visualizations, exploring various adaptations of a specific design idea. This process illustrates Midjourney’s capability to quickly generate and visualize numerous iterations.
4.3.4. Idea evolution stage
In the idea evolution stage, designers primarily leveraged ChatGPT to enhance the design process in two ways: refining design elements and facilitating concept iteration. For the refinement of design elements, designers employed ChatGPT to improve and elaborate on the proposed solution’s details. An example is the P3-ChatGPT Group’s use of ChatGPT to refine a children’s seat design by integrating more comfortable materials, as illustrated in the prompt: “Refine the integration of fabric and Lego to optimize comfort and functionality in the children’s seat design.” In terms of concept iteration, designers revisited and revised their initial design directions. The revision process is exemplified by the P5-ChatGPT Group’s request to “Propose an alternative design for this baby rocking chair with modular components,” which shifted the focus from a standard design to one featuring modular components.
Meanwhile, Midjourney supported the idea evolution stage by promoting concept iteration and refining visual details. Specifically, it enabled rapid visualization of revised design concepts, allowing designers to explore modifications quickly (P2-Combined Group). It also refined and elaborated visual elements within the same design theme, adding aesthetic details that enriched the overall design (P5-Midjourney Group).
5. Discussion
5.1. The role of Generative AI in human–AI collaboration in conceptual design
Generative AI expands designers’ solution exploration space and improves solution quality. As illustrated in Figure 2 (a), all three Generative AI-assisted groups made extensive use of the provided Generative AI models during the idea generation stage. This is likely due to the contextual solution generation capabilities of Generative AI, a notable advantage highlighted in previous research (Wu et al., Reference Wu, Cai, Sun, Ma and Lu2024; Weisz et al., Reference Weisz, He, Muller, Hoefer, Miles and Geyer2024; Lee et al., Reference Lee, Law and Hoffman2024). From the designers’ perspective, Generative AI assistance facilitated the exploration of a broader solution space: with its support, scores for diversity, novelty and the ability to trigger more ideas were all higher than those of the Human Group. In terms of solution quality, the experimental groups using Generative AI tools achieved higher mean scores than the Human Group across all five metrics examined in this study, and the overall scores of both the ChatGPT Group and the Midjourney Group differed significantly from that of the Human Group. This empirical evidence supports the effectiveness of Generative AI in aiding novice designers during the conceptual design process. However, because this study aims to investigate “How Generative AI supports humans in conceptual design,” our primary focus is on the role of Generative AI in “triggering more ideas.” This focus may overlook the design fixation potentially caused by Generative AI (Jansson and Smith, Reference Jansson and Smith1991; Wadinambiarachchi et al., Reference Wadinambiarachchi, Kelly, Pareek, Zhou and Velloso2024), which remains open for future research.
With the assistance of Generative AI, designers engaged more actively in the idea selection and evaluation stage. This finding emerged from the post-experiment interviews, in which we asked participants to reflect on their design process in terms of the conceptual design stages. When novice designers completed the design task on their own, the solution selection and evaluation stage could be overlooked (P2-Human Group, P4-Human Group). One possible explanation is that, during conceptual design, designers often develop solutions independently, starting from existing ideas (P2-Human Group: “My strategy is that when I create this design, it was based on the existing possible problem with the baby and the stroller. This direct design process did not have an idea selection and evaluation process. It is a direct design and ignores the selection and evaluation processes.”). In this context, they primarily engage in autonomous concept development and find it challenging to step outside their established cognitive frameworks to effectively evaluate and select among different solutions. The pattern changes when designers collaborate with Generative AI: their involvement in the solution selection and evaluation stage becomes more pronounced. This suggests that Generative AI may strengthen the solution selection and evaluation stage by prompting designers to critically assess and justify the outputs it generates. This interaction may help counteract cognitive biases and encourage a more thorough evaluation process (P2-Combined Group: “I reviewed everything ChatGPT and Midjourney generated. Some evident flaws would be found. Following that, I also got some new ideas about solving the problem.”).
Comparison between text-to-text and text-to-image models in initial conceptual design processes. Although the experimental results indicated that Generative AI primarily assisted in the problem definition and idea generation stages, text-to-text and text-to-image models played distinct roles in these two phases. Specifically, in the problem definition stage, ChatGPT was able to outline key points of product design and provide suggestions for innovative designs, owing to its extensive knowledge base and capacity for some degree of reasoning (P5-ChatGPT Group: “For the first task, I only have a general idea and did not know the specific details. So I asked GPT-3.5 to tell me what the needed functions should be. In the second task, I don’t really know about how to design musical bricks, and I command GPT-3.5 to tell me what the design of musical bricks commonly encompasses and which aspects I could make innovations in”). By contrast, although Midjourney could assist in the problem definition stage, it required solution-oriented prompts, which presuppose that the user already has a preliminary idea about the design solution, as P2-Midjourney Group noted: “Because it (Midjourney) relies on the initial keywords I provide. Without these keywords about design direction, it might deviate entirely from my intended idea.” In essence, while Midjourney offers some assistance in the problem definition stage, it is insufficient on its own.
Although ChatGPT excelled at helping designers analyze individual design elements in the problem definition stage, the integrated design solutions it generated from the filtered design points may confuse novice designers at the idea generation stage. For example, P1-ChatGPT Group entered the prompt: “Provide design ideas based on the elements I give you: a baby seat, appearance of Super Mario, blue and red as main colors. Integrate the above design points and make it more complete and detailed.” However, ChatGPT’s response remained in the form of key points (such as theme and color scheme; shape and features; and fabric and materials). “It could not provide me an overview of the design solution,” as P1-ChatGPT Group expressed. Conversely, Midjourney’s strength in visualization saves designers time in expressing ideas related to shape, texture and color.
5.2. Implications for future Generative AI-supported conceptual design
Workflow guidance and system integration should be carefully considered when combining text-to-text and text-to-image models. In our experiment, combining ChatGPT and Midjourney did not yield a synergistic effect, as reflected in both participants’ assessments of the Generative AI tools and the expert ratings. Interview results suggest a possible explanation; as P5-Combined Group noted, “In the experiment, I primarily copied results from GPT-3.5 to Midjourney, but these models interpret my commands and produce results differently.” Such mismatches between models can frustrate users and degrade their experience with Generative AI, which might in turn affect the quality of human–AI co-created solutions. Therefore, methodologies and system designs are needed that align the demands of each conceptual design stage with the respective strengths of text-to-text and text-to-image models. Better integration between these models could help realize their combined potential.
Explore the effect of image stimuli on designers’ inspiration. In previous research on Generative AI-enhanced conceptual design, the problem exploration stage was primarily supported by text-to-text models (Norheim et al., Reference Norheim, Rebentisch, Xiao, Draeger, Kerbrat and de Weck2024), likely because text is a fundamental mode of information expression. However, in this study, participants’ feedback gave Midjourney the highest novelty scores, and in expert ratings the Midjourney Group also obtained higher mean scores than the ChatGPT Group. Although text-to-text models draw on a large knowledge base to compensate for designers’ limited knowledge and experience, human designers may overlook potentially important details amid the volume of textual output. Therefore, future systems could help designers integrate the information output by text-to-text models with visual design elements, or explore the potential of visual search (Son et al., Reference Son, Choi, Kim, Kim and Kim2024), which could help designers relate LLM responses to possible design solutions and strengthen the role of image stimuli in inspiring creativity.
5.3. Limitations and future directions
In this study, we focused exclusively on two representative generative models: a text-to-text model (ChatGPT) and a text-to-image model (Midjourney). Our choice of input and output modalities was motivated by two considerations. First, text was chosen as the primary input modality for its accessibility and familiarity, particularly for novice designers, making it easier to express design intents. Second, text and images are regarded as the most commonly used data modalities in conceptual design, which made them appropriate output modalities.
Regarding ongoing technical advances, on the one hand, more modalities, such as voice, video and sketches, could be incorporated into the human–AI collaboration process to enable more flexible and natural communication. Incorporating more modalities of Generative AI would also allow researchers to investigate designers’ actual workflows more closely in experimental settings. On the other hand, with improved versions of the generative models used in our research, such as GPT-4 and GPT-4o, future work could pursue two main directions. One is assessing their performance in processing multimodal inputs. The other is examining how these models perform on various types of design tasks, particularly those requiring stronger reasoning abilities, since previous research has shown that the accuracy of LLM outputs decreases as task complexity increases (Khot et al., Reference Khot, Trivedi, Finlayson, Fu, Richardson, Clark and Sabharwal2023). We believe these avenues for subsequent empirical research could contribute significantly to the ongoing refinement and application of Generative AI technologies across different design scenarios.
In addition, this study compared groups assisted by Generative AI with those who completed tasks independently, aiming to uncover how general-purpose Generative AI tools enhance designers’ conceptual design processes relative to working alone. Given that design-specific tools are usually tailored to particular stages of conceptual design (Lee et al., Reference Lee, Law and Hoffman2024), future work could explore how workflow instructions and prompt engineering methods affect the stages in which Generative AI proves most beneficial.
As the control group, we selected the Human Group in order to compare the processes and performance of human designers with and without Generative AI assistance. This approach yielded several insightful findings and implications for future research; for example, with the assistance of Generative AI, designers engaged more in the idea selection and evaluation stage. Future work could include comparisons with other traditional design support methods and tools, which would provide a more comprehensive understanding of the value added by Generative AI.
Finally, our experiment revealed that participants in the Combined Group, although not restricted in the order of tool use, consistently used ChatGPT first, followed by Midjourney. Investigating how the sequence of tool use affects experimental outcomes could provide valuable insights. Additionally, extending the study to a broader range of participants, such as more experienced designers, could help validate and extend our findings across different levels of expertise.
6. Conclusion
Our work aimed to investigate how Generative AI assists humans, especially novice designers, in the conceptual design process. Specifically, we conducted an experimental study with 20 novice designers, assessing their performance with or without the help of text-to-text and text-to-image Generative AI models. The results revealed that Generative AI mainly assists humans in the initial stages of conceptual design, such as problem definition and idea generation, while the idea selection and evaluation stage remains predominantly human-led. Although Generative AI assistance improved participants’ feedback and expert ratings, combining text-to-text and text-to-image models did not produce a synergistic effect. Based on these findings, we discussed the role of Generative AI in human–AI collaboration and compared the efficacy of different models in design assistance. Finally, we proposed several implications for enhancing the effectiveness and user-friendliness of human–AI collaboration in conceptual design.
Acknowledgements
We thank all the participants for their time and the anonymous reviewers for their valuable comments. This research is supported by National Key R&D Program of China (2022YFB3303304).
Appendix A. Questionnaire example for participants in Combined Group
1. In which stages do you think ChatGPT helped you? (You may select multiple options.)
   a) Problem definition
   b) Idea generation
   c) Idea selection and evaluation
   d) Idea evolution
2. With ChatGPT’s assistance, which stages do you think were human-led? (You may select multiple options.)
   a) Problem definition
   b) Idea generation
   c) Idea selection and evaluation
   d) Idea evolution
3. In which stages do you think Midjourney helped you? (You may select multiple options.)
   a) Problem definition
   b) Idea generation
   c) Idea selection and evaluation
   d) Idea evolution
4. With Midjourney’s assistance, which stages do you think were human-led? (You may select multiple options.)
   a) Problem definition
   b) Idea generation
   c) Idea selection and evaluation
   d) Idea evolution
Appendix B. Codebook for the stage classification of prompts
Table B1. Codebook for classifying prompts into the four conceptual design stages.
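The per-stage shares reported in Section 4.3 (31/64 = 48.4% of ChatGPT prompts in problem definition; 20/50 = 40.0% of Midjourney prompts in idea generation) are simple proportions over stage-coded prompts. The sketch below shows that tally, assuming each prompt receives exactly one stage label from the codebook; the lowercase string labels are hypothetical encodings of the four stages, and the split of the remaining prompts in the example is a placeholder, not the study’s distribution.

```python
from collections import Counter

# Hypothetical string encodings of the four codebook stages.
STAGES = (
    "problem definition",
    "idea generation",
    "idea selection and evaluation",
    "idea evolution",
)


def stage_shares(coded_prompts):
    """Return {stage: (count, percent of all prompts rounded to 1 decimal)}."""
    if not coded_prompts:
        raise ValueError("no coded prompts")
    counts = Counter(coded_prompts)
    # Reject labels that are not part of the codebook.
    unknown = set(counts) - set(STAGES)
    if unknown:
        raise ValueError(f"labels not in codebook: {sorted(unknown)}")
    total = len(coded_prompts)
    return {s: (counts[s], round(100 * counts[s] / total, 1)) for s in STAGES}


if __name__ == "__main__":
    # 31 of 64 ChatGPT prompts coded as problem definition (Section 4.3.1);
    # the other 33 labels are placeholders for illustration only.
    chatgpt = ["problem definition"] * 31 + ["idea generation"] * 33
    print(stage_shares(chatgpt)["problem definition"])
```

Raising an error on unrecognized labels is a design choice for the sketch: it catches coding typos before they silently shift the percentages.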

Appendix C. The Cohen’s kappa results in expert ratings
Table C1. The Cohen’s kappa results of expert ratings.
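The main text reports an average Cohen’s kappa of 0.66 across five assessors. Cohen’s kappa is defined for a pair of raters, $ \kappa = (p_o - p_e)/(1 - p_e) $, where $ p_o $ is observed agreement and $ p_e $ is chance agreement from each rater’s marginal frequencies; one common way to summarize five raters is to average it over all 10 rater pairs. The sketch below assumes that reading and treats scores as unweighted nominal categories. Whether the study used this exact aggregation, or a weighted kappa for ordinal scales, is an assumption here, and the scores in the example are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("raters must score the same non-empty list of items")
    n = len(a)
    # Observed agreement p_o: share of items given identical scores.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement p_e from each rater's marginal category frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category for every item; kappa is
        # undefined here, so 1.0 is returned by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over all rater pairs (5 raters -> 10 pairs)."""
    pairs = list(combinations(range(len(ratings)), 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return sum(cohens_kappa(ratings[i], ratings[j]) for i, j in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical 1-5 scores from five assessors on six solutions (not study data).
    scores = [
        [4, 3, 5, 2, 4, 3],
        [4, 3, 5, 2, 3, 3],
        [4, 2, 5, 2, 4, 3],
        [5, 3, 5, 2, 4, 3],
        [4, 3, 4, 2, 4, 3],
    ]
    print(f"mean pairwise kappa = {mean_pairwise_kappa(scores):.2f}")
```

For Likert-type scores, a linearly or quadratically weighted kappa, which credits near-misses, is a common alternative; the unweighted form is shown only because the text names plain Cohen’s kappa.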

Appendix D. The strategies human designers employed when collaborating with Generative AI
Table D1. Human strategies for collaborating with Generative AI.
