More than half of college students admit to using Artificial Intelligence (AI) programs such as ChatGPT (i.e., Chat Generative Pre-trained Transformer) to complete assignments and exams (Nam 2023). Although much attention has focused on the implications of this phenomenon for learning and plagiarism (Cotton, Cotton, and Shipway 2023; Wang 2023), the emergence of AI tools raises broader questions about the implications for student writing. To what extent does AI outperform human writing? Are students better off relying on AI rather than generating original content?
We assessed the impact of AI on student writing by focusing on the capacity of one instrument, ChatGPT, for writing a specific type of paper: position papers in the context of Model United Nations (MUN). Position papers in MUN describe each country’s policies on the topics being discussed in specific committees. These documents are highly technical and draw on research from international, regional, and domestic treaties; conventions; declarations; and resolutions to propose initiatives that address problems related to the topic under discussion. Although they are composed in the context of an academic competition, their content and structure—which are grounded in factual sources and the ongoing activities of countries operating within the international community—provide a proxy for the research studies that students are regularly asked to undertake. The process of drafting a MUN paper is rigorous: students must decide how to condense extensive information about a given topic, country, and policy to fit within a two-page limit. To ensure a quality paper, entire teams of students often work together to pool their research and collective knowledge, drawing primarily on a background guide provided by the conference as well as United Nations (UN) databases, government websites, and media reports. Position papers typically require about 40 hours to write, with multiple rewrites and revisions before submission to be eligible for award recognition. Papers that propose multiple creative and feasible solutions, written as succinctly and as formally as possible, often are the most valued.
We evaluated the capacity of ChatGPT to substitute for students in composing these position papers by comparing blind evaluations of MUN position papers written by students alongside those composed by ChatGPT (De Maio et al. 2024). We organized the student-written papers into two categories: those that won awards at the MUN conference at which they were presented and those that were randomly selected from non-award-winning papers. This allowed us to compare the AI-generated papers against both “average” papers and those judged by the MUN awards committee to be especially excellent.
We assessed the papers across various dimensions, including accuracy, thoroughness, grammar, and clarity. We found that papers written by ChatGPT were evaluated more highly than those written by students who did not win awards. However, award-winning student-written position papers outperformed the ChatGPT-generated papers. Although our study focused on a specific type of writing assignment, we expect that our analysis will contribute to deepening our understanding of the benefits and limitations of students using AI in their coursework.
Understanding the success of ChatGPT in mimicking—and even surpassing—student-written work requires recognizing that ChatGPT’s performance is only as good as the instructions that its users provide. The quality of AI-generated writing depends on human-generated input. As our study emphasizes, this recognition underscores the need for a human-centered approach to AI that not only ensures collaboration between student and machine but also preserves human agency and control.
THE CAPACITY OF CHATGPT
The emergence of ChatGPT in late 2022 as the first publicly available and easy-to-use generative AI tool raised concerns about the implications of AI for the future of education (Miao and Holmes 2023). As ChatGPT quickly became the fastest-growing app in history, these concerns focused primarily on the increased ability of students to cheat on assignments, rely on inaccurate and/or biased research, and fail to develop the critical-thinking skills essential for learning (Anders 2023).
A particular concern is what AI tools such as ChatGPT could mean for innovation and the development of new ideas. A form of machine learning, ChatGPT produces replies to prompts written in a conversational interface (OpenAI 2023). ChatGPT does not operate based on a set of rules; instead, it converts prompts that users input into new content that draws on data collected from websites, blogs, and other digital media (Metz 2024). Because ChatGPT searches for common patterns in the data that it accesses, it cannot produce original ideas and solutions to the types of real-world challenges confronted by organizations such as the UN. It also cannot guarantee accuracy (OpenAI 2023), and it depends on users’ knowledge of the subject to detect errors. Moreover, ChatGPT is being trained continuously by the input of data that increases its “parameters,” or the metaphorical “knobs” that can be adjusted to improve its performance (Miao and Holmes 2023). These parameters determine how ChatGPT processes information and generates output. It is not clear, however, how the content that shaped these parameters was created. It may include false or inaccurate information and may reflect cultural norms that bias the content generated (Miao and Holmes 2023). Thus, whereas ChatGPT can provide shortcuts by synthesizing information, concerns about its accuracy suggest that it cannot replace human-led research.
To know how much we can trust the output generated by ChatGPT, we must have significant knowledge of the topic about which we are asking it to write. The 2023 United Nations Educational, Scientific and Cultural Organization report on guidelines for using AI suggests that ChatGPT’s efficacy as an academic tool depends on grounding it in a human-centered approach that can provide monitoring and oversight to limit the chance of producing (and reproducing) false statements (Miao and Holmes 2023). The results presented in this article were affected by the specific instructions that we gave ChatGPT—instructions that could come only from a human with a well-researched understanding of the topic.
METHODOLOGY
Critical components of MUN competitions include the researching, writing, and submitting of position papers. Position papers are brief, two-page documents prepared by delegates who represent a country or an entity on a particular MUN committee. These papers outline the country’s policies, interests, and proposed solutions to a pair of preassigned problems that are distributed in advance by the conference organizers. Position papers are essential tools for effective communication, cooperation, and debate within the MUN framework. They also are an opportunity to highlight writing and reasoning skills, which is why our analysis of MUN position papers helps us understand the usefulness of ChatGPT as a replacement for students’ own efforts. The best papers are awarded the prize of Outstanding Position Paper to acknowledge their inclusion of relevant facts, statistics, and historical context to support the arguments presented, as well as the originality of the proposed solutions.
For our analysis, we collected position papers submitted to the 2023 National MUN (NMUN) Conference held in New York. We then drew a sample of 100 papers, stratified by committee (i.e., weighted by committee size) and by whether a paper had received an Outstanding Position Paper award. For each student-written position paper that we selected, we instructed ChatGPT to write a parallel paper for the same committee, on the same topic, and representing the same country. The resulting dataset consisted of 200 position papers: 100 pairs, half student-written and half composed by ChatGPT. The human or AI authorship (as well as—among the former—the status of the position papers as award-winning or not) was hidden from the coders, who then were asked to evaluate their quality following a detailed rubric. In developing this rubric, we adopted the guidelines used by the NMUN organization to evaluate position papers for awards. The rubric assesses various aspects of the position paper, including writing quality, adherence to style and grammar conventions, reference to relevant resolutions and documents, consistency with bloc/geopolitical and UN constraints, and the depth of analysis provided. Each criterion was evaluated on a scale from 0 to 4, with descriptors for exemplary, proficient, developing, elementary, and unsatisfactory levels of performance. The total possible score was 36 points across the nine parameters. Assessments were completed by trained research assistants who were blinded to the authorship and award status of the position papers.
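To make the scoring procedure concrete, the following minimal sketch shows how a nine-criterion, 0-to-4 rubric of this kind could be tallied. The criterion names below are paraphrased assumptions based on the description above, not the exact wording of the NMUN-derived rubric.

```python
# Illustrative tally for a nine-criterion rubric; the criterion names are
# paraphrased assumptions, not the exact NMUN-derived rubric.
RUBRIC_CRITERIA = [
    "writing_quality",
    "style_and_grammar",
    "format",
    "relevant_resolutions_and_documents",
    "bloc_geopolitical_consistency",
    "un_constraints_consistency",
    "depth_of_analysis",
    "feasibility_of_solutions",
    "creativity_of_solutions",
]  # nine parameters, each scored 0-4, for a maximum of 36 points

SCALE = {4: "exemplary", 3: "proficient", 2: "developing",
         1: "elementary", 0: "unsatisfactory"}

def total_score(scores: dict) -> int:
    """Sum a coder's 0-4 scores across the nine rubric criteria."""
    assert set(scores) == set(RUBRIC_CRITERIA), "score every criterion exactly once"
    assert all(0 <= s <= 4 for s in scores.values()), "each score must be 0-4"
    return sum(scores.values())

# Example: a paper rated "proficient" (3) on every criterion totals 27/36.
print(total_score({c: 3 for c in RUBRIC_CRITERIA}))  # prints 27
```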
GENERATING POSITION PAPERS USING CHATGPT
Each committee has a background guide that details the two topics to be discussed at the conference. Students have access to these background guides several months before the conference, and they rely on them to guide their research. Each background guide has seven distinct sections. The introductory section presents the two topics discussed in the position paper and in the upcoming committee session. Following the introduction, each of the two topics has three sections, each addressing a different aspect of the issue. The first section provides background information and focuses on why the topic is important to the global community. The second section instructs student delegates to consider actions taken by the UN, regional institutions, and other member states to address the issue. The third section asks the writer to propose solutions to the issue by answering the questions in the further-research part of the background guide.
We instructed ChatGPT to write position papers following the same guidelines that the students were asked to follow. Rather than creating a single prompt with all of the background-guide information, instructions, and examples of what to write in each paragraph, we created seven prompts that corresponded to the seven sections of the background guides. This step was necessary because, in its current configuration, ChatGPT cannot process a single prompt containing all of the consolidated information. For the introduction, we told ChatGPT which topics would be discussed in the position paper and the committee to which the paper was being submitted, and we instructed it to take a stance as member state “X.” We also uploaded the instructions provided in the MUN position paper guide so that ChatGPT could replicate the correct format. The information given to ChatGPT for each of the other sections was as follows: the topic discussed, the member-state stance it should take, the background information from the background guide, the instructions for that paragraph in the position paper guide, and an example of the section from the position paper guide. We also instructed it to write 125 words for the first section, 250 words for the second section, and 150 words for the third section, matching the lengths of the examples given in the position paper guide. Because this input was only a suggestion, however, ChatGPT generated papers with varying word totals per section, which more closely mirrored the variation across the student-written papers.
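In the study, these prompts were entered in ChatGPT’s conversational interface. For readers who want to approximate the procedure programmatically, the following minimal sketch shows how one section-level prompt could be issued; the `openai` client usage, the model name, and the prompt wording are our illustrative assumptions, not the exact prompts used in the study.

```python
# Hypothetical scripted version of the section-by-section prompting procedure;
# in the study, the prompts were entered in ChatGPT's conversational interface.
from openai import OpenAI  # assumes the official openai Python package

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Target word counts per topic section, mirroring the position paper guide.
SECTION_WORDS = {"background": 125, "actions_taken": 250, "proposed_solutions": 150}

def draft_section(country, committee, topic, section, guide_excerpt, example):
    """Ask the model to draft one section of a position paper."""
    prompt = (
        f"You are the delegate of {country} in the {committee} committee. "
        f"Write the '{section}' section of a Model UN position paper on "
        f"'{topic}' in about {SECTION_WORDS[section]} words.\n"
        f"Background guide excerpt:\n{guide_excerpt}\n"
        f"Example of this section from the position paper guide:\n{example}"
    )
    response = client.chat.completions.create(
        model="gpt-4",  # the model name is an assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Hypothetical call; the inputs would come from the committee's background guide:
# text = draft_section("Ghana", "UN Economic Commission for Africa",
#                      "AfCFTA and human rights", "background",
#                      guide_excerpt="...", example="...")
```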
ANALYSIS
We found that student-written, award-winning papers scored the highest across all dimensions. As shown in figure 1, the total score for award-winning papers was 26.3/36. The next-best performers were ChatGPT-written papers at 22.9/36 points; student-written, non-award-winning papers scored the lowest at 19/36 points.
Differences also emerged across particular components of our evaluations (table 1). The ChatGPT papers came closest to replicating the student-written, award-winning papers on the writing dimension. For the writing rubric, we considered grammar, spelling, sentence structure, style, and format appropriate to the assignment. On that dimension, the ChatGPT papers scored, on average, almost as high as the award-winning papers (i.e., 2.5 compared to 2.58 points), whereas the student-written, non-award-winning papers scored markedly lower (i.e., 1.88 points).
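The comparisons in this section reduce to group means of the coders’ rubric scores. A minimal sketch of that computation follows; the data layout and column names are assumptions for illustration, and the values shown are placeholders rather than our data.

```python
import pandas as pd

# Hypothetical layout of the blind-coded scores; the column names and values
# are placeholders for illustration, not the study's data.
scores = pd.DataFrame({
    "paper_id": [1, 2, 3, 4, 5, 6],
    "group": ["award", "award", "chatgpt", "chatgpt", "non_award", "non_award"],
    "writing": [3, 2, 3, 2, 2, 2],      # 0-4 writing-rubric score
    "total": [27, 26, 23, 22, 19, 20],  # 36-point total
})

# Mean score per group on the writing dimension and on the 36-point total.
print(scores.groupby("group")[["writing", "total"]].mean())
```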
The ChatGPT papers also outperformed the student-written, non-award-winning papers with regard to international consistency and accuracy about global constraints. The difference was smaller when we evaluated consistency with regional and domestic constraints, which perhaps suggests that ChatGPT is better at handling broader, more general information requests than narrower, country-level information. In providing solutions to the issues addressed by the committee, ChatGPT was strong in making feasible suggestions given budgetary and committee constraints but was weaker in terms of creativity of approach.
Comparisons between the AI-written and the student-written, award-winning papers also produced interesting findings. The greatest differences were observed in the use of relevant resolutions, treaties, and documents: the award-winning papers performed much better, suggesting that ChatGPT does not provide the same quality of research to which the best delegates have access. The award-winning papers also were far more consistent with domestic constraints—again demonstrating that ChatGPT cannot replicate the nuanced, narrower research that more micro-level analysis requires. To illustrate the difference, we consider the following two passages from papers written for the UN Economic Commission for Africa. The first paper states:
As a supporter of safe and effective trading, Ghana supports the AfCFTA, which allows Africans to fulfill the economic possibilities of Agenda 2063. Ghana also endorses the utilization of the services of the Office of the United Nations High Commissioner for Human Rights (OHCHR) and their Human Rights Indicators (HRIs), which can be valuable tools for understanding the complexities of issues negatively affecting all groups of people.
The second paper states:
In light of the challenges posed by the African Continental Free Trade Area (AfCFTA) and its potential impact on human rights, Ghana emphasizes the need for a comprehensive approach.
The first passage is from a student-written, award-winning paper and the second is from a ChatGPT-generated paper. Both passages are factually correct, and the AI-generated paper is well crafted and captures exactly how students are trained to write and structure these essays. The key difference, however, is that the student paper includes a greater level of detail, suggesting that the writer had actively synthesized a volume of research that is absent in the AI-generated paper.
The most significant differences that we observed were between student-written, award-winning and non-award-winning papers. Across all dimensions—but particularly with regard to level of detail, the inclusion of relevant resolutions and documents, creativity, and feasibility of proposed solutions—the award-winning papers far outperformed the non-award-winning papers.
A potential concern with our coding procedures is that—notwithstanding our attempt to keep the coders blinded about the origins of the position papers they were evaluating—they may have been able to determine whether a given paper was AI-generated. This could be a problem if the recognition of a paper as (likely) written by ChatGPT changed the way the coder evaluated its quality. This concern appears to have been warranted—if a paper deviated even a little from the standard style of a position paper, coders reported that they often could infer that it had been written by a student. However, this applied only to the student-written, non-award-winning papers because the award-winning papers almost always followed the established format and guidelines and thus were indistinguishable from the AI-written papers, at least on this dimension.
BENEFITS OF USING AI
Our analysis suggests that there are benefits to using ChatGPT for writing position papers, particularly for students who may not be prepared to write award-winning papers. Compared to the non-award-winning papers, ChatGPT papers were evaluated as having better writing, more consistency with geopolitical constraints, and a superior analysis of the issues. Crafting a position paper demands extensive research, analysis, and articulation of complex ideas. ChatGPT can streamline the process by generating drafts based on provided prompts. It also may decrease the time that students would spend brainstorming and give them a place to start their work, thereby allowing them to focus on refining their analysis—a benefit highlighted in other analyses of the impact of ChatGPT on student learning (Adiguzel, Kaya, and Cansu 2023; Qadir 2023). Perhaps more important, ChatGPT relies on a vast repository of information gathered from sources across the Internet. By leveraging this wealth of knowledge, students can gain insight into various viewpoints, policy approaches, and historical contexts relevant to their assigned country or topic. This exposure potentially could foster a deeper understanding of global issues, especially for those students who are not writing award-winning papers.
To use ChatGPT effectively, however, students must know how to phrase their queries and how to evaluate AI-generated responses. This requires an understanding of the topic and sufficient knowledge about the committee, the country, and international dynamics to be able to identify credible information and construct well-supported arguments. Used in this way, ChatGPT can help students develop critical-thinking and analytical skills that can be applied to negotiations in MUN simulations.
One of the most significant benefits that we observed from using ChatGPT concerned the improvement of student writing. Because ChatGPT provides feedback and suggestions, it can serve as a virtual writing assistant, teaching students how to structure their papers most effectively, strengthen their arguments, and enhance the clarity of their analysis. This aspect of ChatGPT may promote equity because students at all skill levels could benefit from its resources. Novice delegates could rely on AI to generate foundational content and structure, and more experienced delegates could use it to explore more advanced and innovative policy solutions. In this context, ChatGPT can democratize access to resources regardless of a student’s prior experience in MUN.
LIMITATIONS OF AI
Although the utilization of ChatGPT for drafting position papers in MUN provides some advantages, there are inherent limitations associated with this approach. Despite its capabilities, ChatGPT possesses certain constraints that may impact the quality, authenticity, and ethical integrity of position papers. In our scoring of the papers for consistency with real-world constraints, we noted a lack of contextual understanding and an inability to provide sophisticated political, cultural, and historical detail, which could undermine the credibility of a paper. In addition, there remains the risk of academic dishonesty: students may incorporate text generated by ChatGPT without conducting thorough research or critically evaluating the information, producing plagiarized or unoriginal content. ChatGPT also cannot be relied on to provide accurate and reliable sources (Sallam 2023). For example, when prompted to provide a list of references on the capacity of ChatGPT to replicate student writing, the online tool generated nonexistent citations. This undermines the credibility of the information, compromises academic integrity, and diminishes the educational value of the MUN experience. Moreover, because ChatGPT’s responses are influenced by the data on which it was trained—which may contain the biases and inaccuracies present in online content—students may unknowingly present flawed arguments and perspectives in their position papers. To compound the problem, ChatGPT relies on research conducted primarily in wealthier countries and draws from texts that are not universally applicable (Mbakwe et al. 2023; Robertson 2024). Furthermore, there is the risk that ChatGPT can generate false or fake information and spread untruths (Baidoo-Anu and Owusu Ansah 2023; Megahed et al. 2023; Metz 2022; Qadir 2023). For students who rely exclusively on ChatGPT to generate their research, this could result in their unknowingly writing inaccurate position papers. Because AI “learns” from a primarily Western-centric corpus, this could have a particularly adverse effect on position papers written for countries of the Global South or for small, non-English-speaking developing countries by introducing biases and/or false and incomplete information.
Although ChatGPT can improve student writing, it cannot provide nuanced, context-specific feedback tailored to individual students’ needs. Unlike human instructors and mentors who provide personalized guidance and insight, AI feedback may lack depth or relevance, thereby limiting students’ opportunities for meaningful learning and improvement. Relying solely on ChatGPT for drafting position papers also may lead to a diminished emphasis on critical-thinking and independent-research skills. Students may become overly reliant on ChatGPT’s capabilities, neglecting the importance of conducting comprehensive research, analyzing diverse perspectives, and synthesizing complex information on their own. This undermines the educational objectives of MUN simulations, which aim to foster independent thinking and analytical skills. By relying on ChatGPT to write their position papers, students also may compromise their ability to be competitive at actual MUN competitions. Much of a student’s performance at the conference depends on their having a deep understanding of the country that they are representing. Students who have outsourced their research to ChatGPT likely will not have the knowledge and preparation necessary to interact effectively with their peers, negotiate resolutions, and represent their assigned country’s interests. Moreover, ChatGPT’s performance may vary depending on input prompts, data quality, and algorithm updates, thereby diminishing its promise of democratizing or leveling the playing field across delegates.
CONCLUSION
ChatGPT offers a convenient and accessible means of generating draft position papers for MUN; however, in its current iteration, it cannot replace human-guided research. What we learned from our analysis is that AI is highly dependent on human-generated input. On its own, it does not have the capacity to effectively replace student-led analysis. By understanding the constraints of AI technology and supplementing it with a human-centered approach, critical thinking, and ethical considerations, individuals in the classroom and beyond could leverage ChatGPT more effectively as a supportive tool in the writing process while preserving the authenticity and integrity of their work.
Whereas the evidence presented in this study drew on a specific subset of student writing, our lessons learned have implications for political science education more broadly. Most students will use tools such as ChatGPT at some point in their education. The question then arises: How do we make the best use of this technology while still promoting student learning, critical thinking, originality, and academic honesty? There is much to embrace about AI: it has the potential to address challenges in education and to promote inclusive and equitable learning opportunities. Indeed, it may have the capacity to democratize education. However, it should not replace human-led research and innovation. As demonstrated in our study, ChatGPT can handle high-level queries and synthesize information effectively to replicate and, in some cases, significantly improve on student writing. However, it does not perform as well in providing more nuanced, creative, and sophisticated analyses. The best way forward may be to train students to use ChatGPT as an instrument that complements their research and writing but does not replace their own original work. This would require a deep understanding of how ChatGPT works and handles queries, as well as training in how to assess the reliability of AI-generated output.
Supplementary material
To view supplementary material for this article, please visit http://doi.org/10.1017/S1049096524000799.
ACKNOWLEDGMENTS
The authors are grateful to Dean Yan Searcy of the College of Social and Behavioral Sciences at California State University, Northridge, for his support of this study. The authors also thank Daniel N. Posner for his feedback and suggestions.
DATA AVAILABILITY STATEMENT
Research documentation and data that support the findings of this study are openly available at the PS: Political Science & Politics Harvard Dataverse at https://doi.org/10.7910/DVN/5ZTSXK.
CONFLICTS OF INTEREST
The authors declare that there are no ethical issues or conflicts of interest in this research.