1. Introduction
Multilingual machine translation (MT), as a technological innovation, has permeated the realm of language learning and is commonly exploited by foreign language learners (Jiménez-Crespo, 2017). The recent development of Google Translate (GT), in particular, has attracted novel research attempts to understand its merits and limitations as an instructional tool. A GT-based approach to second language (L2) writing has recently received attention due to its potential to support L2 writing. However, when the purpose of using online translation tools is to foster L2 acquisition, lack of understanding of the nature and abilities of such tools can result in detrimental effects (Ducar & Schocket, 2018). To better understand what GT can and cannot offer L2 writers, a thorough investigation of its strengths and limitations is warranted. Most previous research on GT has reported findings pertinent to the quality of older GT versions rather than the new version, which was launched in 2016. Furthermore, many previous studies mainly focused on using MT in L2 writing in relation to proficiency levels (e.g. Garcia & Pena, 2011). The few studies that have explored genres (e.g. Chung & Ahn, 2021) have not thoroughly investigated the quality of GT’s outputs in relation to the main writing genres that L2 learners encounter in educational settings. Therefore, this study, from a linguistic standpoint rooted in the “learning-to-write” approach (Manchón, 2016), sets out to systematically examine GT’s output quality and explore EFL learners’ attitudes toward GT use across writing genres.
In language learning, translation has received considerable attention in the last few years as an instructional strategy (Wilson & González Davies, 2017). According to Cook (2010), there is a natural connection between first language (L1) and L2 in EFL writers’ minds. As learners engage in translation naturally and spontaneously when they attempt to write in the target language (Beiler & Dewilde, 2020), Leonardi (2010) pointed out that this connection should be exploited. Moreover, research has suggested that translation can facilitate the improvement of L2 writing (Cohen & Brooks-Carson, 2001; Lee, 2020). Cohen and Brooks-Carson (2001) maintained that translation can facilitate exposure to lexical items beyond learners’ competence, providing linguistic support that can improve syntactic complexity and cohesiveness in L2 writing.
Research has shown that when L2 writing tasks involve the use of technology, GT is the most widely utilized tool by L2 learners (Garcia & Pena, 2011). However, due to several issues with MT, language instructors might not be inclined to integrate this tool into their teaching practices (Clifford, Merschel & Munné, 2013). For instance, a number of drawbacks have been reported in MT literature, such as its inability to provide support beyond the lexico-grammatical spectrum (Groves & Mundt, 2015), ambiguous and misleading renderings including issues in pragmatic expressions (Ducar & Schocket, 2018), erroneous translation of idioms (Godwin-Jones, 2015), as well as linguistic inaccuracy and contextual errors (White & Heidrich, 2013). Nevertheless, several researchers have called for utilizing, instead of rejecting, MT technology (Groves & Mundt, 2015; Jiménez-Crespo, 2017). Effective use of this technology demands an understanding of what MT can and cannot do (Ducar & Schocket, 2018). Godwin-Jones (2015) pointed out that the scaffolded learning that MT provides can foster the development of learners’ written products. Moreover, linguistic modeling (Amaral & Meurers, 2011), extending linguistic knowledge, and raising awareness are merits that can be harnessed by using MT.
Previous research has found that recent advances in MT have yielded output with higher accuracy in vocabulary and grammatical structures (Lee, 2020). Furthermore, recent research has found that MT has the potential to enhance the quality of L2 learners’ written production (e.g. Lee, 2020; Tsai, 2019, 2020). For instance, MT can increase writing fluency and decrease errors (Garcia & Pena, 2011), promote lexical fluency (Chen, Huang, Chang & Liou, 2015), and can be employed as corpora that cater to students’ needs in the language classroom (Bernardini, 2016). Additionally, an important advantage of MT is the provision of feedback to L2 writers, providing assistance to correct mistakes that can take multiple forms, including syntactic alternatives and word choice (Lee, 2020), thus raising language learners’ metalinguistic awareness (Abraham, 2009).
In 2006, Google launched its MT tool, which was based on processing phrases and words as the unit of translation (Le & Schuster, 2016). In 2016, Google shifted to a new system (neural MT) that processes the whole sentence instead of smaller units; nevertheless, according to Le and Schuster (2016), it is still prone to producing decontextualized sentences and errors, such as missing lexical items. With Google promoting its translation processes using a new artificial intelligence–based system (Ducar & Schocket, 2018), MT is able to produce output of a higher quality (Jia, Carl & Wang, 2019). Castelvecchi (2016) noted that, according to Google, the new technology can decrease the rate of errors by approximately 60%; thus, GT’s output, from a grammatical perspective, is, at the very least, close to the minimum English language requirement that should be met in many higher education institutions (Mundt & Groves, 2016). Ducar and Schocket (2018) pointed out that GT technology has developed enormously in terms of accuracy, with noticeably enhanced coherence at the sentential level; however, GT still has limitations regarding the accuracy of some grammatical structures, culture-specific expressions, and context.
Utilizing different approaches (L1 vs. direct L2 writing), several recent studies have investigated the potential of GT as an L2 writing assisting tool and compared student-generated texts (SGTs) with Google-translated texts (GTTs). Stapleton and Kin (2019) found that language teachers considered GTTs significantly more accurate grammatically than SGTs, and similar to SGTs in terms of vocabulary and comprehensibility. In another study, Lee (2020) reported that GT use can decrease lexico-grammatical mistakes and enhance writing revisions. Additionally, Lee reported learners’ positive views on using GT in L2 writing. Moreover, Tsai (2020) found that GTTs had a higher accuracy and style level, and that students were satisfied with GT use in L2 writing. More recently, Chon, Shin and Kim (2021) reported that GT use helped to reduce the gap between lower and higher proficiency students, GTTs had a higher syntactic complexity level, and utilizing GT fostered the use of less commonly used vocabulary. In another study, Tsai (2019) found that GTTs had more words, richer content, more advanced vocabulary, and more accuracy, and that learners had positive views on GT in L2 writing. Furthermore, the findings in Chung and Ahn (2021) suggested a high level of accuracy improvement, and the support GT provides L2 writers, or lack thereof, was found to be influenced by genre and proficiency. In addition, students’ attitudes were highly positive toward the use of GT. The above studies have suggested that GT has noticeably developed regarding grammatical accuracy, vocabulary, and content. There is, however, no conclusive evidence whether GT improvement is applicable to different writing genres. The previous research on GTTs has paid little attention to genre effect on GT’s output, which is tackled in the present study.
1.1 Research inquiry
This study examines an area within L2 writing–related GT research that is under-investigated, yet pivotal to maximizing the potential of GT to support L2 writing. Using an MT approach, this study investigates and compares GTTs with SGTs across four writing genres (narrative, descriptive, expository, and persuasive). Therefore, it aims to better understand GT’s potential to support L2 writing (Lee, 2020; Stapleton & Kin, 2019) and examine in which L2 writing task(s) GT can be (more) effective as an assisting tool. Comparing GTTs with texts produced by higher proficiency students (English majors with a B1 level), as opposed to those produced by lower proficiency students, can yield more insight into GT’s strengths and weaknesses in EFL writing (Tsai, 2020). Research has shown that GT has improved; nevertheless, little is known about the extent to which GT’s output has improved in relation to different writing genres. Therefore, since GTTs would outperform SGTs produced by lower proficiency students, as suggested in the literature, this study attempts to set current GTTs against potentially higher literacy, style, and content levels of SGTs; thus, it can more precisely reveal GTTs’ abilities in different genres.
Despite the pervasiveness of MT use in language learning, research has not provided conclusive evidence relevant to the reliability of its output (Lee, 2020). Therefore, the quality of linguistic output across writing genres can contribute to the establishment of GT’s reliability, or lack thereof. Previous findings on the quality of MT’s output might not be generalized to currently more developed tools (Stapleton & Kin, 2019). Although genre plays a significant role in L2 writing (Yoon & Polio, 2017), research on genre effect on MT performance has not received due attention (Chung & Ahn, 2021). The few studies that have examined GT in relation to genres have not drawn clear distinctions among GTTs across genres and have not extensively investigated EFL learners’ attitudes toward the quality of GT’s outputs. Therefore, using multiple data sources, this study contributes to the recently growing body of research on GT-assisted L2 writing, and poses three research questions:
1. Do GTTs have a higher quality in relation to literacy, content, and style in particular genres?
2. Is there a significant difference between SGTs’ and GTTs’ quality in relation to literacy, content, and style across genres?
3. What are EFL learners’ attitudes toward GTTs’ quality in different genres?
2. Methods
2.1 Participants
Forty-one Arabic-speaking undergraduate students (Age: M = 22.55, SD = 1.28) majoring in English language and translation at a Saudi university participated in this study. The participants had studied English for a minimum of 7 years, in addition to a 4-month mandatory intensive English program, prior to enrollment in the undergraduate program. They had been enrolled in the English program for a minimum of 2 years prior to participating in this study. Their L2 proficiency (section 2.2.1) corresponded to B1 level on the Common European Framework of Reference for Languages (CEFR) scale. The participants attended a class taught by the teacher/researcher, meeting once a week for 2 hours and 30 minutes, and voluntarily participated after providing their informed consent.
2.2 Procedures
A mixed-methods approach was used, and data were collected in 7 weeks. The procedures (see supplementary material) consisted of two preliminary steps (proficiency tests and workshops), four computer-mediated L2 writing sessions, four computer-mediated L1 writing and translating sessions, written reflections, four questionnaires, focus group discussion, and individual interviews. The procedures were conducted in a language lab (equipped with computers and internet access), excluding the interviews and focus group discussion, due to class time constraints. Two available online tools were utilized in this study: (1) GT (https://translate.google.com) and (2) Google Docs (GD) (https://docs.google.com/). The latter was used to collect students’ texts in all writing sessions. The researcher shared a GD document with each student. Subsequently, two workshops were conducted to familiarize the students with GT and GD use. The following sections provide a detailed description of the procedures.
2.2.1 Proficiency tests
Several procedures were used for preparing the tests and calculating the results (Chung & Ahn, 2021). The first test was a cloze test (Cronbach’s α = .711) that measured English grammar and vocabulary knowledge by using items from Test Your English (https://testyourenglish.net). The cloze test was chosen for its reliability as an effective language proficiency measure (Brown, 2002). Twenty multiple-choice items, corresponding to a score of 20 points, were included in the test that was administered to the students using the Google Forms platform and completed in 15–20 minutes. In the second test, a writing task, a validated website (Cambridge English Write & Improve; https://writeandimprove.com) was used. In the lab, the students completed the test in 25–35 minutes by responding to an opinion task – children playing video games. This website provides automated assessment based on the CEFR six-point scale – from A1 (beginner) to C2 (mastery level). Subsequently, the scale was replaced with corresponding numerical values (A1 = 1–C2 = 7). Using a score ranging from 1 to 20 points, the numerical values were recalculated and the first (M = 13.60) and second (M = 13.04) test scores were combined (M = 26.64).
2.2.2 Writing sessions
Prior to each session, the researcher posted a genre-related task to the GD documents. The students completed each task by directly writing in English for around 30 minutes, writing a corresponding text in L1 for around 30 minutes, and translating L1 texts into English using GT (see supplementary materials). In the lab, the researcher monitored the students’ writing sessions by using the shared GD documents shown on the instructor’s computer. The students were instructed not to use any tools that can provide linguistic support, and GT was only utilized when the students completed their L1 texts. Moreover, the students were instructed to copy GTTs as they appeared in the GT target text box without making any changes and paste them into GD – confirmed by the Version History in GD. Subsequently, the students compared SGTs with GTTs. The students engaged in the following tasks respectively: narrative (What is the most interesting story that you have ever heard?), descriptive (Describe your dream house.), expository (Compare and contrast studying in high school vs. at university.), and persuasive (In your opinion, should all people learn English?). The purpose of including these particular tasks was to control for background knowledge.
2.2.3 Written reflections
Following the writing, translating, and comparing tasks, the students reflected on the quality of GTTs. The purpose of the reflection was to enable the students to openly express their views about their comparisons. That is, it provided a less-structured strategy to elicit the students’ views than the structured questionnaires, semi-structured group discussion, and interviews.
2.2.4 Survey questionnaires
The researcher developed four 17-item genre-related questionnaires, based on an extensive review of literature and the scope of the study, to explore the students’ attitudes toward using GT in different genres. Each questionnaire used a five-point Likert scale (where 1 = “strongly disagree” and 5 = “strongly agree”). The items (Table 3) explored attitudes toward GT’s quality, including grammatical accuracy, vocabulary choice, content accuracy, context appropriacy, and general quality. Furthermore, they explored the extent to which GTTs helped the students to notice how SGTs can be improved in a particular genre(s). All questionnaires were validated by two expert language instructors and were piloted with a small sample to measure internal consistency, resulting in high reliability (Cronbach’s α = .855, .858, .866, and .864 for the narrative, descriptive, expository, and persuasive questionnaires respectively). The questionnaires were administered using the Google Forms platform, and the students took 12–15 minutes to submit their responses to each questionnaire. For ethical considerations, the researcher was not present in the lab when the students took the questionnaires.
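The internal-consistency figures reported above are Cronbach's α values. As an illustration only, the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores) can be sketched in a few lines of Python. The function name and the sample responses below are hypothetical and are not the study's data; the study itself computed reliability in SPSS.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Uses sample variances (n - 1 denominator) for items and for total scores.
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Transpose so that each tuple holds all respondents' scores for one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical Likert responses (3 respondents x 3 items), made up for illustration.
sample = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
]
alpha = cronbach_alpha(sample)
```

In this made-up sample the items rise and fall together perfectly, so α comes out at its maximum of 1; real questionnaire data would normally give lower values.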
2.2.5 Focus group discussion and individual interviews
The discussion was carried out in an online environment utilizing Blackboard – a learning management system – in which the participants (n = 12) were invited to a virtual meeting with the researcher. The discussion took around 1 hour and 15 minutes and included open-ended questions about the experience with GT across genres, GTTs’ quality, and strengths and limitations of GT use. The interviews (n = 4) were conducted using Blackboard, with extracts of each interviewee’s texts and their corresponding GTTs displayed as a stimulus. The interviews were mainly centered around GTTs’ quality in the four writing tasks and included several questions (e.g. about grammatical and content accuracy, vocabulary and phrases choices, and context appropriacy). Individual interviews lasted for around 45 minutes. The discussion and interviews were audio-recorded, using computer software, and transcribed for analysis.
2.2.6 Data analysis
A computational approach to text analysis was employed. Computational assessment of writing is a valid quantitative approach, as it can increase the extent of objectivity, consistency, and accuracy in assessment (Godwin-Jones, 2018; Tsai, 2017). Analysis was conducted by utilizing one writing assessment freeware (1Checker; http://www.1checker.com) and a web-based writing assessment tool (VocabProfilers (VP); http://www.lextutor.ca/vp/eng) to analyze SGTs’ and GTTs’ quality in relation to the following criteria: (1) literacy, (2) content, and (3) style (Tsai, 2020). Literacy was analyzed, using 1Checker, based on grammatical accuracy and error probability – dividing the number of errors by the number of text words (Tsai, 2020). Content was analyzed, using VP, based on the inclusion of content words, as an indicator of information provision (Tsai, 2020), as opposed to functional words. Style was analyzed, using VP, based on different categories of vocabulary items. VP utilizes text corpora and, accordingly, can be employed to categorize vocabulary items in written texts into the following: lexical density, K1, K2, off-list vocabulary items, and academic word list (AWL) (Tsai, 2017). Lexical density refers to the ratio of words providing information to those indicating grammatical relationships, K1 and K2 refer to the first and second (more advanced) thousand most frequently used words respectively, and off-list vocabulary items refer to advanced words of which learners normally lack knowledge (Tsai, 2017). Since off-list vocabulary items and AWL generally comprise more advanced words, often beyond learners’ vocabulary knowledge, they were integrated into one category (Tsai, 2019).
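To make two of these measures concrete, the following is a minimal Python sketch, not the procedure used by 1Checker or VP. It assumes a simple regular-expression tokenizer and a deliberately tiny, hypothetical function-word list. Error probability follows the definition given above (errors divided by text words). The content-word measure is computed here as the share of content words among all words, which is one common operationalization of lexical density and may differ from the exact figure a profiler reports.

```python
import re

# Hypothetical, deliberately small function-word list for illustration;
# real vocabulary profilers rely on far larger, corpus-based lists.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at",
    "to", "is", "are", "was", "were", "it", "this", "that", "with", "for",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def error_probability(error_count, text):
    """Number of flagged errors divided by the number of words in the text."""
    words = tokenize(text)
    if not words:
        raise ValueError("text contains no words")
    return error_count / len(words)


def content_word_share(text):
    """Share of tokens that are not in the function-word list."""
    words = tokenize(text)
    if not words:
        raise ValueError("text contains no words")
    content = [w for w in words if w not in FUNCTION_WORDS]
    return len(content) / len(words)


# Made-up sentence and error count, not taken from the study's texts.
example = "The house is big and the garden is green"
density = content_word_share(example)   # 4 content words out of 9 tokens
prob = error_probability(3, example)    # 3 assumed errors over 9 tokens
```

The error count itself would come from the grammar checker; the sketch only shows how a raw count is normalized by text length so that texts of different sizes can be compared.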
Multiple analyses using SPSS Version 24 were conducted to measure the questionnaires’ reliability, obtain descriptive statistics, and examine possible significant differences among writing parameter means using one-way ANOVAs with post hoc tests and independent-samples t-tests. The aforementioned parametric tests were utilized based on testing the assumptions of normality using the Shapiro–Wilk test (p > .05). For qualitative data, a thematic analysis was used (Braun & Clarke, 2006), utilizing an inductive approach to analysis. The data were analyzed by two researchers (the author and an expert language instructor and researcher) independently. Interrater reliability reached 93%, and all the remaining differences were fully resolved through discussion. Accordingly, major themes were developed based on code relevance, frequency, and importance. Data triangulation was utilized to further understand and explain the findings.
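For readers unfamiliar with the one-way ANOVA mentioned above, the F statistic is the ratio of between-group to within-group mean squares. The sketch below is a plain-Python illustration under stated assumptions: the function name and the scores are hypothetical, it does not reproduce the SPSS analysis, and it stops at the F statistic without computing a p value or post hoc comparisons.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: group size times squared deviation
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up scores for three genres, for illustration only.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

A statistics package would then compare the F value against the F distribution with the two degrees of freedom to obtain the p value.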
3. Findings
3.1 Quality of GTTs and SGTs across genres
Analysis (Table 1) of GTTs across genres (n = 164) suggests that total words in all genres are relatively similar, which validates the comparison among GTTs across tasks. The findings show that GTTs differ in grammatical accuracy. One-way ANOVA with post hoc test indicates a significant difference (p < .05) in favor of GTTs in the persuasive genre, with fewer grammatical mistakes and a lower probability of errors as compared with the descriptive genre. Concerning content words, GTTs have a significantly higher number in the persuasive (p < .01) and expository (p < .05) genres as compared with the narrative genre. Additionally, GTTs have a significantly higher number of K2 words in the narrative (p < .01) and descriptive (p < .01) genres as compared with the expository and persuasive genres. Furthermore, GTTs have a significantly higher number of off-list words/AWL in the narrative (p < .05) as compared with the persuasive genre.
Note. AWL = academic word list.
* Significant difference (p < .05). **Significant difference (p < .01).
Analysis of SGTs and GTTs (n = 328) across genres (Table 2) reveals that in the narrative genre, there are significant differences (p < .01) between SGTs and GTTs in literacy in favor of GTTs. Concerning the descriptive genre, there are significant differences in eight writing parameters in favor of GTTs, indicating a higher literacy level (p < .01), richer content (p < .01), and more off-list words/AWL (p < .05). Regarding the expository genre, there are significant differences in nine writing parameters in favor of GTTs, suggesting a higher literacy level (p < .01), more content (p < .01), more advanced words (p < .05), and more off-list words/AWL (p < .01). Similarly, there are significant differences in nine writing parameters in favor of GTTs in the persuasive genre, indicating a higher literacy level (p < .01), richer content (p < .01), more advanced words (p < .05), and more off-list words/AWL (p < .05).
3.2 Students’ attitudes toward GT use across genres
The findings (see Table 3) indicate that, compared with the overall mean, items 9 (M = 4.00), 11 (M = 3.95), and 1 (M = 3.85) have the highest means in the narrative genre. This shows that the most positive views on GT’s ability are pertinent to the production of new phrases, vocabulary beyond students’ knowledge, and correct sentences. In the descriptive genre, items 1 (M = 3.97), 7 (M = 3.95), and 11 (M = 3.87) have the highest means, showing more positive views on GT’s ability to produce correct sentences, texts of acceptable quality, and vocabulary beyond students’ knowledge. Concerning the expository genre, items 1 (M = 3.92), 2 (M = 3.86), 9 (M = 3.86), 11 (M = 3.86), and 5 (M = 3.84) have the highest means. These items show the highest agreement regarding GT’s ability to produce correct sentences, accurate vocabulary choices, new and appropriately contextualized phrases, and vocabulary beyond students’ knowledge. In the persuasive genre, the items with the highest means are 1 (M = 3.95), 7 (M = 3.87), 11 (M = 3.82), and 17 (M = 3.79). Thus, the students hold highly positive views on GT’s ability to produce correct sentences, texts of acceptable quality, and vocabulary beyond students’ knowledge, and they report willingness to keep using GT to enhance L2 writing. Contrastingly, the items that share the lowest means are item 13 in the narrative (M = 2.67), descriptive (M = 2.95), expository (M = 2.92), and persuasive (M = 2.79) genres, as well as item 3 in the narrative (M = 2.74), descriptive (M = 2.87), and expository and persuasive (M = 2.97) genres. This suggests students’ negative views on GT’s abilities to produce accurate idiomatic expressions and link content ideas in a manner beyond students’ grammatical competence. By and large, the findings suggest positive views on GT’s outputs in the narrative genre and more positive views in the descriptive, expository, and persuasive genres.
3.3 Written reflections
In the narrative task, GT can produce comprehensible texts. A student reported: “I don’t think that GT is that bad because in general it gave the full meaning of the text.” In addition, some students noted the issue of GT’s literal translation: “there were lots of words that were very literal.” In the descriptive task, the students reflected more on grammatical structures and vocabulary: “I see that GT often excels in formulating sentences, also in vocabularies and choosing them.” GT has the ability to produce accurate words, sometimes outperforming the students’ word choices: “It seems that GT is picking some words that are more appropriate than mine.” Moreover, the students noted some grammatical issues: “but in Google translation, there are some errors in grammar, such as starting new sentences with the conjunction and.” In the expository task, the reflections showed that GT produced even more accurate grammatical structures: “Close enough to my writing but grammatically I think GT is better.” Furthermore, GT produced accurate lexical items: “GT was picking good words and correct phrases, and it might be more acceptable than my writing.” Nevertheless, GT might negatively affect text comprehensibility: “it’s kind of taking meanings of words only and that could make the reader confused.” The reflections on the persuasive task suggested more satisfaction with GT’s renderings and vocabulary choice in particular: “In general it is possible to use GT in persuasive texts. It is very helpful” and “GT always gives me more vocabularies that I have not thought about.” However, some syntactic issues and literal meanings were reported: “the other thing that I didn’t like is that it gives you the same order of the Arabic text and that’s not good when you read English texts” and “it has several mistakes in literal meaning for words, and also some phrases do not mean what I wrote.”
3.4 Focus group discussion
For most of the students, GTTs’ quality across genres was, broadly speaking, satisfying, as GT’s outputs can enhance various linguistic components in their written texts:
S7: In the four texts, GT always provides more choices for phrases. For me, content is somehow satisfying, and in grammar, GT is kind of excellent. GT provides more options in phrases and vocabulary.
Not unexpectedly, some students reported issues in GTTs’ quality, including literal translation in particular, which is conducive to producing mistakes:
S12: I would not fully rely on GT, because it might make some mistakes in words since the translation is literal. Also, since you know your L1, when you read the translated text, you have problems in some words and you know they are wrong.
One apparent strength of GTTs across genres, according to most of the students, is centered around the provision of lexical items beyond their knowledge, or those that could not be retrieved while writing. By translating words, GT extends the students’ abilities to express ideas. Moreover, the provision of phrasal choices and the ability to formulate them were observed in GTTs:
S9: Google translation helps with vocabulary, because sometimes one wants to express more about the topic but his target vocabulary is limited, but by using L1, one can express more, so GT can help.
S10: GT shows you whether a word can be used in this time or not, so you can produce a compatible text … GT still provides you with many options and more than one description and what is more compatible with the idea itself.
S4: GT always provides phrase choices. It helped me with adverbs writing. Usually I write “in a quick manner.” GT writes it “quickly.” So, it provides fewer words and constructs adverbs and makes it easier for me and produces fewer words.
GT can additionally provide assistance in grammatical structure, according to half of the students. In particular, the lower a writer’s level of grammatical competence, the more effective GT’s support can be:
S11: Narrative writing provides much space for grammatical structure. When you tell a story, GT is good, provides you with good grammatical structures. It forms sentences in different ways from what you form, and sometimes they are better, and as a person with some weakness in grammar, it can help me a lot.
S2: In grammar, if a student has a lower level than average, GT can help much. But if a student has an advanced level, it does not provide any support for grammar and sentences structure.
Most of the students were satisfied with the support GT can provide across genres, especially in the descriptive genre. As descriptive writing requires attending to details and the manner of description, GT has the ability to provide the needed support:
S1: The descriptive task demands specific vocabulary and description, so I think using GT with descriptive tasks is necessary, important, and very good … GT helped me the most in the descriptive type. It provided me with words and other options in words so I can choose from.
The majority of the students believed that the recent version of GT can effectively facilitate L2 writing process, and sometimes can be a useful preliminary step. The support GT can provide includes increasing efficiency and confidence in L2 writing:
S6: I think there is a benefit from using GT in the future and especially the way we used it. I think as a first stage. You use it as a reference tool but not as a basic tool. You refer to its translation quickly and structure language by yourself, I think GT can be very helpful.
S1: GT produced a variety of vocabularies and they help and increase vocabulary size, and give confidence. For me I would use it for sure. It opens my eye to more words. Also, I think the speed of translating as GT provides equivalents without delay.
The majority of the students reported their willingness to use GT in the future. GT’s enhanced output seems to encourage exploiting this tool for various purposes:
S9: I would use GT in all the four tasks. I would use it for grammatical structure. In vocabulary, I think it would make an addition, even with spelling, I would refer to it. One translates in the beginning, notices mistakes, takes the correct things, identifies mistakes, and makes changes and additions to the text.
3.5 Individual interviews
The students reported that GT can produce accurate grammatical structures, with varying levels of accuracy, with half of the students reporting the impact of L1 input quality on accuracy:
S3: I think GT does the job, and becomes excellent when you provide the suitable source text. GT is going to provide a very excellent translation in grammar. But in general, if you provide it with any text, it is not going to provide you with something less than OK.
GTTs can facilitate L2 writing across genres, especially with vocabulary. Half of the students noted that GT can provide accurate words with which they are unfamiliar. Moreover, the majority of the students reported that GT can provide words that they could not retrieve while writing:
S1: For vocabulary, sometimes, GT provides me with words I actually did not think of or sometimes I do not know, and sometimes no, the words I wrote are better. But in general, GT words are good and made a nice addition to the text along with my words.
Although GTTs might have issues in terms of the accuracy of ideas, such issues did not diminish the students’ positive views on GT’s content accuracy:
S3: GT transferred the content accurately. I do not see there were any issues in translation. It transferred the content accurately in all texts; but what I noticed as GT described it in a more accurate way was the comparison text.
Most of the students expressed that GT’s ability to link smaller units (sentences) is acceptable. Interestingly, all the students reported that when GT translates a whole text, it appears more accurately contextualized and coherent than smaller units individually:
S2: In sentences, GT was OK in both meaning and vocabulary structures. But for the whole text, GT was very good.
Context appropriacy in GTTs was viewed positively. Nevertheless, the students reported that a high level of context appropriacy was not shared across genres. Half of the students noted that the most appropriate contextualized GTTs were found in the descriptive genre, given that L1 input can influence the context appropriacy of GTTs across genres:
S4: GT translates using a very appropriate context. I remember in the descriptive [text] GT was good with context.
S3: It depends on the input text. But, most of the texts that we used as input in this activity were clear, and GT transferred the context in an excellent way.
Overall, the students were satisfied with GT’s general quality across genres, while noting that post-editing remains necessary with GTTs:
S4: GT is very good, or might be excellent in the four texts. But, of course, I have to check after translation because it will not be completely perfect, you see in the English text grammatical mistakes in writing that should be checked.
4. Discussion
GTTs in particular genres have a higher quality of literacy, content (persuasive and expository), and style (narrative and descriptive). Comparing SGTs with GTTs suggests a higher literacy level in GTTs in all genres, and higher style and content levels in the descriptive genre and, to an even greater extent, in the expository and persuasive genres. In line with these findings, the analysis of the questionnaires reveals that GT’s output is viewed most positively in the persuasive and expository genres and less positively in the narrative genre. Moreover, there are more positive attitudes toward particular writing parameter-related items within specific genres. The views on finding and learning new vocabulary in narrative GTTs are in harmony with the higher style level in the narrative genre. Additionally, attitudes toward grammatical accuracy, content, and motivation to utilize GT in expository and persuasive genres in the future reflect the higher quality of GTTs in these genres. The qualitative analysis suggests satisfaction with GTTs’ general quality in terms of literacy and style, supporting the findings indicated by higher levels of several GTT writing parameters. Overall, the findings indicate the powerful effect of genre on GT’s output. Concerning overall words, with similar trends reported by Tsai (2019) in the argumentative genre, the significant increase in favor of the descriptive, expository, and persuasive GTTs signals how the students were able to extend their expressions, providing more content by writing in L1 followed by using GT. However, it is crucial to interpret writing fluency in GTTs with caution, as it might instead be attributable to the language pair.
Persuasive GTTs significantly outperformed descriptive GTTs in literacy; interestingly, however, this finding is not entirely in line with the other findings. Although written reflections indicate some grammatical issues in descriptive GTTs, there are more positive attitudes toward GT’s accuracy. Analysis suggests that five items in the descriptive questionnaire have higher means than their counterparts in other genres. That is, GT’s acceptable quality, advanced phrases and grammatical structures beyond students’ competence, and outputs that can increase L2 writers’ confidence might have positively influenced attitudes toward descriptive GTTs’ accuracy. Persuasive and expository GTTs produced significantly more content than narrative GTTs, with lexical density in persuasive GTTs providing further evidence. Given that narrative GTTs have the highest overall words mean, they contained more function words than the other GTTs. This might be explained by the nature of the narrative genre, which contains several function words associated with providing details in narration. Concerning style, narrative and descriptive GTTs have more advanced vocabulary than expository and persuasive GTTs. This is in line with the questionnaire and qualitative findings, as attitudes are more positive toward GT’s production of new vocabulary and phrases in narrative GTTs and specific descriptive vocabulary in descriptive GTTs. Interestingly, when L2 writing involves description, GT seems to produce high-quality texts. Similar trends have been reported in previous research. For instance, Chung and Ahn (2021) found that more advanced vocabulary appeared in narrative than in argumentative GTTs. Additionally, the findings of Chon et al. (2021) revealed that descriptive GTTs had more advanced vocabulary. These findings indicate the effect of genre on GT’s output quality (Tsai, 2019), suggesting that GT’s renderings have distinct style levels among genres.
Regarding the comparison between SGTs and GTTs, the only significant difference in the narrative genre is found in literacy, suggesting similar levels in the other writing parameters. In reference to the CEFR scale, these findings show that narrative GTTs, relatively speaking, have reached a similar level of style and content and a higher level of accuracy compared with texts produced by learners at B1 level. For the descriptive, expository, and persuasive genres, the significant differences are associated with several writing parameters, indicating richer content, higher literacy, and higher style levels (particularly in expository and persuasive GTTs). These findings reveal that GT’s descriptive, expository, and persuasive output is of noticeably higher quality than texts produced by learners at B1 level. Furthermore, examining GT’s renderings in all genres suggests that GT, as an assistive CALL-based writing tool, is more reliable in the descriptive, expository, and persuasive genres than in the narrative genre for better L2 writing outcomes.
The findings pertaining to literacy, content, and style are consistent with previous research. The difference in literacy between SGTs and GTTs supports the findings of Stapleton and Kin (2019), who reported a similar difference in the argumentative genre. Additionally, the higher style level of GTTs in the persuasive genre supports Tsai’s (2019) findings, as GT generated more advanced vocabulary. GT’s accuracy improvement (Ducar & Schocket, 2018) is evidenced by the analysis of all genres, and supports previous research (Stapleton & Kin, 2019; Tsai, 2019, 2020), reporting that GTTs had a higher literacy level than SGTs. Furthermore, GTTs have more content and better style in all genres except for the narrative. Similar findings were reported by Tsai (2019, 2020) regarding the provision of more content and advanced words with GT use. Overall, in line with Jia et al. (2019) and Tsai (2019), current advances in MT contribute to the production of higher quality outputs. The more positive views on GT in all genres except for the narrative support previous research (Chung & Ahn, 2021), indicating more positive attitudes toward GT’s argumentative than narrative outputs. These findings suggest that genre has an impact on students’ attitudes toward GT’s output quality (Chung & Ahn, 2021). For instance, the narrative genre requires more creativity, which can be achieved through the provision of various lexical items, whereas the argumentative genre requires less creative writing and calls for concrete lexical items, which result in direct and more accurate written products (Yoon & Polio, 2017).
Students’ satisfaction with GT’s grammatical accuracy differs across genres. This is, however, not in harmony with the fact that all GTTs are significantly better in literacy than SGTs (Tsai, 2019, 2020). Such a discrepancy might be attributed to deep-seated perceptions of past experiences with GT failures (Tsai, 2020). The findings indicate that, with enhanced accuracy, GT can help to produce better texts, and by translating L1 texts into L2, GT extends students’ abilities to express meaning and familiarizes them with various ways to formulate their ideas (Bernardini, 2016). Although GT’s accuracy seems to depend on L1 input quality, literal translation seems to impact GTTs’ quality in all genres, which is not unexpected of some of GT’s outputs (Lee, 2020; White & Heidrich, 2013). Despite some issues with accuracy, GT can still produce texts similar to their L1 source texts with acceptable content accuracy. GT can accurately translate commonly used idioms (Ducar & Schocket, 2018); however, students’ negative views in this regard might be ascribed to deep-rooted perceptions formed during previous GT use. Moreover, most of the questionnaires show moderate satisfaction with GTTs. As might be expected, owing to their higher L2 proficiency, English majors might be highly critical of L2 use, which includes technology-based language output. Lee (2020) and Chung and Ahn (2021) reported similar trends with higher proficiency students. Interestingly, there are more positive views on GT’s outputs as larger language units. This might signal the improvement of GT’s new technology when processing a whole text. With reference to context appropriacy, the positive views on GT’s quality across genres support Lee’s (2020) findings regarding GT’s ability to appropriately contextualize vocabulary. However, given that L1 input has an impact on context, different levels of context appropriacy in GTTs were reported by the students, with more appropriate context found in descriptive GTTs.
The findings suggest that GT can be effective as a preliminary step in the writing process (Tsai, 2020). In addition to increasing L2 writing efficiency and confidence (Chung & Ahn, 2021; Tsai, 2020), one major advantage of GT across genres is the ability to provide new lexical items (Cancino & Panes, 2021; Chon et al., 2021), or those that cannot be retrieved while writing. These findings support previous studies (Lee, 2020; Tsai, 2019, 2020), indicating students’ positive views on GT’s ability in vocabulary use. Moreover, consistent with previous studies (e.g. Chung & Ahn, 2021), many students show their willingness to utilize GT in the future, suggesting a desire to use novel tools for L2 learning (Alrajhi, 2020). Interestingly, GT use in the future is the only item that has a gradual increase in the mean scores, from the narrative to the persuasive task, suggesting more willingness to use GT in other genres than in the narrative genre.
5. Conclusion
This study, unlike previous research, investigated genre effect on the quality of GTTs and found a strong effect in relation to literacy, content, and style. GT can produce texts of similar quality (narrative) or higher quality (descriptive, expository, and persuasive) compared with those written by learners at B1 level (CEFR scale). A number of pedagogical implications can be drawn from this study. First, teachers need to be aware that current GT technology can impressively avoid several inaccuracies that have been reported in the past (Stapleton & Kin, 2019). GT’s ability in different genres has impressively improved in literacy, content, and style. Using GT can serve as a scaffolding approach to L2 writing (Chon et al., 2021) to support L2 learning (Mundt & Groves, 2016). Therefore, a GT-assisted L2 writing approach can be utilized as linguistic modeling (Amaral & Meurers, 2011) across genres to extend learners’ linguistic knowledge and raise their awareness of L2 writing, thus contributing to the quality of learners’ written productions (Yoon, 2008). Instead of censuring GT for its drawbacks, teachers should raise learners’ awareness of its limitations identified in research (Chon et al., 2021). Moreover, L2 learners should be instructed on how to effectively utilize GT (Beiler & Dewilde, 2020). GT can facilitate noticing new linguistic patterns and choices and provide opportunities for learners to learn how to use such patterns and alternatives in their L2 writing (Tsai, 2019). Second, given the issues with limited feedback from instructors to individual L2 writers, GT can serve as a valuable instant feedback tool (Chon et al., 2021; Lee, 2020). Third, previous research has indicated that MT provides more support for L2 students with lower proficiency in particular (Chung & Ahn, 2021; Garcia & Pena, 2011; Lee, 2020). This study suggests that GTTs in the descriptive, expository, and persuasive genres considerably outperformed texts generated by students whose proficiency level is not categorized as low. Thus, GT can effectively support students with higher proficiency in the aforesaid genres. Fourth, L2 instructors should be cognizant of genre effect on the support GT provides (Chung & Ahn, 2021). In spite of GTTs’ higher quality in almost all genres, GT seems to be more effective in particular genres. That is, for optimal use of GT in L2 writing, instructors should utilize it when it serves particular learning outcomes (e.g. with persuasive and expository tasks to increase grammatical accuracy and content, and with narrative and descriptive tasks to extend lexical repertoire).
It is important to note some limitations of this study. First, the recruited sample consisted of male students. Thus, future research in this area should explore female EFL students’ attitudes toward GT’s output. Second, this study examined four main writing genres. Therefore, examining GT’s output quality across more genres (e.g. technical and poetic) is worthy of exploration. Third, L2 learning gains from GT’s output were not investigated. Research should examine how GT’s products across genres can lead to cognitive gains harnessed by L2 learners.
Supplementary material
To view supplementary material referred to in this article, please visit https://doi.org/10.1017/S0958344022000143
Ethical statement and competing interests
The participants volunteered to participate in the study; consent forms were obtained in advance. All identities were anonymized in this report. The author declares no competing interests.
About the author
Assim Suliman Alrajhi is an associate professor of applied linguistics in the Department of English Language and Translation, College of Arabic Language and Social Studies, at Qassim University in Saudi Arabia. His research interests include technology-enhanced language learning, L2 writing, L2 vocabulary acquisition, and L2 assessment.
Author ORCID
Assim S. Alrajhi, https://orcid.org/0000-0002-6205-9943