
Chinese spelling correction based on Long Short-Term Memory Network-enhanced Transformer and dynamic adaptive weighted multi-task learning

Published online by Cambridge University Press:  07 April 2025

Mingying Xu
Affiliation:
School of Information Science, North China University of Technology, Beijing, China
Jie Liu*
Affiliation:
School of Information Science, North China University of Technology, Beijing, China; China Language Intelligence Research Center, Beijing, China
Kui Peng
Affiliation:
School of Information Science, North China University of Technology, Beijing, China
Zhen Li
Affiliation:
Capital Normal University, Beijing, China
Corresponding author: Jie Liu; Email: [email protected]

Abstract

Chinese spelling correction has achieved significant progress, but critical challenges remain, especially in handling visually and phonetically similar errors within complex syntactic structures. This paper introduces a novel approach combining a Long Short-Term Memory Network (LSTM)-enhanced Transformer for error detection and Bidirectional Encoder Representations from Transformers (BERT)-based correction with a dynamic adaptive weighting scheme. The Transformer uses a global attention mechanism to capture dependencies between any two positions in the input sequence, while the LSTM, by processing each token in the sequence recursively, more finely captures local context and sequential information within the sequence. Based on adaptive weighting coefficients, the weights of multi-task learning are automatically adjusted to help the model better balance the learning process between the detection and correction networks, enabling it to converge faster and achieve higher precision. Comprehensive evaluations demonstrate improved performance over existing baselines, particularly in addressing complex error patterns.

Type
Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-ShareAlike licence (https://creativecommons.org/licenses/by-sa/4.0/), which permits re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

1. Introduction

Chinese spelling correction (CSC) aims to identify and correct spelling errors in Chinese while ensuring that correct words in sentences are not incorrectly corrected. Chinese spelling correction can improve text understanding, reduce the cost of manual review, and improve the efficiency of text processing. It has a wide range of applications in real life: common recognition errors in speech recognition systems, spelling errors in user communication on social media, and so on all require the assistance of Chinese spelling correction systems to improve the precision and efficiency of text comprehension. Overall, these errors are usually caused by the visual or phonetic similarity of Chinese characters (Wang et al. 2018) and can be divided into pronunciation errors and visual errors. Pronunciation errors refer to Chinese characters that have the same pronunciation but differ in form and meaning, such as “ and ”, “ and ” and so on. Visual errors refer to Chinese characters that are very similar in form but differ in pronunciation and meaning, such as “ and ” and so on. Research has found that approximately 83% and 48% of errors (Liu et al. 2010) are caused by similarity in pronunciation and visual form, respectively. Many Chinese characters, although similar in visual form or pronunciation, have significant semantic differences. Examples of Chinese spelling errors are shown in Table 1. In the input sentence, incorrect characters are highlighted in bold black; the correct target characters are shown in bold red.

Table 1. Examples of Chinese spelling errors

Compared with other languages, Chinese exhibits diverse and complex vocabulary combinations, and Chinese characters are highly dependent on context in terms of usage, semantics, and interpretation. A CSC system must therefore model both the global semantic expression of sentences and the local subtle semantic relationships of words to achieve precise correction of Chinese typos. Early research on Chinese spelling correction mainly focused on designing heuristic rules to detect different types of errors. These methods mostly rely on solid linguistic knowledge and manually designed features, so they lack the generalization ability required for large-scale applications. In recent years, with the breakthrough progress of pre-trained language models (PLMs), Chinese spelling correction methods based on PLMs, especially BERT (Devlin et al. 2019), have been widely studied. Many BERT-based CSC methods have been proposed and have achieved significant correction effects (Mao et al. 2023; Su et al. 2023; Cao et al. 2023). Most of these methods directly correct each character in the input sentence using BERT-based language models and confusion sets (Cheng et al. 2020; Guo et al. 2021; Wang et al. 2021a). Cheng et al. (2020) proposed SpellGCN, a specialized graph convolutional network that incorporates phonetic and visual similarity knowledge into the language model of CSC by constructing characters into a graph and generating vector representations of similar character interactions. Guo et al. (2021) proposed the global attention decoder (GAD), which learns the global relationship between potentially correct input characters and potentially incorrect candidate characters, obtains rich global context information to mitigate the impact of locally incorrect context, and designs a confusion-set-oriented replacement strategy (CRS). Wang et al. (2021a) proposed a new architecture called the Dynamic Connected Network (DCN), which uses a pinyin-enhanced generator to generate Chinese candidate characters and then models the dependency relationship between two adjacent Chinese characters using an attention-based network to improve the performance of CSC.

These methods indiscriminately correct each character in the input sentence based on contextual information, causing the correction model to be misled by misspelled characters and potentially modifying originally correct characters into incorrect ones. To address this issue, other works (Hong et al. 2019; Zhang et al. 2020; Li et al. 2021b) use error detectors to locate errors and guide the correction by masking the detected positions. Hong et al. (2019) proposed FASpell, a Chinese spelling method based on a denoising autoencoder (DAE) and a decoder. This model uses a relatively simple structure to achieve strong error detection and correction capabilities, and it utilizes an unsupervised pre-trained masked language model to reduce the amount of Chinese spelling correction data required for supervised learning. Zhang et al. (2020) proposed the Soft-Masked BERT architecture, in which the error detection network and error correction network are connected through a soft masking technique. Li et al. (2021b) proposed DCSpell, a Chinese spelling correction framework consisting of two modules: a detector and a corrector. The detector first finds the positions of incorrect characters, and a Transformer-based corrector then corrects them. Although these methods can correctly identify the positions of errors, they weaken the visual or phonological features of misspelled characters, which are crucial for correction, and may therefore fail to correct the wrong characters to the right ones. To leverage the visual and phonological features of misspelled characters while eliminating their misleading influence on the context, Zhu et al. (2022) proposed a new universal detector-corrector multi-task learning framework, in which the corrector obtains the visual and phonetic features of the original input and uses a late fusion strategy to fuse the hidden states of the detector and the corrector. Despite this great progress, some shortcomings remain. First, the currently popular detection networks are based on the Transformer structure: the Transformer relies on a global attention mechanism, which enables the detection network to detect and correct spelling errors within a global context, but in certain situations it may lack sensitivity to local sequential information. Second, the multi-task learning suffers from insufficient adaptive ability, low precision, and high cost due to manually adjusted weighting coefficients.

In response to the above issues, this paper proposes a CSC method based on a Long Short-Term Memory Network (LSTM)-enhanced Transformer and dynamic adaptive weighted multi-task learning. The motivation for using an LSTM-enhanced Transformer is grounded in the complementary strengths of the LSTM and Transformer architectures, which together address both the local and global dependencies critical for effective Chinese spelling correction. While Transformers excel at capturing long-range dependencies through global attention mechanisms, they can lack sensitivity to short-term, local sequential information (Vaswani et al. 2017), which is especially important for distinguishing visually and phonetically similar characters that rely on immediate context. By incorporating LSTM into the detection module, the model gains the ability to process characters sequentially and retain nuanced local dependencies, enhancing its sensitivity to contextual errors specific to Chinese. Additionally, within the multi-task learning framework with dynamic adaptive weighting, the LSTM-enhanced Transformer enables the model to balance detecting errors locally and correcting them globally, adapting its focus according to the task requirements. Furthermore, LSTM's capacity to mitigate gradient vanishing over long sequences (Pascanu, Mikolov, and Bengio 2013) supports stable information retention across the sentence, which is crucial for the complex language structures of Chinese. Combining these capabilities, the LSTM-enhanced Transformer framework offers a robust and flexible solution for accurate detection and correction in Chinese spelling tasks, while the dynamic adaptive weighted multi-task learning strategy learns training parameters adaptively to achieve better Chinese spelling correction performance.

The contributions of this article are as follows:

  (1) In the detection network, the LSTM-enhanced Transformer captures dependencies between any two positions in the input sequence while also finely capturing the local subtle semantic relationships of words and the sequential information within the input sequence.

  (2) A dynamic adaptive weighted multi-task learning framework is applied to help the model better balance the learning process between the detection and correction networks, leading to faster convergence and higher performance.

  (3) We conducted extensive experiments on the publicly available SIGHAN13-15 datasets, and the experimental results validate the effectiveness of the proposed method.

2. Related work

The Chinese spelling correction task is a long-standing and challenging endeavor that has been a focal point for researchers for many years. With the advancement of natural language processing and deep learning technologies, more and more new techniques are being applied to the field of Chinese spelling correction, resulting in continuous improvements in correction effectiveness. Following the development history of correction technologies, they can mainly be divided into rule-based methods, machine learning-based methods, and deep learning-based methods.

2.1 Rule-based methods

Rule-based correction methods utilize linguistic knowledge and rules to detect and correct errors in input text. They correct some simple errors relatively accurately, but their ability to handle complex errors is insufficient. Typical rule-based methods include edit-distance-based methods, regular-expression-based methods, language-model-based methods (Zhao et al. 2017), syntactic-analysis-based methods, dictionary- and grammar-rule-based methods, and machine-translation-based methods. Edit-distance-based methods (Cucerzan and Brill 2004) compute the edit distance between an incorrect word in the input text and candidate correct words and replace the incorrect word with the correct one at the smallest edit distance; such methods can handle some simple spelling errors but are ineffective for grammar and semantic errors. Regular-expression-based methods match error patterns in the input text with regular expressions and replace them; they can quickly correct some simple errors, but the effect is not ideal for complex errors. Language-model-based methods (Yuan and Briscoe 2016) model the input text with a language model and compare the probabilities of incorrect and correct words to determine whether correction is necessary; this type of method can accurately identify and correct some simple errors. Syntactic-analysis-based methods analyze the input text with syntactic analysis technology to identify and correct syntactic errors; they can handle some complex errors but require substantial computational resources and language analysis techniques. Dictionary- and grammar-rule-based methods (Ren, Shi, and Zhou 2001) use dictionaries and grammar rules to detect and correct errors; they can accurately identify and correct some simple errors, but the effect is not ideal for complex errors. Machine-translation-based methods (Sennrich, Haddow, and Birch 2015) translate erroneous text into another language and then translate the result back to the original language, identifying and correcting errors in the process; this works well for specific error types but is only moderately effective in general.
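To make the edit-distance family concrete, the following is a minimal Python sketch (our own illustration rather than any cited system; the toy dictionary and function names are hypothetical) that replaces a word with its closest dictionary entry by Levenshtein distance.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i              # delete all of a[:i]
    for j in range(n + 1):
        dp[0][j] = j              # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

def correct_word(word, dictionary):
    """Replace a word with the dictionary entry at minimal edit distance."""
    return min(dictionary, key=lambda entry: edit_distance(word, entry))

print(correct_word("speling", ["spelling", "correction", "check"]))  # -> spelling
```

In practice, candidate search is usually restricted (e.g., to a confusion set or to words within distance one or two) rather than scanning an entire dictionary.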

2.2 Machine learning-based methods

With the advancement of machine learning technology, machine learning-based Chinese spelling correction methods have achieved some success, especially Bayesian-based methods. Bayesian correction methods (Goldwater, Griffiths, and Johnson 2009) use the Bayes theorem to model errors in the input text and then compare the conditional probabilities of incorrect and correct words to determine whether correction is necessary. This type of method can accurately identify and correct some simple errors but requires a large text corpus and substantial computational resources. Wang and Liao (2015) proposed a word vector/conditional random field (CRF)-based detector to detect Chinese spelling errors, especially in texts written by foreign learners; it projects each word of the test sentence onto a high-dimensional vector space to reveal and check their relationships using a CRF. Zhang et al. (2015) provided a relatively complete description of a continuous learning pipeline: preprocessing, alternative generation, and answer selection. Li et al. (2021a) proposed a new method for Chinese spelling correction that explores the model architecture with automatic machine learning tools based on Bayesian optimization while utilizing ensemble learning and confidence assessment to improve model performance. Bao et al. (2020) proposed a method based mainly on chunking technology, which divides Chinese text into chunks and uses a large-scale corpus and knowledge of phonetics and glyphs to correct the chunks, improving the precision of Chinese spelling correction. Li et al. (2022c) proposed a Chinese spelling check method called ECOPO, which uses an error model to identify spelling errors and determines the best correction suggestion by comparing probability optimization techniques; it also introduces a historical correction model to utilize previous correction experience and improve correction precision.

2.3 Deep learning-based methods

With the success of deep learning, Chinese spelling correction methods based on deep learning have been widely studied. Ji, Yan, and Qiu (2021) proposed SpellBERT, which integrates pinyin and radical information through graph neural networks and designs pre-training tasks similar to masked language models to enhance character representation. Liu et al. (2021) proposed PLOME, combining language understanding and spelling correction and using GRU networks to model the phonetic and visual similarity of characters. Xu et al. (2021) introduced the REALISE spell checker, which directly utilizes multimodal information to predict correct outputs. Sun et al. (2021) proposed ChineseBERT, incorporating glyph and pinyin information into language model pre-training. Huang et al. (2021) introduced PHMOSpell, which uses multimodal information and an adaptive gating mechanism to integrate pinyin and glyph representations. Zhang et al. (2021) integrated speech features into language model training through powerful pre-training and fine-tuning methods, using adaptive weighted objectives to jointly train error detection and correction networks. Wang et al. (2021b) effectively detected spelling errors by integrating pinyin and glyph representations from audio and visual information into the pre-trained language model. Dai et al. (2022) compared the effects of whole-word masking and character-level masking on spelling correction, finding that whole-word masking performs better when dealing with multiple characters. Liu et al. (2022) improved model robustness to contextual noise caused by multiple typing errors by constructing noisy contexts for training samples and introducing a replication mechanism to avoid overcorrection. Lv et al. (2023) developed the ECSpell spell checker, using an error-consistency masking strategy to generate pre-training data. Li et al. (2024) proposed a universal spelling correction framework that adaptively learns semantic knowledge through metric learning and employs a replication mechanism to maintain low-frequency correct expressions. Mao et al. (2023) introduced CLSpell, which combines phonetic and glyph information through multi-task joint learning to obtain both local and global information. Su et al. (2023) proposed CCCSpell, enhancing model performance through consistency and contrastive learning. Cao et al. (2023) designed a heterogeneous knowledge injection framework, enhancing spelling correction performance by introducing implicit hierarchical language knowledge through a Gaussian mixture model-driven auxiliary task strategy. He et al. (2023) proposed UMRSpell, which flexibly handles missing, redundant, and spelling errors through multi-task learning. Wei et al. (2024) introduced a method that combines raw and masked inputs, establishing consistency using Kullback-Leibler divergence to reduce the impact of spelling errors on correction performance.

3. Task definition and approach

3.1 Task definition

Chinese spelling correction refers to the use of natural language processing technology to identify and correct spelling errors in Chinese text, thereby improving the accuracy and readability of the text. In Chinese text, typos arise from input errors, nonstandard handwriting, and improper word selection with pinyin input methods. Given an input sentence of $n$ characters $X = (x_1,x_2,x_3, \cdots, x_n)$, the target is to convert it into another sequence of the same length $Y = (y_1,y_2,y_3, \cdots, y_n)$, in which the incorrect characters in $X$ are replaced by correct characters.

3.2 Approach

We propose a Chinese spelling correction model based on an LSTM-enhanced Transformer and dynamic adaptive weighted multi-task learning. In this section, we first introduce the model architecture and then discuss the specific details of the model.

3.2.1 Model architecture

The model architecture is shown in Figure 1. The model uses human-annotated datasets to provide error information and correct characters during training, and it additionally employs automatically generated data to enhance the diversity and coverage of the training set. The model is divided into a detection network and a correction network, which run in parallel to preserve important features and minimize misleading contextual impacts. The late fusion strategy used in this parallel approach ensures that the correction network can effectively utilize the visual and phonological features of the input characters while being informed by the detection network's output, leading to better overall performance. The detection network detects errors in input sequences; it employs an LSTM-enhanced Transformer encoder to encode input sentences. The LSTM processes each token in the sequence recursively, enabling it to capture the local subtle semantic relationships of words and the sequential information within the input sequence more precisely. Based on adaptive dynamic weight coefficients, the weights of multi-task learning are automatically adjusted to help the model better balance the learning process between the detection network and the correction network, leading to faster convergence and higher performance.

Figure 1. Chinese spelling correction based on LSTM-enhanced Transformer and dynamic adaptive weighted multi-task learning.

3.2.2 Detection network

The detection network is mainly used to calculate the probability of a character error at each position of the input sentence. Firstly, the input text sequence containing $n$ characters, $x = \{ x_1,x_2,x_3,x_4, \cdots, x_n\}$, is fed into the BERT-based word embedding module to obtain the word embedding representation $\{ {e_1},{e_2},{e_3},\ldots, {e_n}\}$, which is then sent to the LSTM-enhanced Transformer encoder.

Specifically, leveraging the global modeling capability of the Transformer, the encoder processes the input sequence embeddings $\{ {e_1},{e_2},{e_3},\ldots, {e_n}\}$ of the Chinese characters. The multi-head attention and FC&Norm layers correspond to the multi-head self-attention and feed-forward components that form the basic building blocks of the Transformer, and deep feature modeling is performed to obtain richer feature representations. Furthermore, the LSTM is used to refine the text representation of the input sequence, finely capturing the local sequential information of input sentences and better locating erroneous information.

The output of the last layer of the LSTM-enhanced Transformer encoder is represented as ${G^d} = \{ g_1^d,g_2^d,g_3^d,\ldots, g_n^d\}$. $G^d$ is used not only to capture the local subtle semantic relationships of words and the sequential information of misspelled characters but also to transfer the information of the corresponding characters to the correction network. $G^d$ is calculated as follows:

(1) \begin{equation} {G^d} = LSTM(TransformerBlock({e_1},{e_2},{e_3},\ldots, {e_n})) = \{ g_1^d,g_2^d,g_3^d,\ldots, g_n^d\} \end{equation}

$ {G^d}$ is used to predict the positions of the misspelled characters and to deliver this position information to the correction network. In detail, we deploy a fully connected layer as the output layer and use the nonlinear sigmoid function to determine whether an error has occurred. For each character in the original input text, the probability of error detection is calculated as follows:

(2) \begin{equation} P_d(g_i=1|X)=sigmoid(Wg_i^d+b) \end{equation}

where $ {P_d}({g_i} = 1|X)$ is the conditional probability that the character corresponding to $g_i^d$ is misspelled, and $W$ and $b$ are the weight and bias of the dense layer.
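To make the detection network concrete, the following is a minimal PyTorch sketch of Equations (1) and (2); the class name, layer counts, head count, and hidden size are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class DetectionNetwork(nn.Module):
    """LSTM-enhanced Transformer detector: Eq. (1) encoder + Eq. (2) head."""
    def __init__(self, hidden_size=768, num_layers=2, num_heads=12):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=hidden_size,
                                           nhead=num_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers)      # global attention
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)  # local order
        self.dense = nn.Linear(hidden_size, 1)  # per-character error logit

    def forward(self, embeddings):
        # embeddings: (batch, n, hidden), the BERT word embeddings e_1..e_n
        g = self.transformer(embeddings)   # TransformerBlock(e_1, ..., e_n)
        g, _ = self.lstm(g)                # G^d = LSTM(...), Eq. (1)
        p_err = torch.sigmoid(self.dense(g)).squeeze(-1)  # P_d(g_i = 1 | X), Eq. (2)
        return g, p_err                    # hidden states G^d and error probabilities

detector = DetectionNetwork()
g_d, probs = detector(torch.randn(2, 16, 768))  # toy batch: 2 sentences, n = 16
```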

3.2.3 Correction network

The correction network aims to correct the spelling errors of input characters. For the text sequence to be corrected, the model first segments the corresponding text into words, such as “”, and represents the segmented text sequence as $x = \{ x_1,x_2,x_3,x_4, \cdots, x_n\}$. This sequence is fed into the multi-layer Transformer encoder of BERT to obtain the final representation $ {H^c} = [h_1^c,h_2^c,\ldots, h_n^c]$. The local subtle semantic relationships of words and the global information from the detection network are then integrated into the correction network: the last hidden layers of the detection network and the correction network are set to the same dimension and are added directly to obtain the fused representation $F$:

(3) \begin{equation} F = G + H \end{equation}

where $G$ is the hidden state from the final layer of the LSTM-enhanced Transformer-based detection network, and $H$ is the hidden state from the final layer of the BERT-based correction network. Finally, the correction task is treated as a multi-class classification task (Zhu et al. 2022). If the character at a certain position in the input is correct, its final representation should be very similar to that of the input character; if it is incorrect, the final representation should be similar to the word embedding of the corrected character. The classification task is defined as follows:

(4) \begin{equation} {P_c}({y_i}|X) = softmax(W{F_i}) \end{equation}

where $X$ is the input, ${y_i}$ is the correct character corresponding to the $i$-th input position, and $W$ is a learnable network parameter.
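The late-fusion correction head of Equations (3) and (4) can be sketched as follows, assuming (as stated above) that the detection and correction hidden states share the same dimension; the vocabulary size shown matches BERT-base-Chinese but is otherwise an illustrative choice.

```python
import torch
import torch.nn as nn

class CorrectionHead(nn.Module):
    """Late-fusion correction head: Eq. (3) fusion + Eq. (4) classifier."""
    def __init__(self, hidden_size=768, vocab_size=21128):  # BERT-base-Chinese vocab
        super().__init__()
        self.classifier = nn.Linear(hidden_size, vocab_size)

    def forward(self, h_correction, g_detection):
        f = h_correction + g_detection                     # F = G + H, Eq. (3)
        return torch.softmax(self.classifier(f), dim=-1)   # P_c(y_i | X), Eq. (4)

g_d = torch.randn(2, 16, 768)  # stands in for the detection network's G^d
h_c = torch.randn(2, 16, 768)  # stands in for BERT's correction-side H^c
p_chars = CorrectionHead()(h_c, g_d)  # (2, 16, vocab): one distribution per position
```

During training one would normally feed the pre-softmax logits to a cross-entropy loss; the explicit softmax above simply mirrors Equation (4).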

3.2.4 Dynamic weighted adaptive training for multi-task learning

The traditional loss of multi-task learning $ L_{sum}$ for Chinese spelling correction is as follows:

(5) \begin{align} L_{sum}=\lambda L_c+(1-\lambda )L_d \end{align}
(6) \begin{align} {L_d} = - \sum \limits _{i = 1}^n {\log {p_d}} ({g_i}|x) \end{align}
(7) \begin{align} {L_c} = - \sum \limits _{i = 1}^n {\log {p_c}} ({y_i}|x) \end{align}

where $ {L_c}$ is the loss of the correction network, $ {L_d}$ is the loss of the detection network, and $ \lambda$ is the weight used to balance the two tasks. This weighting scheme is mainly used to reduce the impact of one task on the other during backpropagation, or to make one task more effective than the other in updating parameters. The weight relationship between the losses of the detection network and the correction network is therefore extremely important; however, manually modifying the weight and running corresponding tests each time incurs high time and configuration costs.

Therefore, this paper applies dynamic weighted adaptive learning for multi-task training. Considering that the detection network and correction network address different tasks on the same dataset, and inspired by the adaptive loss dynamic weighting previously applied in the image field (Liang, Hu, and Feng 2021; Li et al. 2021d), dynamic adaptive weight adjustment is applied to the spelling correction task. We use the comprehensive loss function as follows:

(8) \begin{equation} L_{sum}(x,y,{y^{\prime}};\,w) = \frac {1}{{2 \cdot c_1^2}} \cdot {L_c}(x,{y_1},y_1^{\prime};\,{w_1}) + \ln (1 + c_1^2) + \frac {1}{{2 \cdot c_2^2}} \cdot {L_d}(x,{y_2},y_2^{\prime};\,{w_2}) + \ln (1 + c_2^2) \end{equation}

where $ {L_c}$ is the loss of the correction task, $ {L_d}$ is the loss of the detection task, $ {c_1}$ and $ {c_2}$ are the dynamic weighting coefficients of the correction network and the detection network, respectively, and $ \ln (1 + c_1^2)$ and $ \ln (1 + c_2^2)$ are regularization terms. Regularization terms are typically used to prevent model overfitting: by adding them to the loss function, the complexity of the model is constrained, preventing it from overfitting the training data. Specifically, the terms $ \ln (1 + c_1^2)$ and $ \ln (1 + c_2^2)$ regularize the parameters $ {c_1}$ and $ {c_2}$, respectively, and their logarithmic form ensures that they are always positive. When the parameter values are small, the regularization terms are also small; as the parameter values increase, the regularization terms gradually increase, effectively limiting the growth of the parameters. Compared to traditional $ {L_1}$ or $ {L_2}$ regularization, this form may provide better stability and constraint effects during training. Here, $x$ is the input, $ {y_1}$ and $ y_1^{\prime}$ are the true and predicted values of the correction network, $ {y_2}$ and $ y_2^{\prime}$ are the true and predicted values of the detection network, and $ {w_1}$ and $ {w_2}$ are the parameters of the model. Based on this dynamic weighting, model training and stability are optimized adaptively.
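The following PyTorch sketch is our reading of Equation (8), with the weighting coefficients $c_1$ and $c_2$ registered as learnable parameters so that they are optimized jointly with the network; the initial values are illustrative.

```python
import torch
import torch.nn as nn

class AdaptiveMultiTaskLoss(nn.Module):
    """Dynamic adaptive weighted multi-task loss, Eq. (8)."""
    def __init__(self):
        super().__init__()
        self.c1 = nn.Parameter(torch.ones(1))  # correction-task coefficient
        self.c2 = nn.Parameter(torch.ones(1))  # detection-task coefficient

    def forward(self, loss_c, loss_d):
        # Inverse-square task weights plus ln(1 + c^2) regularizers that stop
        # the coefficients from growing unboundedly (which would zero the loss).
        return (loss_c / (2 * self.c1 ** 2) + torch.log(1 + self.c1 ** 2)
                + loss_d / (2 * self.c2 ** 2) + torch.log(1 + self.c2 ** 2))

criterion = AdaptiveMultiTaskLoss()
total = criterion(torch.tensor(0.8), torch.tensor(0.3))  # toy L_c and L_d values
```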

4. Experimental design and result analysis

4.1 Dataset introduction

The training set mainly consists of data samples from different domains such as news, blogs, and social media, including the 271K samples proposed by Wang et al. (2018) and the SIGHAN 13-15 Chinese correction datasets. The test set uses the Chinese sentences from SIGHAN 13-15. The statistics of the datasets are shown in Table 2.

Table 2. Dataset statistics

4.2 Training details and parameter settings

The entire training process includes two stages. First, we use almost all of the 271K training samples to train the model. Second, we fine-tune the model on the SIGHAN13-15 training data. The specific experimental parameter settings are listed in Table 3.

Table 3. Experimental parameters

4.3 Compared methods

Hybrid (Wang et al. 2018): Hybrid proposes a new method to construct a CSC corpus with automatically generated spelling errors that are visually or phonetically similar, corresponding to OCR-based and ASR-based methods, respectively.

FASpell (Hong et al. 2019): FASpell is based on a DAE and a decoder. This model uses a relatively simple structure to achieve strong error detection and correction capabilities. It utilizes an unsupervised pre-trained masked LM to reduce the amount of Chinese spelling check data required for supervised learning. Moreover, the decoder enhances the flexibility of the model and its ability to utilize salient features of Chinese character similarity.

SpellGCN (Cheng et al. 2020): SpellGCN uses a specialized graph convolutional network to incorporate phonetic and visual similarity knowledge into the language model of CSC. This model constructs characters into a graph and generates vector representations of similar character interactions. SpellGCN combines the graph representation with BERT and utilizes similarity knowledge to achieve correct correction.

SCOPE (Li et al. 2022a): SCOPE enhances the performance of CSC by introducing an auxiliary task of fine-grained character pronunciation prediction and employing an adaptive task weighting mechanism.

REALISE (Xu et al. 2021): REALISE improves the precision of Chinese spelling check by using a multimodal approach that combines the semantic, phonetic, and graphical information of Chinese characters. This not only allows for better identification of easily confusable characters but also enables generalization across a broader range of similar character relationships.

ECOPO (Li et al. 2022c): By optimizing the knowledge representation of pre-trained language models (PLMs), the model is guided to avoid predicting common but incorrect characters, aiming to narrow the gap between the knowledge learned by PLMs and the goals of the CSC task. ECOPO improves the performance of the CSC task by introducing an error-driven contrastive probability optimization mechanism, teaching the model to learn and improve from past mistakes.

LEAD (Li et al. 2022b): LEAD enhances the performance of the CSC model by introducing heterogeneous knowledge from dictionaries (including phonetic, visual, and definitional information) to guide fine-tuning. It employs a unified contrastive learning-based training scheme to optimize the representations of the CSC model, thereby improving precision.

DR-CSC (Huang et al. 2023): DR-CSC provides a new approach for Chinese spelling check by designing an easy-to-use and effective detection and reasoning module. It addresses the issues of interpretability and efficiency when incorporating external expert knowledge into existing models.

GAD (Guo et al. 2021): GAD models the global relationship between potentially correct input characters and potentially incorrect candidate characters. It designs a confusion-set-oriented replacement strategy and models the global context to mitigate the impact of locally incorrect context.

DCN (Wang et al. 2021a): DCN generates candidate Chinese characters through a pinyin-enhanced generator and then models the dependency relationship between two adjacent Chinese characters using an attention-based network to improve the performance of CSC.

BERT (Devlin et al. 2019): The settings are the same as in MDCSpell (Zhu et al. 2022).

Soft-Masked BERT (Zhang et al. 2020): Soft-Masked BERT consists of an error detection network and an error correction network, connected through a soft masking technique.

MDCSpell (Zhu et al. 2022): MDCSpell is a new universal detector-corrector multi-task framework. The corrector uses a late fusion strategy to fuse the hidden states of the corrector and the detector to minimize the misleading impact of misspelled characters.

UMRSpell (He et al. 2023): UMRSpell uses a detection converter to simultaneously learn the self-attention matrices of the detection and correction sub-tasks from a multi-task learning perspective and flexibly handles missing, redundant, and spelling errors through re-labeling rules.

MCRSpell (Li et al. 2024): MCRSpell is a new universal framework for Chinese spelling correction. It utilizes metric learning to adaptively learn semantic knowledge from multiple intermediate features without using any data augmentation techniques, providing only the input text and target text from parallel corpora to the model for the spelling correction task. In addition, it designs a replication mechanism to maintain low-frequency correct expressions and avoid overcorrection, addressing the tendency of previous work to focus on recognizing errors from local context while neglecting the importance of sentence-level information.

FSpell (Wei et al. 2024): FSpell simultaneously utilizes both the original input and the masked input and learns authentic sentence-level semantic information that is not affected by spelling errors. Using the original input prevents the loss of information caused by masking operations. Finally, by maintaining consistency between the two output distributions, comprehensive input information can be obtained to reduce the impact of spelling errors.

4.4 Evaluation metrics

We take sentence-level precision, recall, and F1 score (Hong et al. 2019; Zhu et al. 2022) as the performance indicators of Chinese spelling correction: a sentence is considered correctly corrected only when all errors in it have been corrected. Precision refers to the proportion of positive predictions that are actually correct. Recall refers to the proportion of actual positives that are correctly predicted by the model. The F1 score is the harmonic mean of precision and recall, used to comprehensively evaluate the model's performance.

(9) \begin{align} \text {Precision} = \frac {TP}{TP + FP} \end{align}
(10) \begin{align} \text {Recall} = \frac {TP}{TP + FN} \end{align}
(11) \begin{align} F_1 = 2 \times \frac {\text {Precision} \times \text {Recall}}{\text {Precision} + \text {Recall}} \end{align}

TP (True Positive) is the number of inputs with spelling errors that are correctly identified by the spelling checker. FP (False Positive) is the number of inputs incorrectly identified as containing spelling errors. TN (True Negative) is the number of inputs correctly identified as having no spelling errors. FN (False Negative) is the number of inputs with spelling errors that were not detected. The evaluation of the correction phase is based on the actual (non-optimal) output of the detection model: the correction task is only evaluated when the detection model successfully identifies an error. If the detection model fails to identify an error, then even a correct correction from the correction model is not counted in the evaluation.
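Under this sentence-level protocol, the correction metrics can be computed as in the sketch below (our own illustrative implementation, not the authors' evaluation script; the triple format is an assumption).

```python
def sentence_level_prf(items):
    """Sentence-level correction P/R/F1 over (predicted, gold, source) triples."""
    tp = fp = fn = 0
    for pred, gold, src in items:
        changed = pred != src      # the model edited this sentence
        erroneous = gold != src    # the sentence actually contains errors
        if changed and pred == gold:
            tp += 1                # every error fixed, nothing over-corrected
        elif changed:
            fp += 1                # edited, but the result is still wrong
        elif erroneous:
            fn += 1                # errors left untouched
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```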

4.5 Experimental results and analysis

Experimental results are shown in Tables 4–6. Our proposed method achieves the best performance on the three datasets compared to the other baselines. Among the baselines, MDCSpell is the closest in framework to our proposed model, and DR-CSC achieves the best performance.

Table 4. The experimental results on SIGHAN13

For the SIGHAN13 dataset, ours achieves the highest F1 score. For the detection task, both DR-CSC (88.5) and MDCSpell (89.1) outperform ours (85.2) in precision, indicating they are more accurate in detecting errors. Ours (88.5) outperforms DR-CSC (83.7) and MDCSpell (78.3) in recall, indicating ours detects more actual errors. Ours (86.8) slightly outperforms DR-CSC (86.0) and significantly outperforms MDCSpell (83.4) in F1 score, showing better overall performance in detection. For the correction task, both DR-CSC (87.7) and MDCSpell (87.5) outperform ours (84.4) in precision, indicating they are more accurate in correcting errors. Ours (87.7) outperforms DR-CSC (83.0) and MDCSpell (76.8) in recall, indicating ours corrects more actual errors. Ours (86.0) slightly outperforms DR-CSC (85.3) and significantly outperforms MDCSpell (81.8) in F1 score, showing better overall performance in correction. In conclusion, while DR-CSC and MDCSpell have higher precision in both detection and correction, ours has better recall and overall F1 score, demonstrating better overall performance.

For the SIGHAN14 dataset, ours achieves the highest F1 score. For the detection task, ours (76.2) outperforms both DR-CSC (70.2) and MDCSpell (70.2) in precision, indicating a higher precision in detecting errors. Ours (78.5) outperforms both DR-CSC (73.3) and MDCSpell (68.8) in recall, indicating it detects more actual errors. Ours (77.3) outperforms both DR-CSC (71.7) and MDCSpell (69.5) in F1 score, showing better overall performance in detection. For the correction task, ours (74.7) outperforms both DR-CSC (69.3) and MDCSpell (69.0) in precision, indicating a higher precision in correcting errors. Ours (76.9) outperforms both DR-CSC (72.3) and MDCSpell (67.7) in recall, indicating it corrects more actual errors. Ours (75.8) outperforms both DR-CSC (70.7) and MDCSpell (68.3) in F1 score, showing better overall performance in correction. In conclusion, our model demonstrates superior performance compared to both DR-CSC and MDCSpell in both detection and correction on the SIGHAN14 dataset, with higher precision, recall, and F1 scores, indicating a better balance and overall effectiveness in spelling error detection and correction.

For the SIGHAN15 dataset, ours still achieves the highest F1 score. For the detection task, DR-CSC (82.9) slightly outperforms ours (82.3) in precision, indicating a higher precision in detecting errors, while MDCSpell (80.8) is slightly lower in precision than ours. Ours (86.8) outperforms both DR-CSC (84.8) and MDCSpell (80.6) in recall, indicating it detects more actual errors. Ours (84.5) outperforms both DR-CSC (83.8) and MDCSpell (80.7) in F1 score, showing better overall performance in detection. For the correction task, DR-CSC (80.3) slightly outperforms ours (79.8) in precision, indicating a higher precision in correcting errors, while MDCSpell (78.4) is slightly lower in precision than ours. Ours (84.1) outperforms both DR-CSC (82.3) and MDCSpell (78.2) in recall, indicating it can correct more actual errors. Ours (81.9) outperforms both DR-CSC (81.3) and MDCSpell (78.3) in F1 score, showing better overall performance in correction. In conclusion, while DR-CSC has slightly better precision in both detection and correction, ours demonstrates superior recall and F1 scores, indicating better overall performance.

Table 5. The experimental results on SIGHAN14

Meanwhile, we report the performance of the proposed method under different epochs on the SIGHAN13-15 datasets, as shown in Figures 2–4. The figures show that the proposed model performs best on the SIGHAN13 dataset at epoch = 4 and on the SIGHAN14 and SIGHAN15 datasets at epoch = 5. The model converges very quickly, indicating that the proposed method helps the model rapidly find the optimal parameters and improve its Chinese spelling correction performance.

The best results of our proposed model correspond to adaptive weighting parameters $ {c_1}$ and $ {c_2}$ both equal to 0.5. Equation (8) involves two parts: the task losses weighted by the reciprocals of the squared parameters, and the regularization terms. This regularization balances the size of the parameters, adjusting the model parameters to avoid overfitting and thereby making the model more stable during training.

4.6 Ablation study

To further verify the effectiveness of our proposed method, we conducted ablation experiments with three compared methods, namely ours, ours_loss, and ours_LSTM. Ours is our proposed method, which uses LSTM to recursively process each token in the sequence, capturing the local context and sequential information within the sequence; at the same time, it adopts the multi-task learning method based on adaptive weight coefficient adjustment to improve the loss function of the Chinese spelling correction task and to help the model better balance the learning process between the error detection and correction modules. Ours_LSTM is the variant that does not use adaptive weight coefficient adjustment to improve the loss function. Ours_loss is the variant that encodes input sentences using only the Transformer in the detection network. The experimental results are shown in Table 7.

Table 6. The experimental results on SIGHAN15

Figure 2. Performance of the proposed method under different epochs on SIGHAN13.

Compared to ours_loss and ours_LSTM, ours achieves the best F1 score on the SIGHAN13, SIGHAN14, and SIGHAN15 datasets, and its convergence speed is fast. Analyzing the reasons, ours integrates the advantages of LSTM and Transformer: the LSTM-enhanced Transformer can better capture both long-range dependencies and local details. Through the adaptive multi-task weighting mechanism, the model dynamically adjusts the weights of the detection and correction tasks to better balance them at different stages, thereby improving overall performance. The adaptive weighting mechanism also enables the model to quickly find the optimal learning path during training, accelerating convergence, while the feature extraction method combining LSTM and Transformer allows the model to efficiently extract and utilize feature information in fewer epochs. Although ours_LSTM retains the LSTM-enhanced encoder, it lacks the adaptive multi-task weighting mechanism and therefore may not balance the two tasks as well as ours, which may lead to suboptimal performance. Its training reaches the best results at the 7th epoch, converging slightly more slowly than ours, possibly because fixed task weights require more training to reach the optimal state. Ours_loss lacks the LSTM layer, so the model may not capture local details as well as ours. Although its performance on some datasets is close to ours, overall it lags slightly because it does not utilize the enhanced features of LSTM.

Table 7. Ablation experiment results

Figure 3. Performance of the proposed method under different epochs on SIGHAN14.

Figure 4. Performance of the proposed method under different epochs on SIGHAN15.

In summary, the proposed model performs best on Chinese spelling correction tasks mainly because it integrates the advantages of LSTM and Transformer and adopts an adaptive multi-task weighting mechanism, allowing the model to better balance the two tasks and converge faster. Ours_LSTM and ours_loss fall short of ours primarily because their weight adjustment strategies and feature extraction capabilities are less effective.

4.7 Case study

In order to visually verify the effectiveness of the proposed method, this section conducts a case study on several typical cases. Our method is structurally similar to MDCSpell, and DR-CSC is the best-performing correction model among the baselines, so we compare our method with MDCSpell and DR-CSC. At the same time, to assess the correction ability of currently popular LLMs, we also apply ChatGPT to Chinese spelling correction. Our prompt to ChatGPT is: “The following sentence contains spelling errors or typos. Please correct it without changing the number of characters in the sentence.” Given the input text, the correction results of MDCSpell, DR-CSC, ChatGPT, and our proposed method are shown in Table 8.

Table 8. Examples of Chinese spelling correction (CSC) of MDCSpell, DR-CSC, ChatGPT, and our proposed method

In the input sentence, incorrect characters are highlighted in bold black. The words highlighted in blue in “Correct Input” are the correct ones that were misspelled in the input. In the outputs of the various correction methods, characters corrected correctly are highlighted in bold red, while wrong characters that were not corrected or were corrected incorrectly are highlighted in bold green.

From Table 8, it can be seen that the proposed method correctly corrects all errors in both input sentences. Although MDCSpell can correct some errors, it does not correct all errors in the sentences; for example, it cannot detect the spelling errors “” in the first sentence and “” in the second sentence. DR-CSC can detect and correct all errors in the second sentence through contextual reasoning, but in the first sentence it does not detect the spelling errors “” and “”. Although ChatGPT can also correct some errors, its error correction ability remains insufficient: for the first sentence, it does not detect the spelling error “”. This indicates that, without background knowledge, the error correction ability of ChatGPT is not as good as that of small dedicated models.

Compared to other models, our proposed method not only corrects the errors that other models can correct but also accurately identifies and corrects errors that other models cannot detect, because it can more finely capture local context and sequential information. Specifically, in the first sentence, the distinction between “” and “” depends on the logical relationship in the context, such as “”, and the distinction between “” and “” relies on the overall semantics of the surrounding text. For the second sentence, “” should be “”, and this correction requires understanding the logical relationship and context within the sentence. LSTM can recursively process each character in the sequence, capturing local semantic relationships and thus identifying these subtle errors. Meanwhile, the dynamic adaptive weighted multi-task learning framework dynamically adjusts the weights of the detection and correction tasks, allowing the model to better balance the learning process between these two tasks and thereby achieve higher correction performance.

Through the above cases, we discuss the advantages of the LSTM-enhanced Transformer from the following three aspects. (1) Sensitivity to local sequential information: while the Transformer relies on a global attention mechanism that can capture dependencies across an entire input sequence, this often comes at the cost of sensitivity to short-term, localized dependencies. LSTMs process each token sequentially, retaining information through hidden states at each time step, which helps capture local dependencies effectively. For example, in a phrase where similar-sounding characters appear (e.g., “” vs. “”), the LSTM can help maintain context-specific details through its hidden states, offering more nuanced local context processing than a Transformer alone. (2) Combining global and local context: the Transformer provides global context by attending to all tokens simultaneously, while the LSTM reinforces sequential connections between adjacent tokens, making the model better at resolving ambiguities that depend on word order and proximity. This is essential in cases where context affects word meaning on a localized basis; for instance, in phrases like “” (which should be “”), the sequence processing ability of the LSTM helps the model disambiguate characters based on their immediate neighbors. (3) Handling long-distance dependencies without losing local context: the model leverages the Transformer's ability to capture long-range dependencies while the LSTM preserves local context within those spans. This hybrid approach mitigates the limitations of global-only or local-only methods, effectively enhancing performance on tasks that require both.

5. Conclusion

In this work, we proposed a Chinese spelling correction model that effectively combines LSTM-enhanced error detection with BERT-based correction through a dynamic adaptive weighting mechanism. Through extensive evaluations, we demonstrated the model’s ability to address visually and phonetically similar errors, providing robust results on challenging test cases. However, the model’s performance varies based on error type and sentence complexity, suggesting that further refinement of weighting parameters could enhance adaptability to diverse linguistic contexts. Future research could explore expanding this approach to incorporate semantic-enhanced correction, potentially improving the model’s effectiveness in real-world applications.

Acknowledgments

This work is supported by National Science and Technology Major Project (2020AAA0109703), Joint Fund Key Program of the National Natural Science Foundation of China (U23B2029), National Natural Science Foundation of China (62076167), Yuxiu Innovation Project of NCUT (2024NCUTYXCX102).

References

Bao, Z., Li, C. and Wang, R. (2020). Chunk-based chinese spelling check with global optimization. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 20312040.CrossRefGoogle Scholar
Cao, Y., He, L., Wu, Z. and Dai, X. (2023). Make bert-based chinese spelling check model enhanced by layerwise attention and gaussian mixture model. In 2023 International Joint Conference on Neural Networks (IJCNN), IEEE, pp. 19.Google Scholar
Cheng, X., Xu, W., Chen, K., Jiang, S., Wang, F., Wang, T., Chu, W. and Qi, Y. (2020). Spellgcn: Incorporating phonological and visual similarities into language models for chinese spelling check. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 871881.CrossRefGoogle Scholar
Cucerzan, S. and Brill, E. (2004). Spelling correction as an iterative process that exploits the collective knowledge of web users. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 293300.Google Scholar
Dai, Y., Li, L., Zhou, C., Feng, Z., Zhao, E., Qiu, X., Li, P. and Tang, D. (2022). Is whole word masking always better for chinese bert?: Probing on chinese grammatical error correction . In Proceedings of the Association for Computational Linguistics, pp. 18.Google Scholar
Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2019). Bert: Pre-training of deep bidirectional transformers for language understanding . In Proceedings of NAACL-HLT, pp. 41714186.Google Scholar
Goldwater, S., Griffiths, T. L. and Johnson, M. (2009). A bayesian framework for word segmentation: exploring the effects of context. Cognition 112(1), 2154.CrossRefGoogle ScholarPubMed
Guo, Z., Ni, Y., Wang, K., Zhu, W. and Xie, G. (2021). Global attention decoder for chinese spelling error correction. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 14191428,CrossRefGoogle Scholar
He, Z., Zhu, Y., Wang, L. and Xu, L. (2023). Umrspell: Unifying the detection and correction parts of pre-trained models towards chinese missing, redundant, and spelling correction . In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1023810250.CrossRefGoogle Scholar
Hong, Y., Yu, X., He, N., Liu, N. and Liu, J. (2019). Faspell: A fast, adaptable, simple, powerful chinese spell checker based on dae-decoder paradigm . In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pp. 160169.Google Scholar
Huang, H., Ye, J., Zhou, Q., Li, Y., Li, Y. L., Zhou, F. and Zheng, H.-T. (2023). A frustratingly easy plug-and-play detection-and-reasoning module for chinese spelling check . In Proceedings of the Association for Computational Linguistics, pp. 1151411525.Google Scholar
Huang, L., Li, J., Jiang, W., Zhang, Z., Chen, M., Wang, S. and Xiao, J. (2021). Phmospell: Phonological and morphological knowledge guided chinese spelling check . In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 59585967.CrossRefGoogle Scholar
Ji, T., Yan, H. and Qiu, X. (2021). Spellbert: A lightweight pretrained model for chinese spelling check . In Proceedings of the 2021 conference on empirical methods in natural language processing, pp. 35443551.CrossRefGoogle Scholar
Li, C., Zhang, C., Zheng, X. and Huang, X. (2021a). Exploration and exploitation: Two ways to improve chinese spelling correction models . In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pp. 441446.CrossRefGoogle Scholar
Li, C., Zhang, M., Zhang, X. and Yan, Y. (2024). MCRSpell: A metric learning of correct representation for Chinese spelling correction. Expert Systems with Applications 237, 121513.
Li, J., Wang, Q., Mao, Z., Guo, J., Yang, Y. and Zhang, Y. (2022a). Improving Chinese spelling check by character pronunciation prediction: The effects of adaptivity and granularity. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 4275–4286.
Li, J., Wu, G., Yin, D., Wang, H. and Wang, Y. (2021b). DCSpell: A detector-corrector framework for Chinese spelling error correction. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1870–1874.
Li, S., Xie, M., Gong, K., Liu, C. H., Wang, Y. and Li, W. (2021c). Transferable semantic augmentation for domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11516–11525.
Li, Y., Ma, S., Zhou, Q., Li, Z., Li, Y., Huang, S., Liu, R., Li, C., Cao, Y. and Zheng, H.-T. (2022b). Learning from the dictionary: Heterogeneous knowledge guided fine-tuning for Chinese spell checking. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 1–12.
Li, Y., Yuan, L., Chen, Y., Wang, P. and Vasconcelos, N. (2021d). Dynamic transfer for multi-source domain adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10998–11007.
Li, Y., Zhou, Q., Li, Y., Li, Z., Liu, R., Sun, R., Wang, Z., Li, C., Cao, Y. and Zheng, H.-T. (2022c). The past mistake is the future wisdom: Error-driven contrastive probability optimization for Chinese spell checking. In Proceedings of the Association for Computational Linguistics, pp. 3202–3213.
Liang, J., Hu, D. and Feng, J. (2021). Domain adaptation with auxiliary target domain-oriented classifier. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16632–16642.
Liu, C.-L., Lai, M.-H., Chuang, Y.-H. and Lee, C.-Y. (2010). Visually and phonologically similar characters in incorrect simplified Chinese words. In Coling 2010: Posters, pp. 739–747.
Liu, S., Song, S., Yue, T., Yang, T., Cai, H., Yu, T. and Sun, S. (2022). CRASpell: A contextual typo robust approach to improve Chinese spelling correction. In Findings of the Association for Computational Linguistics: ACL 2022, pp. 3008–3018.
Liu, S., Yang, T., Yue, T., Zhang, F. and Wang, D. (2021). PLOME: Pre-training with misspelled knowledge for Chinese spelling correction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 2991–3000.
Lv, Q., Cao, Z., Geng, L., Ai, C., Yan, X. and Fu, G. (2023). General and domain-adaptive Chinese spelling check with error-consistent pretraining. ACM Transactions on Asian and Low-Resource Language Information Processing 22(5), 1–18.
Mao, X., Shan, Y., Li, F., Chen, X. and Zhang, S. (2023). CLSpell: Contrastive learning with phonological and visual knowledge for Chinese spelling check. Neurocomputing 554, 126468.
Pascanu, R., Mikolov, T. and Bengio, Y. (2013). On the difficulty of training recurrent neural networks. In International Conference on Machine Learning, pp. 1310–1318.
Ren, F., Shi, H. and Zhou, Q. (2001). A hybrid approach to automatic Chinese text checking and error correction. In 2001 IEEE International Conference on Systems, Man and Cybernetics. e-Systems and e-Man for Cybernetics in Cyberspace (Cat. No. 01CH37236), vol. 3, IEEE, pp. 1693–1698.
Sennrich, R., Haddow, B. and Birch, A. (2015). Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp. 1715–1725.
Su, J., Lin, X., Xie, Y. and Cheng, Z. (2023). CCCSpell: A consistent and contrastive learning approach with character similarity for Chinese spelling check. In 2023 International Joint Conference on Neural Networks (IJCNN), IEEE, pp. 1–8.
Sun, Z., Li, X., Sun, X., Meng, Y., Ao, X., He, Q., Wu, F. and Li, J. (2021). ChineseBERT: Chinese pretraining enhanced by glyph and pinyin information. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pp. 2065–2075.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł. and Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008.
Wang, B., Che, W., Wu, D., Wang, S., Hu, G. and Liu, T. (2021a). Dynamic connected networks for Chinese spelling check. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 2437–2446.
Wang, D., Song, Y., Li, J., Han, J. and Zhang, H. (2018). A hybrid approach to automatic corpus generation for Chinese spelling check. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2517–2527.
Wang, H., Wang, B., Duan, J. and Zhang, J. (2021b). Chinese spelling error detection using a fusion lattice LSTM. Transactions on Asian and Low-Resource Language Information Processing 20(2), 1–11.
Wang, Y.-R. and Liao, Y.-F. (2015). Word vector/conditional random field-based Chinese spelling error detection for SIGHAN-2015 evaluation. In Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing, pp. 46–49.
Wei, C., Huang, S., Li, R., Liu, Y. and Yan, N. (2024). A fusion scheme for eliminating input interference induced by spelling errors. Engineering Applications of Artificial Intelligence 127, 107341.
Xu, H.-D., Li, Z., Zhou, Q., Li, C., Wang, Z., Cao, Y., Huang, H. and Mao, X.-L. (2021). Read, listen, and see: Leveraging multimodal information helps Chinese spell checking. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics.
Yuan, Z. and Briscoe, T. (2016). Grammatical error correction using neural machine translation. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 380–386.
Zhang, R., Pang, C., Zhang, C., Wang, S., He, Z., Sun, Y., Wu, H. and Wang, H. (2021). Correcting Chinese spelling errors with phonetic pre-training. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 2250–2261.
Zhang, S., Huang, H., Liu, J. and Li, H. (2020). Spelling error correction with Soft-Masked BERT. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 882–890.
Zhang, S., Xiong, J., Hou, J., Zhang, Q. and Cheng, X. (2015). HANSpeller++: A unified framework for Chinese spelling correction. In Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing, pp. 38–45.
Zhao, H., Cai, D., Xin, Y., Wang, Y. and Jia, Z. (2017). A hybrid model for Chinese spelling check. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) 16(3), 1–22.
Zhu, C., Ying, Z., Zhang, B. and Mao, F. (2022). MDCSpell: A multi-task detector-corrector framework for Chinese spelling correction. In Findings of the Association for Computational Linguistics: ACL 2022, pp. 1244–1253.