
Automated annotation of parallel bible corpora with cross-lingual semantic concordance

Published online by Cambridge University Press:  25 January 2024

Jens Dörpinghaus*
Affiliation:
Federal Institute for Vocational Education and Training (BIBB), Bonn, Germany; University of Bonn, Bonn, Germany; University of Koblenz, Mainz, Germany

Abstract

Here we present an improved approach for the automated annotation of New Testament corpora with a cross-lingual semantic concordance based on Strong’s numbers, which link words in already annotated texts to the original Greek words. Since scholarly editions and translations of biblical texts are rarely freely available, there is a lack of up-to-date training data; and since annotation, curation, and quality control of alignments between these texts are expensive, there is a lack of available biblical resources for scholars. We present two improved approaches to the problem, based on dictionaries and on already annotated biblical texts. We provide a detailed evaluation of annotated and unannotated translations, and we discuss a proof of concept based on English and German New Testament translations. The results presented in this paper are novel and, to our knowledge, unique. They show promising performance, although further research is needed.

This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use and/or adaptation of the article.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

1. Introduction

Building a concordance of texts, automated text alignment, and automated text translation are well-studied research topics. A semantic concordance is a widely used approach to link text corpora with data and values in lexicons (see Landes, Leacock, and Tengi Reference Landes, Leacock and Tengi1998), and much research in this broad field of text mining and automated text processing has also been done in the humanities. In the field sometimes called Digital Theology, a subfield of Digital Humanities, and at its intersection with ancient languages, several challenges remain, even though the problems themselves may look like simple, standard tasks.

Here we want to address the challenge of automatically annotating words within New Testament texts in order to create parallel Bible corpora in different languages. Our goal is to create cross-lingual concordances for New Testament texts and translations. These are widely used for research and teaching, see Fig. 1 for a typical use case.

Figure 1. Illustration of a parallel Bible view provided by https://www.stepbible.org. It shows two English translations (ESV and KJV) and a Greek text (SBLG).

Thus, the research problem can be stated as follows: Given a Bible text in English or German, how can we annotate the corresponding Greek or Hebrew word in the original text? Usually this annotation is done using Strong’s numbers, which we will introduce in the next section. For example, in John 4:4 (“And he had to pass through Samaria”) the word “and” should be annotated with the Strong’s number G1161, referring to Greek $\delta \varepsilon$ , and “Samaria” with G4540. Thus, the task is to assign Strong’s numbers to Bible versions that do not yet have them.
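For illustration only, such an annotation target can be pictured as a mapping from token positions to Strong’s numbers. The verse and the numbers below come from the example above; the data structure itself is our assumption, not a format prescribed by this work:

```python
# Hypothetical representation of the annotation target for John 4:4.
# Keys are token positions in the translated verse, values are Strong's
# numbers; tokens without a key have no Greek counterpart annotated.
verse = ["And", "he", "had", "to", "pass", "through", "Samaria"]
annotation = {0: "G1161", 6: "G4540"}  # "And" -> de, "Samaria" -> Samareia

for i, token in enumerate(verse):
    print(f"{token}\t{annotation.get(i, '-')}")
```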

Our approach is limited to the mapping between translated words, given the translation with or without further information, and the Greek source with morphological information. This paper is a revised, improved, and extended version of Dörpinghaus and Düing (Reference Dörpinghaus and Düing2021). There, it was shown that AI approaches based on CRFs do not perform as well as rule-based algorithmic approaches, for example achieving an $F_{1}$ -score of 0.13 compared to 0.84 for Luther 1912. However, that work had several limitations, such as the lack of a detailed analysis and an evaluation restricted to a few already annotated texts. We present improvements that lead to a well-functioning environment for some use cases: First, we generalized the approach to work with all available POS categories. Second, we improved the algorithm with a generic function to reinforce different signals (e.g., the position of words in a sentence) and a generic threshold that helps to analyze different scenarios. Third, we evaluate different strategies for limiting the number of words that can be assigned within categories. Finally, we present an evaluation on several texts.

Research on biblical texts and translations has a long tradition, and translations have been widely used. In the nineteenth century, there was a great increase in the number of different Bible translations, and thus research in this field also increased (see Metzger Reference Metzger2001).

New approaches from computer science have also been used to evaluate translations and texts, but have only really taken off in the last 30 years as they have become more accessible to scholars with different backgrounds. These methods serve two purposes. First, they support the manual curation and understanding of texts; for example, the automated identification of actors and locations can help to uncover (social) networks in literary texts (see Dörpinghaus Reference Dörpinghaus2021). Second, they can improve the technological solutions for automated approaches, and they may also be useful for other early Christian texts, for example ancient church orders or biblical apocrypha. Here, Clivaz (Reference Clivaz2017) states that very little research has been done in this area, and Anderson (Reference Anderson2018) underlined a year later the lack of interest of theologians in digital and modern text mining methods. The field of digital theology is emerging, but shows more interest in current trends in digitization (see Sutinen and Cooper Reference Sutinen and Cooper2021); for a detailed discussion of this topic, see also Dörpinghaus (Reference Dörpinghaus2022). Only the areas of digital manuscripts, digital academic research, and publishing show some progress (see Clivaz, Gregory, and Hamidović Reference Clivaz, Gregory and Hamidović2013). This work tries to be a step toward closing this gap.

Because scholarly editions and translations of biblical texts are often not freely available, and because annotation, curation, and quality control of alignments between these texts are expensive, there is a lack of available biblical resources for scholars. The goal of this work is to develop and evaluate novel approaches for automatically generating alignments for parallel Bibles, leading to a cross-lingual semantic concordance. We have based our work on the Sword Project,Footnote a which provides a full API under the GNU license and gives access to Greek texts as well as English and German translations.

In this work, we present an improved approach for automated annotation of New Testament corpora with cross-lingual semantic concordance based on Strong’s numbers. We introduce two improved approaches to the problem, based on dictionaries and already annotated biblical texts. We provide a detailed evaluation of annotated and unannotated translations. We also discuss a proof of concept based on English and German New Testament translations. The results presented in this paper are novel and, to our knowledge, unique. They show promising performance, although further research is needed.

The remainder of the paper is organised as follows: The second section gives a brief overview of the state of the art and related work. The third section is dedicated to the data foundation; there we also discuss the annotation style and the selection of training and test data. In the fourth section, we present two approaches to tackle the problem. The fifth section is dedicated to experimental results on annotated and non-annotated translations. Our conclusions are drawn in the last section.

2. Related work

Since little research has been done in this area, we list all available materials, even if their tasks are only tangentially related. In biblical research, Strong’s The Exhaustive Concordance of the Bible from 1890 is widely used to link words from biblical texts to dictionary entries. These so-called Strong’s numbers can be used to create automatically aligned parallel texts, see Cysouw, Biemann, and Ongyerth (Reference Cysouw, Biemann and Ongyerth2007) or Wälchli (Reference Wälchli2010), who created semantic maps from parallel text data. Here, texts in several languages are presented together (see Simard Reference Simard2020). It is important to note the discrepancy between other fields of research and the study of biblical texts: Although several approaches in biblical research are based on machine translation, these texts are still mainly hand-crafted, see for example the Greek-Hebrew-Finnish corpus of Yli-Jyrä et al. (Reference Yli-Jyrä, Purhonen, Liljeqvist, Antturi, Nieminen, Räntilä and Luoto2020) or the approach described by Rees and Riding (Reference Rees and Riding2009) and Riding and Steenbergen (Reference Riding and Steenbergen2011). Even though the Bible is often used as a training or reference corpus for unsupervised translation models, see for example Diab and Finch (Reference Diab and Finch2000), Resnik, Olsen, and Diab (Reference Resnik, Olsen and Diab1999), Christodouloupoulos and Steedman (Reference Christodouloupoulos and Steedman2015), only a few approaches have been made to analyze religious or theological texts with methods from AI and text mining.

For example, McDonald (Reference McDonald2014) applied statistical methods to religious texts to evaluate their similarity based on word vectors. Another simple analysis was carried out by Verma (Reference Verma2017), and research on the reuse of historical texts was done by Büchler et al. (Reference Büchler, Geßner, Eckart and Heyer2010) using text mining technologies. Usually word frequencies are used to discuss the common authorship of biblical books (see e.g., Erwin and Oakes Reference Erwin and Oakes2012). These so-called stylometric studies are not without critique (see Eder Reference Eder2013).

To cover the linguistic question, other scholars have examined the impact of computer technology on Bible translation and discussed its limitations (see Riding Reference Riding2008). Since Bible translations are not usually the subject of linguistic research, but are interesting for the history of languages, there is a wide range of publications and analyses of recent translations, see for example Renkema and van Wijk (Reference Renkema and van Wijk2002) and De Vries (Reference De Vries2000). There is also a considerable amount of literature on Bible translation (see Scorgie et al. Reference Scorgie, Strauss and Voth2009). It is important to note that Bible translation is not just a matter of choosing between translation strategies such as formal or dynamic equivalence.Footnote b

Encoding linguistic information in multilingual documents produces Interlinear Glossed Text (IGT). Biblical texts are usually well studied, so both Strong’s numbers and morphological information are available for Hebrew and Greek texts. Automated glossing is also a widely studied area for other texts and languages (see Rapp, Sharoff, and Zweigenbaum Reference Rapp, Sharoff and Zweigenbaum2016; McMillan-Major Reference McMillan-Major2020; Zhao et al. Reference Zhao, Ozaki, Anastasopoulos, Neubig and Levin2020).

Much work in this area has been devoted to methods based on neural networks and word embeddings. Sabet et al. (Reference Sabet, Dufter, Yvon and Schütze2020) applied static and contextualized embeddings and showed that an approach without parallel data or dictionaries could produce multilingual embeddings; however, their approach did not generalize to all languages in their test environment. Another approach was presented by Dou and Neubig (Reference Dou and Neubig2021): they used fine-grained embeddings and parallel corpora. Their work is limited to several modern languages and generally shows results comparable to other models. Notably, the performance differs from one language to another, showing that there is a lot of detailed work to be done depending on the particular language. Current AI approaches have the disadvantage that it is usually difficult to understand and adapt details in models. Therefore, we will pay special attention to translation approaches and their properties.

Recently, Yousef et al. (Reference Yousef, Palladino, Shamsian, d’Orange Ferreira and dos Reis2022a) not only introduced a gold standard for ancient Greek texts (to English and Portuguese) but also worked on tuning translation alignments by combining unsupervised training on mono- and bilingual texts with supervised training on manually aligned sentences. This clearly shows that large corpora of training data are very important (see also Palladino, Shamsian, and Yousef Reference Palladino, Shamsian and Yousef2022). For biblical texts, many parallel texts are available; however, Koine Greek differs from other variants of Ancient Greek. Their work is accompanied by a tool for the manual creation of alignment corpora (see Yousef et al. Reference Yousef, Palladino, Shamsian and Foradi2022b) and for the visual evaluation of models (see Yousef, Heyer, and Jänicke Reference Yousef, Heyer and Jänicke2023); see also the survey by Sommerschield et al. (Reference Sommerschield, Assael, Pavlopoulos, Stefanak, Senior, Dyer, Bodel, Prag, Androutsopoulos and de Freitas2023). AI-based approaches have also been used by Tyndale House in Cambridge to create parallel Bible corpora (see Instone-Brewer Reference Instone-Brewer2023); however, their work, based on the Berkeley Word Aligner, required 70 volunteers to complete. Problems with AI methods on biblical texts have also been identified by Dörpinghaus and Düing (Reference Dörpinghaus and Düing2021). Only a little research has been done on the Qur’an (see Muhammad Reference Muhammad2012). Some research has been done on word-for-word translation, especially without parallel data (see Conneau et al. Reference Conneau, Lample, Ranzato, Denoyer and Jégou2017; Li et al. Reference Li, Zhang, Yu and Hu2021), and other work has addressed misalignment (see Tsvetkov and Wintner Reference Tsvetkov and Wintner2012).
For automated translation, there are no resources for ancient Greek (see Biagetti, Zanchi, and Short Reference Biagetti, Zanchi and Short2021). Other approaches, such as GASC (see Perrone et al. Reference Perrone, Palma, Hengchen, Vatri, Smith and McGillivray2019), build a Bayesian model to describe the evolution of words and meanings in ancient texts. They note “a lack of previous works that focussed on ancient languages”. Thus, not only are the target texts a new field, but we have very little work to build on in the field of automated translation.

In summary, the combination of different methods is the key to obtaining high-quality alignments, see for example Fei, Zhang, and Ji (Reference Fei, Zhang and Ji2020), Steingrimsson, Loftsson, and Way (Reference Steingrimsson, Loftsson and Way2021), Vu et al. (Reference Vu, He, Phung and Haffari2021). For the creation of interlinear glossed biblical texts, it is crucial to really understand the detailed concepts of the languages involved, and either to use large training corpora or to supervise the results of these methods, for example, by manually curating the texts. Because of these various complexities, we decided to apply and improve classical algorithmic approaches to identify the underlying challenges.

3. Data

3.1. Overview

Here we will focus on the original Greek text and its representation in English and German translations of the Bible, although this approach can be applied to any other language. There are several software packages available for accessing biblical texts. Some commercial software, such as Logos, provides no or very limited access to its API.Footnote c We have therefore based our work on the SWORD project, which provides a full API available under the GNU license.Footnote d We selected biblical texts based on their availability under an open license, which ensures reproducibility, and on their diverse translation approaches. For the Greek text, we used the SBLGNT 2.0 from Tyndale House, based on SBLGNT v.1.3 from Crosswire. This text is comparable to the Nestle-Aland/United Bible Societies text with some minor changes. The English texts are based on the KJV (King James Version, 1769), ASV (American Standard Version, 1901), and ESV (English Standard Version, 2011). The German texts are based on Luther (1912) and the Leonberger Bible (2017, based on Nestle-Aland 28 or Robinson-Pierpont 2018). All data are available under a free license.Footnote e

There are several approaches to translating biblical texts. The KJV, ESV, and ASV follow a traditional word-for-word approach, also known as formal equivalence. The Leonberger Bible follows the same approach, while Luther 1912 also includes elements of the thought-for-thought approach known as dynamic equivalence. For testing purposes, we will also consider translations that use a paraphrase approach. We will use the New Revised Standard Version (NRSV), the World English Bible (WEB), Luther 2017, Hoffnung für alle (HFA), and later the very free text of the German VOLXBIBEL. See Table 1 for an overview; for a detailed overview of Bible translations, see Metzger (Reference Metzger2001).

Table 1. Overview of training and test data. Here tft refers to thought-for-thought, pa to the paraphrase approach, and wfw to word-for-word (formal equivalence). Texts with Strong’s numbers are used for training and testing; texts without Strong’s numbers are used only for testing. The Remarks column indicates special cases: for the Leonberger Bible, translations based on two different Greek texts are available, and the VOLXBIBEL provides a text in German colloquial youth language

There are several annotations that can be displayed in different ways. Here we rely on XML output.Footnote f Both lemmatical and morphological information are contained in w-tags. For an example on Acts 1:1, see Fig. 2.

Figure 2. A snippet of the XML output for Acts 1:1 from diatheke.

However, additional morphological information may be available: <w lemma="strong:G3588" morph="robinson:T-ASM" savlm="strong:G3588" src="21"> $\tau o\nu$ </w>. It is usually stored according to RMAC (Robinson’s Morphological Analysis Codes, see Robinson Reference Robinson1973). Thus, we will use the existing morphological information where available; if not, we will proceed as described in the next section. In summary, we will use this XML-based annotation style both for extracting information and for storing and comparing data.
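For readers unfamiliar with this format, lemma and morphology can be extracted from such w-tags with a few lines of BeautifulSoup. The following sketch parses the fragment shown above; the parsing code is ours, not part of the original pipeline:

```python
from bs4 import BeautifulSoup

# One w-tag as emitted by diatheke for the SBLGNT (cf. the fragment above):
# the lemma attribute carries the Strong's number, morph the RMAC code.
xml = ('<w lemma="strong:G3588" morph="robinson:T-ASM" '
       'savlm="strong:G3588" src="21">τον</w>')

soup = BeautifulSoup(xml, "html.parser")  # the "xml" feature (lxml) works too
for w in soup.find_all("w"):
    strongs = [part.split(":", 1)[1]
               for part in w.get("lemma", "").split()
               if part.startswith("strong:")]
    print(w.get_text(), strongs, w.get("morph"))
    # -> τον ['G3588'] robinson:T-ASM
```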

3.2. Training and test data

To collect the training data, we can use the complete New Testament texts mentioned above. This results in 7957 verses per version; Strong’s dictionary contains 5624 entries. We tested our models on the complete corpus or on a random subset of the same and of different translations; see Dörpinghaus (Reference Dörpinghaus2023) for the data. Comparing fully annotated texts is easy because we can use the whole corpus. That is, we computed precision, recall, and $F_{1}$ -score for annotating Strong’s numbers on the entire New Testament corpus. Here, the corresponding gold standards are available as Sword modules.Footnote g

To evaluate non-annotated texts, we created gold standards for several verses and translations. Some of these will be discussed in more detail because they show certain limitations of our approach. However, to evaluate the quality measures, we selected 20 verses from different books, both from the Gospels and the Acts of the Apostles, and from different epistles. It was important to select a variety of verses, both from narrative texts (Mk 10:3; Lk 1:9; Jn 12:2, 21:1; Acts 8:14) and from Gospel-specific verses (Mark 1:1; John 19:35), enumerations (Acts 27:5), apocalyptic texts (Rev 1:19; 14:5), and letters (Rom 1:1; 12:4; Eph 1:8; 1 Peter 1:10; 1 John 1:5; Jude 1:8). To make our approach comparable to other methods, we have published this gold standard (see Dörpinghaus Reference Dörpinghaus2023).

In addition, we will test our model on some verses from newer versions, such as the new German VOLX Bible. Here the verses are evaluated manually. For a detailed overview, see Table 1.

4. Methodology

4.1. Workflow

All steps have been implemented using SWORD 1.9.0.3874,Footnote h diatheke 4.8Footnote i as CLI front end, and Python 3.8. We used the following libraries: BeautifulSoup Footnote j for XML parsing, spaCy Footnote k for POS tagging and jellyfish Footnote l for measuring the difference between two strings, for example, by Levenshtein distance. Using different texts from SWORD and different language models in spaCy shows that we can easily switch the language-specific components. Thus, at least for similar input and output languages, the proposed workflow could in principle be language independent. However, our examples are based on English and German texts, which only shows that this assumption holds for Germanic languages. The extent to which this holds for other languages needs to be investigated.
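To illustrate how these components interact, the following minimal sketch tags a German sentence with spaCy and computes a string distance with jellyfish. The model name de_core_news_sm is the standard small German model and our assumption, not necessarily the exact model used in this study:

```python
import spacy
import jellyfish

# Standard small German pipeline (install via:
# python -m spacy download de_core_news_sm).
nlp = spacy.load("de_core_news_sm")

# POS tags and lemmata drive the later matching step.
for token in nlp("Abraham zeugte Isaak."):
    print(token.text, token.pos_, token.lemma_)

# String distance between a candidate word and a dictionary entry,
# as used for the matching threshold epsilon.
print(jellyfish.levenshtein_distance("zeugte", "zeugen"))  # -> 2
```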

4.2. Modeling

We have biblical texts that contain verses. Each verse $X$ contains a sequence of words, so

(1) \begin{align} X^L&= x^L_1, \ldots, x^L_N \end{align}
(2) \begin{align} X^{L'}&= x^{L'}_1, \ldots, x^{L'}_M. \end{align}

However, without loss of generality, let us assume that $L$ contains annotations with Strong’s numbers. Then we want to model the target glossing $f\;:\;X^L\rightarrow X^{L'}$ , which maps an origin word $x^L_i\in X^L$ to another word $x^{L'}_j\in X^{L'}$ with the same Strong’s number. Let $Y$ be a sequence of all mappings; then we have to compute $P(Y|X^L)$ .

We need to add a short note about verses in biblical texts: Verse content and numbering can differ slightly, for example, Catholic Bible texts use a different numbering especially for Old Testament texts (see Mayer and Cysouw Reference Mayer and Cysouw2014). However, all considered texts use the same numbering scheme, and in any case, the Sword Library can handle these different schemes.

Due to the amount of annotated data for biblical texts, we have several options:

  • If $L$ and $L'$ are different languages, we must find the appropriate syntactic equivalent in $L'$ . This can be complicated, especially for certain grammatical constructions that have a different form in $L'$ . For example, if $L$ is ancient Greek, it is unclear whether this approach will work for languages such as English or German. Also, there are no language models for Ancient Greek or Hebrew, see Dörpinghaus and Düing (Reference Dörpinghaus and Düing2021) and the survey provided by Sommerschield et al. (Reference Sommerschield, Assael, Pavlopoulos, Stefanak, Senior, Dyer, Bodel, Prag, Androutsopoulos and de Freitas2023).

  • If $L$ and $L'$ are the same language and annotated texts exist, the task is reduced to finding the match for a given part of speech. However, this is only true for syntactically close translations and may have several other restrictions, for example, for varieties of languages.

Here we propose a two-step method. As input, we use the target text (a translated text) verse by verse and, if necessary, the original Greek text with Strong’s annotations as well as additional information from dictionaries and Bible translations. Then we annotate the target glossing. See Fig. 3 for an illustration.

4.3. Preprocessing and matching

After detecting parts of speech in the target text, we can sort words from the original annotated source text and the target text based on parts of speech. This helps to reduce the target word set. Since we know the Greek Strong’s numbers, we can use lemmatization to compare words and assign the best match. A first algorithmic approach was presented in Dörpinghaus and Düing (Reference Dörpinghaus and Düing2021), see Algorithm 1. Here, the authors preprocessed with a parts-of-speech tagger limited to five categories: nouns, verbs, conjunctions, prepositions, and pronouns. We will call this approach $POS_0$ . We did not change the libraries used for parts-of-speech recognition: spaCy shows some of the best results on current German texts, and there is very little difference in performance between spaCy and other libraries like NLTK or StanfordNLP (see Ortmann, Roussel, and Dipper Reference Ortmann, Roussel and Dipper2019). In addition, we wanted to compare the former and the new method. However, the underlying algorithm does not depend on the library used for parts-of-speech detection.

Algorithm 1 Dictionary-based-matches I

Figure 3. The proposed method with example data (Acts 1:1). In general, we use as input the target text (a translated text) verse by verse and an existing annotated text. The original Greek text with annotations could be used when adding a translation or dictionary. Existing dictionaries can be used, or dictionaries can be created from the use of Strong’s numbers in a given translation. First, we use POS tagging and lemmatization to extract the matching words. Then we annotate the target gloss by finding the best matches, either by grouping words at POS or by considering all available terms.
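Since the pseudocode of Algorithms 1 and 2 is not reproduced here, the following Python sketch illustrates the general matching idea described in the text: group source and target tokens by POS category and greedily assign each annotated source word to the closest target word, subject to a distance threshold. All names are ours, and the original algorithms differ in detail:

```python
import jellyfish

def match_verse(source, target, dict_fn=lambda w: {w}, eps=2):
    """Greedy dictionary-based matching of Strong's numbers (illustrative sketch).

    source:  list of (lemma, pos, strongs) triples from the annotated text
    target:  list of (lemma, pos) pairs from the translation to annotate
    dict_fn: maps a source lemma to candidate translations; the identity for
             same-language input (dict(w) = w), a dictionary lookup otherwise
    eps:     distance threshold (the epsilon introduced below)
    """
    assignment, used = {}, set()
    for lemma, pos, strongs in source:
        candidates = dict_fn(lemma) or {lemma}
        best, best_dist = None, eps + 1
        for j, (t_lemma, t_pos) in enumerate(target):
            if j in used or t_pos != pos:   # match only within one POS category
                continue
            dist = min(jellyfish.levenshtein_distance(c, t_lemma)
                       for c in candidates)
            if dist < best_dist:
                best, best_dist = j, dist
        if best is not None:                # implies best_dist <= eps
            assignment[best] = strongs      # target position -> Strong's number
            used.add(best)
    return assignment
```

Relaxing the POS condition for conjunctions, prepositions, and pronouns yields the $cpp$ variant introduced below.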

In this paper, we will also apply an approach that covers all parts-of-speech detected by spaCy,Footnote m which we will denote by $POS_1$ . We get a set

\begin{equation*}\mathfrak {P}=\{POS_{0},POS_{1}\}.\end{equation*}

Thus, $POS_1$ and $POS_0$ differ in the number of parts of speech considered for matching. In these approaches, however, matches are found only within one category; for example, verbs are matched only against verbs (see, for example, line 1 in Algorithm 2). We may modify this approach, since the usage of conjunctions, prepositions, and pronouns is not consistent across languages and may even vary within a single language. We denote the mixing of all parts of speech by $all$ and the mixing of only conjunctions, prepositions, and pronouns by $cpp$ . We get a set

\begin{equation*}\mathfrak {C} = \{all, none, cpp\}.\end{equation*}

Thus, each approach is a triple defining input, parts-of-speech approach, and matching approach: $(input, \mathfrak{p}\in \mathfrak{P}, \mathfrak{c}\in \mathfrak{C} )$ . We will use $\ast$ to indicate that we consider multiple approaches for this element; for example, $(bible, POS_{1}, \ast )$ includes all approaches in $\mathfrak{C}$ . See Table 2 for an overview.
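The resulting configuration space is small enough to enumerate exhaustively; a sketch (the string labels are ours):

```python
from itertools import product

inputs = ["bible", "delta"]          # annotated Bible vs. extracted dictionary
pos_strategies = ["POS_0", "POS_1"]  # the set P
mixing = ["all", "none", "cpp"]      # the set C

# Every evaluated approach is one (input, p, c) triple.
approaches = list(product(inputs, pos_strategies, mixing))
print(len(approaches), "configurations, e.g.", approaches[0])
# -> 12 configurations, e.g. ('bible', 'POS_0', 'all')
```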

Algorithm 2 Dictionary-based-matches II

Table 2. Overview of the approaches evaluated in this paper. $POS_1$ and $POS_0$ differ in the number of parts of speech considered for matching. The column “categories” describes whether only elements within a category are matched (all), whether all elements are mixed (none), or whether only conjunctions, prepositions, and pronouns are mixed (cpp)

Algorithm 3 Extract Dictionary

In Algorithms 1 and 2, the function $\delta$ refers to a distance function (such as the Levenshtein distance or cosine similarity). The function $dict$ returns dictionary entries for a given word, which can be used for mapping between different languages; in the case of equal languages, we define $dict(w)=w$ . To distinguish a run based on existing dictionaries from one based on a dictionary extracted directly from the original text, we use $\Delta$ as input when relying on one or more dictionaries.

In our case, we extracted all the translations used by a particular Bible for a given Strong’s number, since Dörpinghaus and Düing (Reference Dörpinghaus and Düing2021) showed several difficulties when working with existing dictionaries; further research could, however, draw on resources such as the lexical-semantic network for German, GermaNet (see Kunze and Wagner Reference Kunze and Wagner2001), or the lexical database WordNet (see Miller Reference Miller1995). Thus, the main difference between a Bible and $\Delta$ as input is that the former matches only lemmata from a particular source, while $\Delta$ matches all usages of a particular Strong’s number within the source.

To make this data available, we wrote an importer that creates a list of words in the target language that are associated with a Strong’s number. This dictionary-based approach is a lazy learner approach, since we learn the dictionaries first, but the comparison and mapping are done in a separate step.

However, it is clear that this approach can be optimized. Dörpinghaus and Düing (Reference Dörpinghaus and Düing2021) showed several weaknesses of this naive approach; for example, even for the same source and target corpus, the results were not perfect. In line 8 of Algorithm 1, we can define a threshold $\varepsilon$ , and we can adjust line 6 to reinforce special cases, for example, the order of words.

Thus, in Algorithm 2, we introduce a set of functions $f$ to reinforce different values. For example, we can replace $f=pos_z(x,y)$ with

\begin{equation*}pos_{z}(x,y)=\begin {cases} z & pos(x)=pos(y)\\ 0 & else \end {cases}\end{equation*}

to reinforce labels that are at a similar position in the source and target text.
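In code, such a reinforcement term is a one-liner. The sketch below assumes that pos() returns a word’s position in its verse (e.g., spaCy’s token.i), matching the case distinction above:

```python
def pos_z(x, y, z=1.0, pos=lambda word: word.i):
    """Return z if the source word x and the candidate target word y occupy
    the same position in their respective verses, and 0 otherwise; z controls
    how strongly positional agreement is weighted in the overall score."""
    return z if pos(x) == pos(y) else 0.0
```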

4.4. Extracting dictionaries

In Algorithm 3, we present an approach to extract dictionaries from annotated translations. Given a Bible text $b$ with Strong’s numbers, we iterate over all 5624 Greek terms in line 2. The function $find$ in line 5 returns all verses containing a given Strong’s number. In line 3, we use the function $getWords$ , which returns the lemmatized usage of a Strong’s number within a given verse.
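A condensed counterpart of Algorithm 3 in Python might look as follows; find and get_words stand in for the functions named above, and the exact interfaces are our assumption:

```python
from collections import defaultdict

def extract_dictionary(bible, strongs_numbers, find, get_words):
    """Build a mapping from Strong's numbers to the lemmatized translations
    an annotated Bible uses for them (illustrative sketch of Algorithm 3).

    find(bible, number) yields all verses containing a Strong's number;
    get_words(verse, number) returns the lemmatized words annotated with it.
    """
    dictionary = defaultdict(set)
    for number in strongs_numbers:        # e.g. "G0001" ... "G5624"
        for verse in find(bible, number):
            dictionary[number].update(get_words(verse, number))
    return dictionary
```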

The result highlights the particularities of certain translations. For example, G0033 (“age”, used only in James 4:13; 5:1) is not annotated by the Leonberger Bible, while Luther 1912 annotates the term “Wohlan”. Other translations differ as well: for example, Luther 1912 translates G0001 as “A” and the Leonberger Bible as “Alpha”. ASV and ESV translate G0032 similarly with “angel” and “messenger”. However, the ASV does not annotate G0029, while the ESV uses “force” and “compel”.

This already foreshadows some problems we will discuss in the next section.

4.5. Evaluation

The performance of the approaches is evaluated by comparing each annotation in its final output with the test dataset of annotated biblical texts. Thus, we need to cross-evaluate different input scenarios against different and similar output scenarios.

Since our approach produces Strong’s numbers annotations for words in the translated text, the first question is whether this leads to correct assignments on the same text. We will also evaluate whether combining different models leads to better solutions. Since these approaches may predict Strong’s numbers that have more or fewer occurrences in the text, we add both precision and recall to our evaluation, defined as follows:

(3) \begin{equation} Recall=\frac{TP}{FN+TP} \end{equation}

(4) \begin{equation} Precision=\frac{TP}{FP+TP} \end{equation}

Here TP means true positives (a correct assignment), FN false negatives (assigning no or the wrong Strong’s number to a word which originally has one), and FP false positives (assigning a Strong’s number to a word which does not have one). Thus, $FP+TP$ covers all positive predictions, and $FN+TP$ covers all samples that should have been identified as positive. The $F_1$ score is the harmonic mean of $Precision$ and $Recall$ ; the best value is 1 and the worst value is 0. The formula used is

(5) \begin{equation} F_1 = 2 \cdot \frac{\textit{Precision} \cdot \textit{Recall}}{\textit{Precision} + \textit{Recall}}. \end{equation}

These metrics are presented as a micro-average across all verses. Furthermore, we will analyze how these systems work on unannotated translations. For this purpose, a few verses have been selected to evaluate the output.
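A direct implementation of these formulas, with guards for empty denominators, looks as follows (sketch, our naming):

```python
def scores(tp, fp, fn):
    """Precision, recall, and F1 from micro-averaged counts over all verses."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: 80 correct assignments, 10 spurious ones, 20 missed or wrong.
print(scores(tp=80, fp=10, fn=20))  # -> (0.888..., 0.8, 0.842...)
```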

5. Results

5.1. Results on $\textit{POS}_{0}$

We evaluated and compared the output of Algorithm 2 with $\varepsilon =2$ and $f=pos_z(x,y)$ for Luther 1912 and GerLeoNA28 as target texts and Luther 1912, GerLeoRP18, and GerLeoNA28 as sources. See Table 3 for a detailed overview. Here, the same target and source texts lead to significantly better results than those presented in Dörpinghaus and Düing (Reference Dörpinghaus and Düing2021). Most interestingly, the $F_1$ score is high not only for the same input and output but also for the other texts. Thus, the Leonberger Bible text seems to be very close to Luther, both syntactically and in word choice. However, we could also reproduce the observation that the Leonberger Bible behaves differently as a target text: here, the combined approach improves the results, while this is not the case for Luther 1912.

Comparing these results with the English translations, KJV and ESV, in Table 4, reveals some interesting observations. While our approach significantly improves for the KJV compared to Dörpinghaus and Düing (Reference Dörpinghaus and Düing2021), it performs worse for the ESV, which also follows a word-for-word approach. However, while the advantage of this evaluation is the existence of a fully annotated text, it provides a very specific environment.

Table 3. Results of algorithm $(bible, POS_{0}, all)$ with $\varepsilon =2$ and $f=pos_z(x,y)$ for Luther 1912 and GerLeoNA28 as target texts. The column “ $F_1$ (D)” shows the results by Dörpinghaus and Düing (Reference Dörpinghaus and Düing2021)

Table 4. Results of algorithm $(bible, POS_{0}, all)$ with $f=pos_z(x,y)$ for KJV ( $\varepsilon =2$ ) and ESV ( $\varepsilon =16$ ) as target texts

Table 5. Results of algorithm $(bible, POS_{0}, all)$ with $\varepsilon =2$ and $f=pos_z(x,y)$ for Luther 2017 and HFA as target texts

Table 6. Results of algorithm $(bible, POS_{0}, all)$ with $\varepsilon =2$ and $f=pos_z(x,y)$ for NRSV and WEB as target texts

In order to analyze the results on previously unannotated texts, we created a gold standard for several verses from the Gospels, Acts, and Epistles for several translations. A detailed evaluation with precision, recall, and $F_1$ -score can be found in Table 5 for the German translations HFA, a freer translation, and Luther 2017, which is close to Luther 1912. Again, the results are much better than those of Dörpinghaus and Düing (Reference Dörpinghaus and Düing2021).

The dictionary-based approaches on German translations (Table 5) show very promising results. Precision is high, although recall increases for HFA and the Leonberger Bible. We see a different behavior for Luther 1912 and GerLeoNA28: for the latter, the combination of both dictionaries increases recall but decreases precision. That is, more words are annotated overall, but a smaller share of these annotations is correct. This implies that the amount of annotated material strongly depends on the data used: it is not as simple as “more is better”, and it is crucial to note that a combination of dictionaries needs to be carefully investigated.

One of the reasons may be that, although both translations follow the same approach, more than a hundred years lie between them in the case of Luther 1912; words and their meanings may thus have changed. In the next section, we will make some preliminary observations about more recent translations.

This is even more significant for the evaluation of the English translations in Table 6. ESV and ASV are both based on the KJV, and again more than a hundred years lie between them (1769, 1901, 2011). The two most recent translations show a good result: recall is high, and precision increases with the matching dictionary. The most remarkable result occurs when using the KJV with a combination of dictionaries, which even decreases the values. This result has further strengthened our confidence that it is crucial to evaluate the dictionary base for this approach.

However, as we have already mentioned, the precision value is also misleading, since we only evaluate the words recognized by the POS-tagger approach. We can thus see two extreme situations: First, the translation adds several phrases and words, so we have more words to tag than words in the original Greek text. Second, the translation uses different paraphrases and constructs to express a longer Greek text, resulting in fewer words to tag than words in the original Greek text. In Table 7, we have evaluated all six texts we used for our tests. In most cases, there are more words in the original text than our POS tagging engine could detect. For the German texts, the average is well below one, but the extremes are larger. However, the values are exactly the same for all texts in a given language. We can thus summarize that all available texts follow a similar approach.

As we can see, we have selected different values for the bound $\varepsilon$ . In Figs. 4 and 5, we provide a detailed analysis of the $F_1$ scores for varying $\varepsilon$ . There is an optimal value, but it has to be found experimentally; usually, the $F_1$ score does not improve significantly for $\varepsilon \gt 3$ . However, we must emphasize the importance of preprocessing the texts, even for annotated texts. The KJV annotates texts differently: in Acts 1:3, for example, it annotates multi-word statements and phrases, while other translations like the ESV annotate single words, see Fig. 6.

This leads to problems when using KJV as a basis for further annotations. Thus, a further improvement might consider more extensive preprocessing of previously annotated texts.
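Because the optimal threshold must be found experimentally (cf. Figs. 4 and 5), a simple grid search over $\varepsilon$ against a gold standard is the natural procedure. In the sketch below, annotate and evaluate are placeholders for the annotation pipeline and the $F_1$ computation shown earlier:

```python
def best_epsilon(source, gold, annotate, evaluate, candidates=range(1, 17)):
    """Sweep the threshold epsilon and return the value with the highest F1.

    annotate(source, eps) produces Strong's assignments for one epsilon;
    evaluate(prediction, gold) returns the F1 score against the gold standard.
    """
    results = {eps: evaluate(annotate(source, eps), gold) for eps in candidates}
    best = max(results, key=results.get)
    return best, results
```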

Table 7. This table shows the minimum, average, and maximum difference between the total number of references to Greek words (Strong’s numbers) and the detected number of words. Interestingly, these numbers are the same for all texts in one particular language

Figure 4. $F_1$ Score for different values of $\varepsilon$ (x-axis) for HFA (left) and Luther 2017 (right) as target text.

Figure 5. $F_1$ Score for different values of $\varepsilon$ (x-axis) for NRSV (left) and WEB (right) as target text.

Figure 6. Example of KJV (top) and ESV (bottom) annotations for Acts 1:3.

5.2. Testing on non-annotated translations

To test our approach on a recent translation, we will use different verses with different linguistic challenges. First, we will use both Luther 1912 and GerLeoNA28 as a basis for mapping. As a first example, we will consider Matt 1:2, which is a noun-centered sentence. For Luther 2017, we get the assignment shown in Fig. 7.

Figure 7. Application to Luther 2017 (Matthew 1:2). The corresponding English text according to the ASV is: “Abraham begat Isaac, and Isaac begat Jacob, and Jacob begat Judah and his brethren.”

Figure 8. Application to HFA (Matthew 1:2). The corresponding English text according to the ASV is: “Abraham begat Isaac, and Isaac begat Jacob, and Jacob begat Judah and his brethren.”

While previous approaches assigned G1161 ( $\delta \varepsilon$ ) instead of G2532 ( $\kappa \alpha \iota$ ), illustrating the challenge of assigning the correct particles, the proposed approach works correctly. The assignment contains two missing Strong’s numbers. This text is identical to the 2006 Elberfelder translation.

We will show the performance on two more German translations following a thought-for-thought approach known as dynamic equivalence. Hoffnung für alle (HFA, 2015) is less rigorous than the VOLXBIBEL (2014), which follows a youth communication paradigm. The results in Fig. 8 were run with $\varepsilon =4$ and contain three wrong or missing assignments; all verbs are missing. Again, particles are a challenge: the $\delta \varepsilon$ of the original Greek sentence is missing, but this is also because it was omitted in the translation. The run with $\varepsilon =6$ only shows additional misclassified attributes; “Vater” (father) was wrongly assigned to G2384.

Here the description changes: instead of describing the begetter, the construction “…is father of…” was chosen. The word “father” was assigned, but has no counterpart in the Greek text. These errors increase when this method is applied to the VOLXBIBEL, see Fig. 9.

Table 8. Results of algorithm $(bible, POS_{1}, all)$ with $\varepsilon =6$ and $f=pos_z(x,y)$ for Luther 1912 and GerLeoNA28 as target texts

Table 9. Results of algorithm $(bible, POS_{1}, all)$ with $\varepsilon =6$ and $f=pos_z(x,y)$ for Luther 2017 and SLT as target texts

Figure 9. Assignment on VOLX in Matt. 1:2 (Abraham begat Isaac; and Isaac begat Jacob; and Jacob begat Judah and his brethren).

Figure 10. $F_1$ Score for different values of $\varepsilon$ (x-axis) for Luther 2017 as target text.

There are many paraphrases (e.g., mixing son and father, both wrongly assigned), additional terms (the promise of land), and additional words (“Leute”, “Land”, etc.). Thus, some assignments are neither truly correct nor incorrect. For example, the translation uses “und” (and) for both $\kappa \alpha \iota$ and $\delta \varepsilon$ , while the algorithm assigns only G2532. Other particles are mostly missing. However, most names are assigned correctly; in previous work, not a single word was assigned correctly (see Dörpinghaus and Düing Reference Dörpinghaus and Düing2021). In summary, our approach works best for formal or dynamic equivalence translations, while it will not work for paraphrase approaches, where only some parts can be annotated, for example, nouns (locations, names, etc.) or certain verbs. Although the defining characteristic of such translations is that they do not match the original language word for word, this annotation is still useful for linking to encyclopedias or other cross-references.

5.3. Results on $POS_{1}$

As we discussed above, the translation uses different paraphrases and constructs to express a longer Greek text, which results in fewer words to be tagged than in the original Greek text. For German translations, this value is between −19 and 16 (average −4.791), and for English translations between −13 and 0 (average −5.375). The situation thus differs from the $POS_0$ results: overall, we have fewer words in the original text than our POS tagger detected.

Tables 8 and 9 show the results for $(bible, POS_{1}, all)$ for German translations. For Luther 1912, the results are comparable, although the recall values are higher. The same is the case for Luther 2017 and SLT. Here the $F_1$ score is lower than with the $(bible, POS_{0}, all)$ approach. In Fig. 10, we show the output of $(bible, POS_{1}, all)$ , $(bible, POS_{1}, cpp)$ , $(bible, POS_{1}, none)$ , and $(\Delta, POS_{1}, all)$ for different translations as input. We see that $(bible, POS_{1}, all)$ is the best approach overall, and the choice of dictionaries is key. Combining different dictionaries does not improve the output in general, but might be a good choice if the best input is unknown.

Tables 10 and 11 show the results for $(bible, POS_{1}, all)$ on English translations. Here, $(bible, POS_{1}, all)$ provides better results than $(bible, POS_{0}, all)$ for KJV and ESV and comparable results for NRSV and WEB. Again, recall is generally higher. Table 11 also shows some surprises. First, larger values of $\varepsilon$ worsen the $F_1$ score for NRSV. Second, for WEB, a larger $\varepsilon$ neither improves nor worsens the $F_1$ score, but while precision increases, recall decreases. In Figs. 11 and 12, we show the output of $(bible, POS_{1}, all)$ , $(bible, POS_{1}, cpp)$ , $(bible, POS_{1}, none)$ , and $(\Delta, POS_{1}, all)$ for different translations as input.

Table 10. Results of algorithm $(bible, POS_{1}, all)$ with $f=pos_z(x,y)$ for KJV ( $\varepsilon =13$ ) and ESV ( $\varepsilon =8$ ) as target texts

Table 11. Results of algorithm $(bible, POS_{1}, all)$ and $f=pos_z(x,y)$ for NRSV ( $\varepsilon =2$ ) and WEB as target texts

Figure 11. $F_1$ Score for different values of $\varepsilon$ (x-axis) for KJV as target text.

Figure 12. $F_1$ Score for different values of $\varepsilon$ (x-axis) for ESV as target text.

However, Table 10 shows a significant performance improvement for $\varepsilon =13$ . While all other approaches show no further change above roughly $\varepsilon =9$ , the KJV is special, as we discussed earlier. For example, in Romans 20:5, $POS_1$ finds 39 parts of speech, while only 14 Strong’s numbers are assigned, see Fig. 13.

Figure 13. Example of existing (top) and $POS_1$ (bottom) annotations for Romans 20:5.

This explains why a higher value of $\varepsilon$ still increases the $F_1$ score. However, it also underlines the need for a detailed understanding of the texts and the annotation of Strong’s numbers.

5.4. Testing on non-annotated translations

Again, to test our approach on a recent translation, we will use different verses with different linguistic challenges. First, we will use both Luther 1912 and GerLeoNA28 as a basis for assignment. As a first example, we will consider Matt 1:2, which is a noun-centered sentence. For Luther 2017, we get the assignment in Fig. 14.

Figure 14. Application to Luther 2017 (Matthew 1:2). The corresponding English text according to the ASV is: “Abraham begat Isaac, and Isaac begat Jacob, and Jacob begat Judah and his brethren.”

$POS_1$ recognizes the possessive pronoun “sein” (autos), but does not assign G0846 to it. In general, G1161 ( $\delta \varepsilon$ ) is missing, but it is also omitted in the translation. Overall, the results are better than with $POS_0$ . We will show the performance on two more German translations following a thought-for-thought approach known as dynamic equivalence. Hoffnung für alle (HFA, 2015) is less rigorous than the VOLXBIBEL (2014), which follows a youth communication paradigm. The results in Fig. 15 were run with $\varepsilon =4$ .

Figure 15. Application to HFA (Matthew 1:2). The corresponding English text according to the ASV is: “Abraham begat Isaac, and Isaac begat Jacob, and Jacob begat Judah and his brethren.”

The assignment of G1080 to “folgen” (to follow) is a paraphrase of “beget”. This result shows more parts of speech than $POS_0$ ; overall, the quality is comparable. We can see that thought-for-thought approaches are challenging. These errors increase when this method is applied to the VOLXBIBEL, see Fig. 16.

Figure 16. Assignment on VOLX in Matt. 1:2 (Abraham begat Isaac; and Isaac begat Jacob; and Jacob begat Judah and his brethren).

Again, there are a lot of paraphrases (e.g. mixing son and father, both wrongly assigned), additional terms (the promise of land), and additional words (“Leute”, “Land”, etc.). This means that some assignments are neither correct nor incorrect, similar to $POS_0$ . Again, the translation uses “und” (and) for both $\kappa \alpha \iota$ and $\delta \varepsilon$ , while only G2532 is assigned by the algorithm. In addition, $POS_1$ finds more parts of speech, but does not assign Strong’s numbers to all of them. While most of the additional verbs are wrong, some assignments are missing (“der”, “ein”, “für”, etc.).

6. Discussions and conclusions

6.1. Summary

This paper described two improved approaches, $(bible, POS_{0}, all)$ and $(bible, POS_{1}, \ast )$ , for automatically annotating words within New Testament texts in order to create parallel Bible corpora in different languages. Automated annotation of words within biblical texts for the cross-lingual concordance alignment of New Testament texts and translations is still an important research topic: on the one hand, it is a limited problem with a fixed set of texts; on the other hand, it is challenging because it relies on Ancient Greek and Hebrew. We proposed a lazy learner approach using dictionaries of existing annotations and dictionaries extracted from annotated texts. Our results emphasize the importance of proper preprocessing of the data, handling morphology and phrases.

Another contribution of this work is the publicly available evaluation dataset, which can be used to make further work in this area comparable.

Although the amount of training data was generally limited due to strict licensing policies in the field of theology, we were able to obtain promising results for some translations. Applying this approach to paraphrasing translations does not seem reasonable, but limiting it to nouns can still provide good-quality links to encyclopedias.

6.2. Limitations

This paper covers many overlapping fields, such as linguistics, Bible translation, ancient languages, theology, NLP, and ML. Thus, we find several limitations in the output with respect to domain. Some of them are discussed here.

  • Different approaches to Bible translation present different challenges. While our approach works better for formal equivalence, it is of limited use for paraphrase approaches. More generally, since other AI approaches have been shown not to perform as well as rule-based algorithmic approaches, it remains unclear how well AI-based approaches for the automated annotation of parallel Bible corpora will perform.

  • Since there are no language models for ancient Greek or Hebrew (see Dörpinghaus and Düing Reference Dörpinghaus and Düing2021), we are limited in the use of AI methods. On the other hand, our results could be more useful if they could contribute to existing models. Other languages require further discussion.

  • While the approaches do not assign Strong’s numbers that are not used within a verse, they do not necessarily assign correct numbers to a part of speech. We provide extensive experimental results on mixing different POS categories, but limiting matching to one category seems most reasonable; for other languages, especially non-Germanic ones, this may not be the case. In particular, we could not provide an in-depth analysis of POS-tagging errors. While our results can be used by laypeople or provide an initial foundation for later expert curation, they cannot be used in the field of theology without further restrictions.

  • In other words: our evaluation was done on English and German translations. This is a serious limitation, as both are Germanic languages. How does the approach perform on other languages? Further experiments could help to assess the usefulness of this approach for other languages. In addition, it would be valuable to test it on languages with fewer resources.

6.3. Future work

Here, we presented an improved method for the automated annotation of parallel Bible corpora with Strong’s numbers, providing a cross-lingual semantic concordance. We introduced a pipeline that uses the SWORD API. Our approach yields results that depend on the input data and on the translation approach of the target Bible. For word-for-word translations, it provides a highly accurate baseline that could be used for further expert curation. However, this method cannot be applied to translations that follow a paraphrase approach, such as the German VOLXBIBEL, and it shows lower performance for non-word-for-word approaches. As noted above, it is also questionable whether it is useful to apply this approach to paraphrased texts beyond linking to encyclopedia entries for nouns. However, this work will hopefully lead to further research and a better understanding of the special requirements in the field of theology, especially regarding ancient languages.

Our analysis of the limitations reveals a number of questions and possible further improvements:

  • First, we need to consider whether more translations, dictionaries, synonyms, and biblical texts can be used as training data. Although recall may not always improve when more dictionaries are used, a better data basis combined with improvements in modeling and algorithms will improve the results.

  • Second, we need to investigate our approach to parts of speech, because we found that the number of POS-tagged words and the number of Strong’s numbers in a verse differ. Thus, the gap between the Strong’s annotations in the original texts and the POS-tagged words needs to be closed.

  • Third, the proposed approach does not depend on the library used for POS detection. Since we were able to identify some errors using POS detection, we suggest further research on the performance of other libraries such as StanfordNLP or NLTK.

  • Finally, an in-depth error analysis should be done for other AI approaches such as the CRF models presented in other papers. Here, it should be analyzed whether a better feature selection (e.g. POS tagging or dependency labels) is the key.

While our proof of concept is both working and generic, it is still early work on a problem that needs more attention. It already provides useful output for several use cases. In other cases, it could help to automatically build a foundation for detailed manual annotation of texts. However, we hope that it will also highlight the importance of more interdisciplinary research in this field.

Footnotes

b For further details, we refer to Kerr (Reference Kerr2011) or Metzger (Reference Metzger2001).

e See http://www.crosswire.org/sword/modules/ for details on these packages.

f This output format is called HTML by the diatheke software we used; see Section 4.1.

m See the official spaCy documentation and source code, available at https://github.com/explosion/spaCy/blob/master/spacy/glossary.py, for a detailed overview.

References

Anderson, C. (2018). Digital humanities and the future of theology. Cursor_: Zeitschrift für Explorative Theologie.
Biagetti, E., Zanchi, C. and Short, W.M. (2021). Toward the creation of wordnets for ancient Indo-European languages. In Proceedings of the 11th Global Wordnet Conference, pp. 258–266.
Büchler, M., Geßner, A., Eckart, T. and Heyer, G. (2010). Unsupervised detection and visualisation of textual reuse on ancient Greek texts.
Christodouloupoulos, C. and Steedman, M. (2015). A massively parallel corpus: the Bible in 100 languages. Language Resources and Evaluation 49(2), 375–395.
Clivaz, C. (2017). Die Bibel im digitalen Zeitalter: Multimodale Schriften in Gemeinschaften. Zeitschrift für Neues Testament 39(40), 35–57.
Clivaz, C., Gregory, A. and Hamidović, D. (2013). Digital Humanities in Biblical, Early Jewish and Early Christian Studies. Leiden and Boston: Brill.
Conneau, A., Lample, G., Ranzato, M.A., Denoyer, L. and Jégou, H. (2017). Word translation without parallel data. arXiv preprint arXiv:1710.04087.
Cysouw, M., Biemann, C. and Ongyerth, M. (2007). Using Strong’s numbers in the Bible to test an automatic alignment of parallel texts. STUF-Language Typology and Universals 60(2), 158–171.
De Vries, L. (2000). Bible translation and primary orality. The Bible Translator 51(1), 101–114.
Diab, M. and Finch, S. (2000). A statistical word-level translation model for comparable corpora. Technical report, University of Maryland Institute for Advanced Computer Studies.
Dou, Z.-Y. and Neubig, G. (2021). Word alignment by fine-tuning embeddings on parallel corpora. arXiv preprint arXiv:2101.08231.
Dörpinghaus, J. (2021). Die soziale Netzwerkanalyse: Neue Perspektiven für die Auslegung biblischer Texte? Biblisch Erneuerte Theologie 5, 75–96.
Dörpinghaus, J. (2022). Digital theology: new perspectives on interdisciplinary research between the humanities and theology. Interdisciplinary Journal of Research on Religion 18, 1–17.
Dörpinghaus, J. (2023). Evaluation Data for the Annotation of German and English New Testament Texts with Strong’s Numbers. https://doi.org/10.5281/zenodo.8024803.
Dörpinghaus, J. and Düing, C. (2021). Automated creation of parallel bible corpora with cross-lingual semantic concordance. In 2021 16th Conference on Computer Science and Intelligence Systems (FedCSIS). IEEE, pp. 111–114.
Eder, M. (2013). Computational stylistics and biblical translation: how reliable can a dendrogram be? The Translator and the Computer, 155–170.
Erwin, H. and Oakes, M. (2012). Correspondence analysis of the New Testament. In Workshop Organizers, p. 30.
Fei, H., Zhang, M. and Ji, D. (2020). Cross-lingual semantic role labeling with high-quality translated training corpus. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7014–7026.
Instone-Brewer, D. (2023). Computational alignment of Greek and Hebrew with Bible translations, using Swahili as a proof of concept. Available at https://docs.google.com/presentation/d/1XgTRMsvQ-55W2nUmZ1aQQ4a2aYR56O_17ZUi0ZiLaNM/edit#slide=id.g2878d53056_2_75 (accessed: 25 May 2023).
Kerr, G.J. (2011). Dynamic equivalence and its daughters: placing Bible translation theories in their historical context. Journal of Translation 7(1), 13–20.
Kunze, C. and Wagner, A. (2001). Anwendungsperspektiven des GermaNet, eines lexikalisch-semantischen Netzes für das Deutsche. Chancen und Perspektiven Computergestützter Lexikographie 107, 229–246.
Landes, S., Leacock, C. and Tengi, R.I. (1998). Building semantic concordances. In WordNet: An Electronic Lexical Database, pp. 199–216.
Li, Y., Zhang, Y., Yu, K. and Hu, X. (2021). Adversarial training with Wasserstein distance for learning cross-lingual word embeddings. Applied Intelligence 51(11), 1–13.
Mayer, T. and Cysouw, M. (2014). Creating a massively parallel Bible corpus. Oceania 135(273), 40.
McDonald, D. (2014). A text mining analysis of religious texts. The Journal of Business Inquiry 13(1), 27–47.
McMillan-Major, A. (2020). Automating gloss generation in interlinear glossed text. Proceedings of the Society for Computation in Linguistics 3(1), 338–349.
Metzger, B.M. (2001). The Bible in Translation: Ancient and English Versions. Biblical Studies. Baker Publishing Group.
Miller, G.A. (1995). WordNet: a lexical database for English. Communications of the ACM 38(11), 39–41.
Muhammad, A.B. (2012). Annotation of Conceptual Co-Reference and Text Mining the Qur’an. University of Leeds.
Ortmann, K., Roussel, A. and Dipper, S. (2019). Evaluating off-the-shelf NLP tools for German. In KONVENS.
Palladino, C., Shamsian, F. and Yousef, T. (2022). Using parallel corpora to evaluate translations of ancient Greek literary texts. An application of text alignment for digital philology research. Journal of Computational Literary Studies 1(1), 703–747.
Perrone, V., Palma, M., Hengchen, S., Vatri, A., Smith, J.Q. and McGillivray, B. (2019). GASC: genre-aware semantic change for Ancient Greek. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, Florence, Italy: Association for Computational Linguistics, pp. 56–66. https://www.aclweb.org/anthology/W19-4707.
Rapp, R., Sharoff, S. and Zweigenbaum, P. (2016). Recent advances in machine translation using comparable corpora. Natural Language Engineering 22(4), 501–516. https://doi.org/10.1017/S1351324916000115.
Rees, N. and Riding, J. (2009). Automatic concordance creation for texts in any language. Proceedings of Translation and the Computer 31, 1–11.
Renkema, J. and van Wijk, C. (2002). Converting the words of God: an experimental evaluation of stylistic choices in the new Dutch Bible translation. Linguistica Antverpiensia, New Series–Themes in Translation Studies 1, 169–190.
Resnik, P., Olsen, M.B. and Diab, M. (1999). The Bible as a parallel corpus: annotating the book of 2000 tongues. Computers and the Humanities 33(1), 129–153.
Riding, J.D. (2008). Statistical glossing, language-independent analysis in Bible translation. Translating and the Computer 30, 703–747.
Riding, J. and Steenbergen, G. (2011). Glossing technology in Paratext 7. The Bible Translator 62(2), 04102. https://doi.org/10.1177/026009351106200206.
Robinson, H. (1973). Morphology and Landscape. University Tutorial Press.
Sabet, M.J., Dufter, P., Yvon, F. and Schütze, H. (2020). SimAlign: high quality word alignments without parallel training data using static and contextualized embeddings. arXiv preprint arXiv:2004.08728.
Scorgie, G.G., Strauss, M.L., Voth, S.M., et al. (2009). The Challenge of Bible Translation: Communicating God’s Word to the World. Zondervan Academic.
Simard, M. (2020). Building and using parallel text for translation. In The Routledge Handbook of Translation and Technology, pp. 78–90.
Sommerschield, T., Assael, Y., Pavlopoulos, J., Stefanak, V., Senior, A., Dyer, C., Bodel, J., Prag, J., Androutsopoulos, I. and de Freitas, N. (2023). Machine learning for ancient languages: a survey. Computational Linguistics, 1–44.
Steingrimsson, S., Loftsson, H. and Way, A. (2021). CombAlign: a tool for obtaining high-quality word alignments. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pp. 64–73.
Sutinen, E. and Cooper, A.-P. (2021). Digital Theology: A Computer Science Perspective. Emerald Group Publishing.
Tsvetkov, Y. and Wintner, S. (2012). Extraction of multi-word expressions from small parallel corpora. Natural Language Engineering 18(4), 549–573. https://doi.org/10.1017/S1351324912000101.
Verma, M. (2017). Lexical analysis of religious texts using text mining and machine learning tools. International Journal of Computer Applications 168(8), 39–45.
Vu, T., He, X., Phung, D. and Haffari, G. (2021). Generalised unsupervised domain adaptation of neural machine translation with cross-lingual data selection. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 3335–3346.
Wälchli, B. (2010). Similarity semantics and building probabilistic semantic maps from parallel texts. Linguistic Discovery 8(1), 331–371.
Yli-Jyrä, A., Purhonen, J., Liljeqvist, M., Antturi, A., Nieminen, P., Räntilä, K.M. and Luoto, V. (2020). HELFI: a Hebrew-Greek-Finnish Parallel Bible Corpus with Cross-Lingual Morpheme Alignment. arXiv preprint arXiv:2003.07456.
Yousef, T., Heyer, G. and Jänicke, S. (2023). Evalign: visual evaluation of translation alignment models. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pp. 277–297.
Yousef, T., Palladino, C., Shamsian, F., d’Orange Ferreira, A. and dos Reis, M.F. (2022a). An automatic model and gold standard for translation alignment of ancient Greek. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pp. 5894–5905.
Yousef, T., Palladino, C., Shamsian, F. and Foradi, M. (2022b). Translation alignment with Ugarit. Information: An International Interdisciplinary Journal 13(2), 65.
Zhao, X., Ozaki, S., Anastasopoulos, A., Neubig, G. and Levin, L. (2020). Automatic interlinear glossing for under-resourced languages leveraging translations. In Proceedings of the 28th International Conference on Computational Linguistics, pp. 5397–5408.