Introduction
Arabic is a major world language, spoken not only in the Arabian Peninsula, but by hundreds of millions of people across northern Africa and western Asia, and more broadly around the world. Corpus linguistics – the analysis of very large amounts of natural language data using computer-assisted methods and techniques – is a major methodology in modern linguistics. Yet, so far, relatively few studies have attempted to apply this major methodology to this major language. We may say, then, that Arabic corpus linguistics as a research endeavour is still in its infancy.
This volume represents an attempt by its authors and editors to help foster the development of Arabic corpus linguistics by bringing together cutting-edge contributions on the data, methods and research foci of this nascent field. Our aim is not merely to place on record present work of this kind but also, we hope, to showcase the intersection of Arabic linguistics and corpus-based methods in such a way as to inspire future work in the area. We feel strongly that this book represents the starting-point for major developments still to come in Arabic corpus linguistics.
Our goal in this introductory chapter is to set the scene for the contributions to follow in the remainder of the book. In doing so, we have attempted to address the perspectives of three main groups of readers who we anticipate will find this book of interest. Researchers and students working in Arabic corpus linguistics are only the first of these groups. We also address here two further groups: corpus linguists (or those in allied fields such as computational linguistics, natural language processing, or digital humanities) who have little experience of working with Arabic; and Arabic linguists with little experience of corpus methods.
With this in mind, our scene-setting necessarily involves a brief introduction to corpus linguistics on the one hand, and Arabic linguistics on the other. The next section addresses the latter, outlining those features of the Arabic language which are most important as background for an understanding of the various chapters in this book. As part of this, we will introduce the transliteration scheme used throughout the volume.