Language is the natural currency of most social communication. Until the emergence of more powerful computational methods, it simply was not feasible to measure its use in mainline social psychology. We now know that language can reveal behavioral evidence of mental states and personality traits, as well as clues to the future behavior of individuals and groups. In this chapter, we first review the history of language research in social personality psychology. We then survey the main methods for deriving psychological insights from language (ranging from data-driven to theory-driven, naturalistic to experimental, qualitative to quantitative, holistic to granular, and transparent to opaque) and describe illustrative examples of findings from each approach. Finally, we present our view of the new capabilities, real-world applications, and ethical and psychometric quagmires on the horizon as language research continues to evolve in the future.
This chapter surveys the history and main directions of natural language processing research in general, and for Slavic languages in particular. The field has grown enormously since its beginning. Especially since 2010, the amount of digital texts has been rapidly growing; furthermore, research has yielded an ever-greater number of highly usable applications. This is reflected in the increasing number and attendance of NLP conferences and workshops. Slavic countries are no exception; several have been organising international conferences for decades, and their proceedings are the best place to find publications on Slavic NLP research. The general trend of the evolution of NLP is difficult to predict. It is certain that deep learning, including various new types (e.g. contextual, multilingual) of word embeddings and similar ‘deep’ models will play an increasing role, while predictions also mention the increasing importance of the Universal Dependencies framework and treebanks and research into the theory, not only the practice, of deep learning, coupled with attempts at achieving better explainability of the resulting models.
The introduction to this volume describes its content. It also provides the rationale for including selected topics and provides comments on the manner of presentation adopted in this volume.
The linguistic study of the Slavic language family, with its rich syntactic and phonological structures, complex writing systems, and diverse socio-historical context, is a rapidly growing research area. Bringing together contributions from an international team of authors, this Handbook provides a systematic review of cutting-edge research in Slavic linguistics. It covers phonetics and phonology, morphology and syntax, lexicology, and sociolinguistics, and presents multiple theoretical perspectives, including synchronic and diachronic. Each chapter addresses a particular linguistic feature pertinent to Slavic languages, and covers the development of the feature from Proto-Slavic to present-day Slavic languages, the main findings in historical and ongoing research devoted to the feature, and a summary of the current state of the art in the field and what the directions of future research will be. Comprehensive yet accessible, it is essential reading for academic researchers and students in theoretical linguistics, linguistic typology, sociolinguistics and Slavic/East European Studies.
In this chapter, a case is made for the inclusion of computational approaches to linguistics within the theoretical fold. Computational models aimed at application are a special case of predictive models. The status quo in the philosophy of linguistics is that explanation is scientifically prior to prediction. This is a mistake. Once corrected, the theoretical place of prediction is restored and, with it, computational models of language. The chapter first describes the history behind the emergence of explanation over prediction views in the general philosophy of science. It’s then suggested that this post-positivist intellectual milieu influenced the rejection of computational linguistics in the philosophy of theoretical linguistics. A case study of the predictive power already embedded in contemporary linguistic theory is presented through some work on negative polarity items. The discussion moves to the competence–performance divide informed by the so-called Galilean style in linguistics that retains the explanatory over prediction ideal. In the final sections of the chapter, continuous methods, such as probabilistic linguistics, are used to showcase the explanatory and predictive possibilities of nondiscrete approaches, before a discussion of the contemporary field of deep learning in natural language processing (NLP), where these predictive possibilities are further amplified.
What is the remit of theoretical linguistics? How are human languages different from animal calls or artificial languages? What philosophical insights about language can be gleaned from phonology, pragmatics, probabilistic linguistics, and deep learning? This book addresses the current philosophical issues at the heart of theoretical linguistics, which are widely debated not only by linguists, but also philosophers, psychologists, and computer scientists. It delves into hitherto uncharted territory, putting philosophy in direct conversation with phonology, sign language studies, supersemantics, computational linguistics, and language evolution. A range of theoretical positions are covered, from optimality theory and autosegmental phonology to generative syntax, dynamic semantics, and natural language processing with deep learning techniques. By both unwinding the complexities of natural language and delving into the nature of the science that studies it, this book ultimately improves our tools of discovery aimed at one of the most essential features of our humanity, our language.
Distributional semantic representations were used to investigate crossmodal correspondences within language, offering a comprehensive analysis of how sensory experiences interconnect in linguistic constructs. By computing semantic proximity between words from different sensory modalities, a crossmodal semantic network was constructed, providing a general view of crossmodal correspondences in the English language. Community detection techniques were applied to unveil domains of experience where crossmodal correspondences were likely to manifest, while also considering the role of affective dimensions in shaping these domains. The study revealed the existence of an architecture of structured domains of experience in language, whereby crossmodal correspondences are deeply embedded. The present research highlights the roles of emotion and statistical associations in the organization of sensory concepts across modalities in language. The domains identified, including food, the body, the physical world and emotions/values, underscored the intricate interplay between the senses, emotion and semantic patterns. These findings align with the embodied lexicon hypothesis and the semantic coding hypothesis, emphasizing the capacity of language to capture and reflect crossmodal correspondences’ emotional and perceptual subtleties in the form of networks, while also revealing opportunities for further perceptual research on crossmodal correspondences and multisensory integration.
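The first step described in this abstract, computing semantic proximity between words from different sensory modalities and linking close pairs into a network, can be sketched roughly as follows. This is a minimal illustration only: the three-dimensional vectors, the modality labels, the `WORDS` table and the `0.9` threshold are all hypothetical, cosine similarity is assumed as the proximity measure, and the community detection step mentioned in the abstract is not implemented here.

```python
import math
from itertools import combinations

# Hypothetical toy embeddings: word -> (sensory modality, vector).
# A real study would load vectors from a trained distributional model.
WORDS = {
    "sweet": ("taste", [0.9, 0.1, 0.2]),
    "sour": ("taste", [0.1, 0.9, 0.1]),
    "soft": ("touch", [0.8, 0.2, 0.3]),
    "sharp": ("touch", [0.2, 0.8, 0.2]),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def crossmodal_edges(words, threshold=0.9):
    """Return (word1, word2, similarity) for word pairs from different
    modalities whose similarity reaches the (assumed) threshold."""
    edges = []
    for (w1, (m1, v1)), (w2, (m2, v2)) in combinations(words.items(), 2):
        if m1 != m2:  # keep only pairs that cross modalities
            sim = cosine(v1, v2)
            if sim >= threshold:
                edges.append((w1, w2, sim))
    return edges


for w1, w2, sim in crossmodal_edges(WORDS):
    print(f"{w1} - {w2}: {sim:.3f}")
```

The resulting edge list could then be passed to a graph library for community detection; which algorithm the study used is not stated in the abstract.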
This chapter investigates why Pāṇini has been called the world’s first computational linguist and situates Pāṇinian grammar within the context of formal language theory and issues of generative capacity, which have been of huge importance in the development of modern linguistic theory since Chomsky's earliest work. Existing claims regarding the generative capacity of Pāṇini’s system are shown to be inadequate, and a new assessment of the power of the Aṣṭādhyāyī is provided.
Why do successive education reforms within a country resonate with familiar assumptions about educational goals, society, class, and state, even at moments of radical change? Repeating cultural narratives sustain continuities within institutional change processes, by influencing how new ideas are interpreted, how interest groups express preferences, and how institutional norms shape political processes. Repeating narratives make it more likely for some types of reforms to be implemented and sustained than others. This chapter develops a theoretical model suggesting how cultural narratives are transmitted across time and an empirical method for assessing cross-national differences in cultural narratives. Each country has a distinctive “cultural constraint,” or a set of cultural symbols and narratives, that appears in a nation’s literary corpus. Writers collectively contribute to this body of cultural tropes; despite individual fluctuations, they largely reproduce the master narratives of their countries. Computational linguistic processes allow us to observe empirical differences between British and Danish cultural depictions of education in 1,084 works of fiction from 1700 to 1920. Cultural narratives do not determine specific outcomes, as tropes must be activated in political struggles. Yet we can show how significant cross-national differences in literary images of education resonate with British and Danish educational trajectories.
This chapter introduces our global dataset of autocratic propaganda, which contains over eight million articles from 65 newspapers drawn from 59 countries in six major languages. By population, our dataset encompasses a set of countries that represents 88% of all people who live under autocracy. After collecting this propaganda, we measure its content. We employ computational techniques to identify the topics of each article; count the number of references in each article to the autocrat, ruling party, and opposition; and measure the valence of propaganda with dictionary-based semantic analysis. The key idea is that some words have an intrinsic positive or negative sentiment. This conception of propaganda (as spin, not lies) accords with how scholars and practitioners have long understood it. As a baseline for comparison, our dataset includes state-affiliated newspapers from democracies. To scale our measures of propaganda, we develop a Fox News Index: how Fox News covers Republicans relative to Democrats.
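The dictionary-based idea in this abstract, that some words carry an intrinsic positive or negative sentiment, can be illustrated with a small sketch. The word lists, the `valence` and `count_references` function names, and the scoring formula (positive minus negative matches, divided by token count) are assumptions made for illustration; the chapter's actual dictionary and scaling are not given in the abstract.

```python
import re

# Hypothetical toy lexicon; a real analysis would use an established
# sentiment dictionary rather than these hand-picked words.
POSITIVE = {"prosperity", "growth", "success", "stable", "peace"}
NEGATIVE = {"crisis", "corruption", "failure", "violence", "chaos"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def valence(text):
    """Return (positive matches - negative matches) / number of tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)


def count_references(text, names):
    """Count how many tokens match any of the given (lowercase) names."""
    wanted = set(names)
    return sum(t in wanted for t in tokenize(text))


article = "The president promised growth and peace despite the crisis."
print(valence(article))
print(count_references(article, {"president"}))
```

Dividing by article length keeps long and short articles comparable, which is one plausible design choice among several.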
This study investigates the use of metaphor in the dissociative disorder depersonalization/derealization – the feeling of unreality or detachment from the senses or surrounding events. While the debilitating experience of depersonalization/derealization is prevalent, it is also under-acknowledged, such that it is often expressed through metaphor, with more typical metaphor described in diagnostic criteria. Using naturally occurring text from two prominent English language depersonalization/derealization support fora, in the current study a systematic survey is made of metaphor to communicate the experience of depersonalization/derealization in context. It is concluded that metaphor described in the formal diagnostic criteria for depersonalization/derealization does not completely represent metaphor use in the contexts investigated. A summary is made of metaphor for the experiences of depersonalization, and derealization, and depersonalization/derealization more generally, across both the contexts investigated, that may support vital understanding and diagnosis of this debilitating, under-recognized experience, across a wider demographic.
Finnish nonfinite clauses constitute a complex grammatical class with a seemingly chaotic mix of verbal and nominal properties. Thirteen nonfinite constructions, their selection, control, thematic role assignment, nonfinite agreement, embedded subjects, and syntactic status were targeted for analysis. An analysis is proposed which derives their syntactic and semantic properties by relying on a computational model of human information processing. The model analyzes Finnish nonfinite constructions as truncated clauses with one functional layer above the verb phrase. Research methods from naturalistic cognitive science and computational linguistics are considered as potentially useful tools for linguistics.
Authorship analysis is the process of determining who produced a questioned text by language analysis. Although there has been significant success in the performance of computational methods to solve this problem in recent years, these are often methods that are not amenable to interpretation. Authorship analysis is in effect an area of computer science with very little linguistics or cognitive science. This Element introduces a Theory of Linguistic Individuality that, starting from basic notions of cognitive linguistics, establishes a formal framework for the mathematical modelling of language processing that is then applied to three computational experiments, including using the likelihood ratio framework. The results propose new avenues of research and a change of perspective in the way authorship analysis is currently carried out.
In this article, we report a large-scale corpus study aimed at tackling the (controversial) question of the extent to which the European national varieties of Dutch, that is, Belgian and Netherlandic Dutch, exhibit morphosyntactic differences. Instead of relying on a manual selection of cases of morphosyntactic variation, we first marshal large bilingual parallel corpora and machine translation software to identify semiautomatically, in an extensively data-driven fashion, loci of variation from various “corners” of Dutch grammar. We then gauge the distribution of constructional alternatives in a nationally as well as stylistically stratified corpus for a representative selection of twenty alternation patterns. We find that natiolectal variation in the grammar of Dutch is far more prevalent than often assumed, especially in less edited text types, and that it shows up in inflection phenomena, lexically conditioned syntactic variation, and pure word order permutations. Another key finding is that many cases of synchronic probabilistic asymmetries reflect a diachronic difference between the two varieties: Netherlandic Dutch often tends to be ahead in cases of ongoing grammatical change, with Belgian Dutch holding on somewhat longer to obsolescent features of the grammar.
Edited by
Mary S. Morgan, London School of Economics and Political Science, Kim M. Hajek, London School of Economics and Political Science, Dominic J. Berry, London School of Economics and Political Science
The distinction that has become standard between natural language and formal language, which rests on differentiating what is socially evolved and experiential from what is purposefully planned, suggests that a similar emphasis on experientiality may illuminate the distinction between narrative and formal modes of knowing, which figures prominently in this volume. Support for that perspective comes from developments in both narratology and computational linguistics. A key concept from both specialties – and for this volume – is that of ‘scripts’, which indicates how even texts that are explicitly formal may be understood as narratives by experienced readers. An explicit example that illuminates these themes comes from James Clerk Maxwell’s classic paper ‘On Faraday’s Lines of Force’. It juxtaposes narrative and formal modes of representation and displays their relative advantages, suggesting that the development of scientific knowledge often depends on continual feedback between natural narrative and formal analysis.
Corpus analysis can be expanded and scaled up by incorporating computational methods from natural language processing. This Element shows how text classification and text similarity models can extend our ability to undertake corpus linguistics across very large corpora. These computational methods are becoming increasingly important as corpora grow too large for more traditional types of linguistic analysis. We draw on five case studies to show how and why to use computational methods, ranging from usage-based grammar to authorship analysis to using social media for corpus-based sociolinguistics. Each section is accompanied by an interactive code notebook that shows how to implement the analysis in Python. A stand-alone Python package is also available to help readers use these methods with their own data. Because large-scale analysis introduces new ethical problems, this Element pairs each new methodology with a discussion of potential ethical implications.
Text contains a wealth of information about a wide variety of sociocultural constructs. Automated prediction methods can infer these quantities (sentiment analysis is probably the most well-known application). However, there is virtually no limit to the kinds of things we can predict from text: power, trust, and misogyny are all signaled in language. These algorithms easily scale to corpus sizes infeasible for manual analysis. Prediction algorithms have become steadily more powerful, especially with the advent of neural network methods. However, applying these techniques usually requires profound programming knowledge and machine learning expertise. As a result, many social scientists do not apply them. This Element provides the working social scientist with an overview of the most common methods for text classification, an intuition of their applicability, and Python code to execute them. It covers both the ethical foundations of such work and the emerging potential of neural network methods.
Event structures are central in Linguistics and Artificial Intelligence research: people can easily refer to changes in the world, identify their participants, distinguish relevant information, and have expectations of what can happen next. Part of this process is based on mechanisms similar to narratives, which are at the heart of information sharing. But it remains difficult to automatically detect events or automatically construct stories from such event representations. This book explores how to handle today's massive news streams and provides multidimensional, multimodal, and distributed approaches, like automated deep learning, to capture events and narrative structures involved in a 'story'. This overview of the current state-of-the-art on event extraction, temporal and causal relations, and storyline extraction aims to establish a new multidisciplinary research community with a common terminology and research agenda. Graduate students and researchers in natural language processing, computational linguistics, and media studies will benefit from this book.
This chapter presents an easily followed overview of computational linguistics and where Arabic fits into it. Computational linguistics, often referred to interchangeably as natural language processing (NLP) or human language technologies, is a large and growing interdisciplinary field of research that lies at the intersection of linguistics, computer science, electrical engineering, cognitive science, psychology, pedagogy, and mathematics, among other fields. Research and work on Arabic computational linguistics has lagged behind English and other languages. This is despite a tremendous increase in the relative growth of Arabic NLP in the period between 2012 and 2016. The reason for its slow start is that Arabic presents a series of difficulties to programmers, those being morphological richness, orthographic ambiguity, dialectal variations, orthographic noise, and resource poverty. Those problems have been or are being overcome, and a new generation of researchers has made great strides in the field. This has partly to do with the growing interest in language technologies for opinion mining and translation in social media, which features dialectal Arabic more than MSA. Another motivation is that commercial giants like Apple and Google are interested in applications of Arabic as it is spoken.
The authors examine the application of electronically searchable corpora, from their own experience, in addressing questions pertinent to linguistics as a whole and to matters internal to Arabic, while lamenting that the field of Arabic linguistics, in its theoretical and applied orientations alike, has not made use of the rich data source that searchable electronic corpora represent. They show how corpora can be used easily to falsify common assumptions and assertions about the human language capacity in general, just as they can be used efficiently to query assumptions and assertions about Arabic itself. Corpora also hold implications for applied uses such as teaching Arabic as a foreign language and translation between Arabic and other languages. In any of these applications, the use of corpora in the analysis of all varieties of Arabic remains underdeveloped compared to their use in the analysis of other languages, especially English.