Book contents
- Frontmatter
- Contents
- List of Figures
- List of Tables
- Preface
- 1 Introduction
- 2 The Perceptron
- 3 Logistic Regression
- 4 Implementing Text Classification Using Perceptron and Logistic Regression
- 5 Feed-Forward Neural Networks
- 6 Best Practices in Deep Learning
- 7 Implementing Text Classification with Feed-Forward Networks
- 8 Distributional Hypothesis and Representation Learning
- 9 Implementing Text Classification Using Word Embeddings
- 10 Recurrent Neural Networks
- 11 Implementing Part-of-Speech Tagging Using Recurrent Neural Networks
- 12 Contextualized Embeddings and Transformer Networks
- 13 Using Transformers with the Hugging Face Library
- 14 Encoder-Decoder Methods
- 15 Implementing Encoder-Decoder Methods
- 16 Neural Architectures for Natural Language Processing Applications
- Appendix A Overview of the Python Language and Key Libraries
- Appendix B Character Encodings: ASCII and Unicode
- References
- Index
12 - Contextualized Embeddings and Transformer Networks
Published online by Cambridge University Press: 01 February 2024
Summary
As mentioned in Chapter 8, the distributional similarity algorithms discussed there conflate all senses of a word into a single numerical representation (or embedding). For example, the word bank receives a single representation regardless of whether it is used in its financial sense (e.g., the bank gives out loans) or its geological sense (e.g., bank of the river). This chapter introduces a solution to this limitation: a new neural architecture called the transformer network, which learns contextualized embeddings of words. As the name indicates, these embeddings change depending on the context in which a word appears. That is, the word bank receives a different numerical representation in each of the two texts above because the contexts in which it occurs are different. We also discuss several architectural choices that enabled the tremendous success of transformer networks (self-attention, multiple heads, stacking of multiple layers, and subword tokenization), as well as how transformers can be pretrained on large amounts of data through masked language modeling and next-sentence prediction.
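The following is a minimal sketch, in plain Python, of the core idea: single-head scaled dot-product self-attention turns one static embedding of bank into two different contextualized vectors, one per sentence. Everything here is an illustrative assumption, not the chapter's implementation. The embeddings and the projection matrices `w_q`, `w_k`, `w_v` are random (seeded) rather than learned, and the helper names (`self_attention`, `contextualize`, `random_matrix`) are hypothetical. The sketch also omits positional information, multiple heads, and stacked layers.

```python
import math
import random


def matmul_vec(matrix, vec):
    """Multiply a (d_out x d_in) matrix, stored as a list of rows, by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    Each output vector is a weighted average of the value vectors of *all*
    tokens in the sequence, so a token's output depends on its context.
    """
    queries = [matmul_vec(w_q, x) for x in embeddings]
    keys = [matmul_vec(w_k, x) for x in embeddings]
    values = [matmul_vec(w_v, x) for x in embeddings]
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this token's query with every token's key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)  # attention weights, summing to 1
        dim = len(values[0])
        outputs.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)])
    return outputs


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for a learned weight matrix."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


DIM = 4
rng = random.Random(0)
vocab = ["the", "bank", "gives", "out", "loans", "of", "river"]
# Assumed static embeddings: one fixed vector per word type, as with the
# distributional methods of Chapter 8 (random here, for illustration only).
static = {w: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for w in vocab}
w_q, w_k, w_v = (random_matrix(DIM, DIM, rng) for _ in range(3))


def contextualize(tokens):
    """Map a token sequence to one contextualized vector per token."""
    return self_attention([static[t] for t in tokens], w_q, w_k, w_v)


financial = ["the", "bank", "gives", "out", "loans"]
geological = ["bank", "of", "the", "river"]

bank_fin = contextualize(financial)[financial.index("bank")]
bank_geo = contextualize(geological)[geological.index("bank")]

print("static bank:      ", [round(x, 3) for x in static["bank"]])
print("bank (financial): ", [round(x, 3) for x in bank_fin])
print("bank (geological):", [round(x, 3) for x in bank_geo])
```

The input vector for bank is identical in both sentences, but its output differs because the attention weights are computed over different neighbors (gives, out, loans versus of, river). In a trained transformer, the weight matrices are learned so that these differences reflect meaningful distinctions such as word sense.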
- Type: Chapter
- Information: Deep Learning for Natural Language Processing: A Gentle Introduction, pp. 178-193. Publisher: Cambridge University Press. Print publication year: 2024.