Book contents
- Frontmatter
- Dedication
- Contents
- Figures
- Tables
- Examples
- Preface
- 1 Introduction to GPU Kernels and Hardware
- 2 Thinking and Coding in Parallel
- 3 Warps and Cooperative Groups
- 4 Parallel Stencils
- 5 Textures
- 6 Monte Carlo Applications
- 7 Concurrency Using CUDA Streams and Events
- 8 Application to PET Scanners
- 9 Scaling Up
- 10 Tools for Profiling and Debugging
- 11 Tensor Cores
- Appendix A A Brief History of CUDA
- Appendix B Atomic Operations
- Appendix C The NVCC Compiler
- Appendix D AVX and the Intel Compiler
- Appendix E Number Formats
- Appendix F CUDA Documentation and Libraries
- Appendix G The CX Header Files
- Appendix H AI and Python
- Appendix I Topics in C++
- Index
11 - Tensor Cores
Published online by Cambridge University Press: 04 May 2022
Summary
This chapter discusses the tensor core hardware available on newer GPUs. This hardware is designed to perform fast mixed-precision matrix multiplications and is intended for applications in AI. However, CUDA also exposes tensor cores to general programmers through the warp matrix functions (WMMA) library. These functions support tiled matrix multiplication using 16 × 16 tiles. We provide examples of their use to improve on the early matrix multiplication example in Chapter 2. We also show how reduction operations can be performed using tensor cores as a potential non-AI application.
- Type: Chapter
- Information: Programming in Parallel with CUDA: A Practical Guide, pp. 358-372
- Publisher: Cambridge University Press
- Print publication year: 2022