Thinking and Coding in Parallel

Richard Ansorge

doi:10.1017/9781108855273.003


Published online by Cambridge University Press: 04 May 2022


Affiliation: University of Cambridge


Summary

Chapter 2 gives a more formal account of the ideas introduced in Chapter 1. We discuss the requirements for writing CUDA kernel code and explain the syntax in detail. We encourage the reader to start thinking in parallel by introducing some key coding ideas, including methods for summing a large number of values in parallel in so-called reduction operations. This chapter also introduces GPU shared memory, illustrated with a tiled matrix multiplication example. We demonstrate how the __restrict__ keyword applied to kernel pointer arguments can speed up your code. In some sense this is our most conventional chapter for a book on CUDA, and the reduction operation is revisited in a number of later chapters to help introduce new CUDA features. However, many of our other examples go well beyond what you can find elsewhere.
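The summary mentions summing many values in parallel as a reduction. As a rough illustration only (this is not code from the book, and it is plain C++ rather than CUDA), the sketch below sequentially emulates the shared-memory tree reduction commonly used in CUDA kernels: each thread block loads its values into a shared array, then repeatedly adds the upper half of the array onto the lower half until a single partial sum remains. The function names `block_tree_reduce` and `tree_reduce` are hypothetical, and the block size is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sequentially emulates one CUDA thread block doing a
// shared-memory tree reduction. `s` plays the role of the block's shared
// array; its size is assumed to be a power of two (as blockDim.x would be).
float block_tree_reduce(std::vector<float> s) {
    const std::size_t n = s.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // power-of-two block size assumed
    for (std::size_t stride = n / 2; stride > 0; stride /= 2) {
        // On a GPU, every thread t < stride would perform this add at the
        // same time, with __syncthreads() before the next halving step.
        for (std::size_t t = 0; t < stride; ++t) {
            s[t] += s[t + stride];
        }
    }
    return s[0];  // "thread 0" now holds the block's partial sum
}

// Hypothetical driver: split `data` into blocks of `block_size` elements,
// zero-padding the last block, reduce each block, then add the partial sums
// (on a GPU this final step might be a second kernel pass or done on the host).
float tree_reduce(const std::vector<float>& data, std::size_t block_size) {
    assert(block_size > 0);
    float total = 0.0f;
    for (std::size_t base = 0; base < data.size(); base += block_size) {
        std::vector<float> shared(block_size, 0.0f);  // emulated shared memory
        for (std::size_t t = 0; t < block_size && base + t < data.size(); ++t) {
            shared[t] = data[base + t];
        }
        total += block_tree_reduce(shared);
    }
    return total;
}
```

For example, summing the integers 1 to 1000 with a block size of 256 gives 500500. Running the inner loop sequentially gives the same result as the concurrent version because, within one step, each "thread" t writes only `s[t]` and reads only `s[t]` and `s[t + stride]`, which no other thread writes in that step.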

Keywords

SIMD, SIMT, CUDA kernels, thread grids, reduction, matrix multiplication

Type: Chapter
Information: Programming in Parallel with CUDA: A Practical Guide, pp. 22-71

DOI: https://doi.org/10.1017/9781108855273.003

Publisher: Cambridge University Press

Print publication year: 2022
