Book contents
- Frontmatter
- Dedication
- Contents
- Figures
- Tables
- Examples
- Preface
- 1 Introduction to GPU Kernels and Hardware
- 2 Thinking and Coding in Parallel
- 3 Warps and Cooperative Groups
- 4 Parallel Stencils
- 5 Textures
- 6 Monte Carlo Applications
- 7 Concurrency Using CUDA Streams and Events
- 8 Application to PET Scanners
- 9 Scaling Up
- 10 Tools for Profiling and Debugging
- 11 Tensor Cores
- Appendix A A Brief History of CUDA
- Appendix B Atomic Operations
- Appendix C The NVCC Compiler
- Appendix D AVX and the Intel Compiler
- Appendix E Number Formats
- Appendix F CUDA Documentation and Libraries
- Appendix G The CX Header Files
- Appendix H AI and Python
- Appendix I Topics in C++
- Index
4 - Parallel Stencils
Published online by Cambridge University Press: 04 May 2022
Summary
The solution of partial differential equations in two and three dimensions using stencil iteration (Jacobi's method) is discussed and illustrated for Laplace's equation. A very simple kernel gives a speed-up of about a factor of 100 compared to the host CPU. The very slow convergence of the Jacobi method can be addressed by using solutions on lower-resolution grids to initialise higher-resolution grids. A convergence check based on the maximum change per iteration is also illustrated. Digital image processing is another application of stencils, and a number of digital image filters are shown, including the Sobel filter for edge finding and the median filter for noise reduction. The fast GPU-based median filter uses one thread per image pixel and is implemented using an optimal Batcher sorting network.
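As a rough illustration of the stencil iteration and convergence check described above, here is a minimal host-side C++ sketch of Jacobi iteration for Laplace's equation on a 2D grid. It is not the book's code. Each interior point is replaced by the average of its four nearest neighbours, and the largest change in each sweep serves as the convergence measure. In a CUDA version, the body of the inner loop would typically be run by one thread per grid point. The function names `jacobi_step` and `solve_laplace`, the row-major layout, and the tolerance and iteration-cap parameters are all illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not from the book): one Jacobi sweep for Laplace's
// equation on an nx-by-ny grid stored row-major. Boundary points are never
// written, so the fixed boundary values must already be present in both
// buffers. Each interior point of `b` becomes the mean of its four nearest
// neighbours in `a`. Returns the largest absolute change at any interior point.
float jacobi_step(int nx, int ny, const std::vector<float>& a, std::vector<float>& b)
{
    assert(nx >= 3 && ny >= 3);
    assert(a.size() == static_cast<std::size_t>(nx) * ny && b.size() == a.size());
    float max_change = 0.0f;
    for (int y = 1; y < ny - 1; y++) {
        for (int x = 1; x < nx - 1; x++) {
            // A CUDA kernel would typically give each (x, y) its own thread.
            int i = y * nx + x;
            b[i] = 0.25f * (a[i - 1] + a[i + 1] + a[i - nx] + a[i + nx]);
            max_change = std::max(max_change, std::fabs(b[i] - a[i]));
        }
    }
    return max_change;
}

// Hypothetical driver: repeat Jacobi sweeps until the maximum change per
// sweep is no larger than `tol` or `max_iter` sweeps have been done.
// On return `grid` holds the latest iterate; the sweep count is returned.
int solve_laplace(int nx, int ny, std::vector<float>& grid, float tol, int max_iter)
{
    std::vector<float> next = grid;   // second buffer with identical boundary values
    float change = tol + 1.0f;        // force at least one sweep
    int iter = 0;
    while (change > tol && iter < max_iter) {
        change = jacobi_step(nx, ny, grid, next);
        std::swap(grid, next);        // ping-pong: newest values are now in `grid`
        iter++;
    }
    return iter;
}
```

As an example, you could set the top row of a 16 × 16 grid to 1.0f and every other point to 0.0f, then call `solve_laplace(16, 16, grid, 1.0e-5f, 100000)`. The result is a smooth solution that decreases away from the top edge. The sketch uses two buffers and swaps them after every sweep, because each update must read only values from the previous sweep. A GPU kernel needs the same separation, since threads cannot rely on the order in which their neighbours are updated. The sketch starts from zeros. Initialising `grid` from an upsampled coarse-grid solution instead is the kind of approach the summary describes for reducing the number of sweeps.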
- Type: Chapter
- Information: Programming in Parallel with CUDA: A Practical Guide, pp. 106-141. Publisher: Cambridge University Press. Print publication year: 2022