Mixed precision algorithms in numerical linear algebra

Published online by Cambridge University Press:  09 June 2022

Nicholas J. Higham
Affiliation:
Department of Mathematics, University of Manchester, Manchester, M13 9PL, UK
Theo Mary
Affiliation:
Sorbonne Université, CNRS, LIP6, Paris, F-75005, France

Abstract

Today’s floating-point arithmetic landscape is broader than ever. While scientific computing has traditionally used single precision and double precision floating-point arithmetics, half precision is increasingly available in hardware and quadruple precision is supported in software. Lower precision arithmetic brings increased speed and reduced communication and energy costs, but it produces results of correspondingly low accuracy. Higher precisions are more expensive but can potentially provide great benefits, even if used sparingly. A variety of mixed precision algorithms have been developed that combine the superior performance of lower precisions with the better accuracy of higher precisions. Some of these algorithms aim to provide results of the same quality as algorithms running in a fixed precision but at a much lower cost; others use a little higher precision to improve the accuracy of an algorithm. This survey treats a broad range of mixed precision algorithms in numerical linear algebra, both direct and iterative, for problems including matrix multiplication, matrix factorization, linear systems, least squares, eigenvalue decomposition and singular value decomposition. We identify key algorithmic ideas, such as iterative refinement, adapting the precision to the data, and exploiting mixed precision block fused multiply–add operations. We also describe the possible performance benefits and explain what is known about the numerical stability of the algorithms. This survey should be useful to a wide community of researchers and practitioners who wish to develop or benefit from mixed precision numerical linear algebra algorithms.
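As an illustration of the iterative refinement idea named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a small dense system, emulates single precision by rounding each operation through `struct`, and uses hypothetical helper names (`fl32`, `lu_factor_low`, `lu_solve_low`, `refine`). The factorization and triangular solves run in the emulated low precision, while the residual and the solution update are computed in double precision.

```python
import struct


def fl32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_low(A):
    """LU factorization with partial pivoting; every operation is rounded to binary32."""
    n = len(A)
    LU = [[fl32(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = fl32(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = fl32(LU[i][j] - fl32(LU[i][k] * LU[k][j]))
    return LU, perm


def lu_solve_low(LU, perm, b):
    """Solve with the low precision LU factors, again rounding to binary32."""
    n = len(LU)
    y = [fl32(b[p]) for p in perm]
    for i in range(n):  # forward substitution with unit lower triangular L
        for j in range(i):
            y[i] = fl32(y[i] - fl32(LU[i][j] * y[j]))
    for i in reversed(range(n)):  # back substitution with U
        for j in range(i + 1, n):
            y[i] = fl32(y[i] - fl32(LU[i][j] * y[j]))
        y[i] = fl32(y[i] / LU[i][i])
    return y


def refine(A, b, steps=5):
    """Return (initial low precision solution, refined solution)."""
    LU, perm = lu_factor_low(A)
    x = lu_solve_low(LU, perm, b)
    x0 = list(x)
    for _ in range(steps):
        # Residual and update are computed in double precision.
        r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]
        d = lu_solve_low(LU, perm, r)  # correction reuses the low precision factors
        x = [xi + di for xi, di in zip(x, d)]
    return x0, x


if __name__ == "__main__":
    # 4 x 4 Hilbert matrix with exact solution close to the vector of ones.
    A = [[1.0 / (i + j + 1) for j in range(4)] for i in range(4)]
    b = [sum(row) for row in A]
    x0, x = refine(A, b)
    print("initial error:", max(abs(v - 1.0) for v in x0))
    print("refined error:", max(abs(v - 1.0) for v in x))
```

The expensive factorization is performed once in the low precision and reused for every correction solve, which is where the potential cost saving comes from. The refinement is only expected to converge when the matrix is not too ill conditioned relative to the low precision.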

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press

References

Abdelfattah, A., Anzt, H., Boman, E. G., Carson, E., Cojean, T., Dongarra, J., Fox, A., Gates, M., Higham, N. J., Li, X. S., Loe, J., Luszczek, P., Pranesh, S., Rajamanickam, S., Ribizel, T., Smith, B. F., Swirydowicz, K., Thomas, S., Tomov, S., Tsai, Y. M. and Yang, U. M. (2021a), A survey of numerical linear algebra methods utilizing mixed-precision arithmetic, Int. J. High Perform. Comput. Appl. 35, 344369.CrossRefGoogle Scholar
Abdelfattah, A., Costa, T., Dongarra, J., Gates, M., Haidar, A., Hammarling, S., Higham, N. J., Kurzak, J., Luszczek, P., Tomov, S. and Zounon, M. (2021b), A set of Batched Basic Linear Algebra Subprograms and LAPACK routines, ACM Trans. Math. Software 47, 21.CrossRefGoogle Scholar
Abdelfattah, A., Tomov, S. and Dongarra, J. (2019a), Fast batched matrix multiplication for small sizes using half-precision arithmetic on GPUs, in 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE, pp. 111122.CrossRefGoogle Scholar
Abdelfattah, A., Tomov, S. and Dongarra, J. (2019b), Towards half-precision computation for complex matrices: A case study for mixed-precision solvers on GPUs, in 2019 IEEE/ACM 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), IEEE, pp. 1724.CrossRefGoogle Scholar
Abdelfattah, A., Tomov, S. and Dongarra, J. (2020), Investigating the benefit of FP16-enabled mixed-precision solvers for symmetric positive definite matrices using GPUs, in Computational Science – ICCS 2020 (Krzhizhanovskaya, V. V. et al., eds), Vol. 12138 of Lecture Notes in Computer Science, Springer, pp. 237250.CrossRefGoogle Scholar
Abdulah, S., Cao, Q., Pei, Y., Bosilca, G., Dongarra, J., Genton, M. G., Keyes, D. E., Ltaief, H. and Sun, Y. (2022), Accelerating geostatistical modeling and prediction with mixed-precision computations: A high-productivity approach with PaRSEC, IEEE Trans. Parallel Distrib. Syst. 33, 964976.CrossRefGoogle Scholar
Abdulah, S., Ltaief, H., Sun, Y., Genton, M. G. and Keyes, D. E. (2019), Geostatistical modeling and prediction using mixed precision tile Cholesky factorization, in 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC), IEEE, pp. 152162.CrossRefGoogle Scholar
Agullo, E., Cappello, F., Di, S., Giraud, L., Liang, X. and Schenkels, N. (2020), Exploring variable accuracy storage through lossy compression techniques in numerical linear algebra: A first application to flexible GMRES. Research report RR-9342, Inria Bordeaux Sud-Ouest. Available at hal-02572910v2.Google Scholar
Ahmad, K., Sundar, H. and Hall, M. (2019), Data-driven mixed precision sparse matrix vector multiplication for GPUs, ACM Trans. Archit. Code Optim. 16, 51.Google Scholar
Al-Mohy, A. H., Higham, N. J. and Liu, X. (2022), Arbitrary precision algorithms for computing the matrix cosine and its Fréchet derivative, SIAM J. Matrix Anal. Appl. 43, 233256.CrossRefGoogle Scholar
Aliaga, J. I., Anzt, H., Grützmacher, T., Quintana-Ortí, E. S. and Tomás, A. E. (2020), Compressed basis GMRES on high performance GPUs. Available at arXiv:2009.12101.Google Scholar
Alvermann, A., Basermann, A., Bungartz, H.-J., Carbogno, C., Ernst, D., Fehske, H., Futamura, Y., Galgon, M., Hager, G., Huber, S., Huckle, T., Ida, A., Imakura, A., Kawai, M., Köcher, S., Kreutzer, M., Kus, P., Lang, B., Lederer, H., Manin, V., Marek, A., Nakajima, K., Nemec, L., Reuter, K., Rippl, M., Röhrig-Zöllner, M., Sakurai, T., Scheffler, M., Scheurer, C., Shahzad, F., Brambila, D. Simoes, Thies, J. and Wellein, G. (2019), Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects, Japan J. Indust. Appl. Math. 36, 699717.CrossRefGoogle Scholar
Amestoy, P., Boiteau, O., Buttari, A., Gerest, M., Jézéquel, F., L’Excellent, J.Y. and Mary, T. (2021a), Mixed precision low rank approximations and their application to block low rank LU factorization. Available at hal-03251738.Google Scholar
Amestoy, P., Buttari, A., Higham, N. J., L’Excellent, J.-Y., Mary, T. and Vieublé, B. (2021b), Five-precision GMRES-based iterative refinement. MIMS EPrint 2021.5, Manchester Institute for Mathematical Sciences, The University of Manchester, UK.Google Scholar
Amestoy, P., Buttari, A., Higham, N. J., L’Excellent, J.-Y., Mary, T. and Vieublé, B. (2022), Combining sparse approximate factorizations with mixed precision iterative refinement. MIMS EPrint 2022.2, Manchester Institute for Mathematical Sciences, The University of Manchester, UK.Google Scholar
Amestoy, P., Buttari, A., L’Excellent, J.-Y. and Mary, T. (2019), Performance and scalability of the block low-rank multifrontal factorization on multicore architectures, ACM Trans. Math. Software 45, 2.CrossRefGoogle Scholar
Amestoy, P., Duff, I. S., L’Excellent, J.-Y. and Koster, J. (2001), A fully asynchronous multifrontal solver using distributed dynamic scheduling, SIAM J. Matrix Anal. Appl. 23, 1541.CrossRefGoogle Scholar
Anderson, E. (1991), Robust triangular solves for use in condition estimation. Technical report CS-91-142, Department of Computer Science, The University of Tennessee, Knoxville, TN, USA. LAPACK Working Note 36.Google Scholar
ANSI (1966), American National Standard FORTRAN, American National Standards Institute, New York.Google Scholar
Anzt, H., Dongarra, J. and Quintana-Ortí, E. S. (2015), Adaptive precision solvers for sparse linear systems, in Proceedings of the 3rd International Workshop on Energy Efficient Supercomputing (E2SC ’15), ACM Press, article 2.Google Scholar
Anzt, H., Dongarra, J., Flegar, G., Higham, N. J. and Quintana-Ortí, E. S. (2019a), Adaptive precision in block-Jacobi preconditioning for iterative sparse linear system solvers, Concurrency Comput. Pract. Exper. 31, e4460.CrossRefGoogle Scholar
Anzt, H., Flegar, G., Grützmacher, T. and Quintana-Ortí, E. S. (2019b), Toward a modular precision ecosystem for high-performance computing, Int. J. High Perform. Comput. Appl. 33, 10691078.CrossRefGoogle Scholar
Appleyard, J. and Yokim, S. (2017), Programming tensor cores in CUDA 9. Available at https://devblogs.nvidia.com/programming-tensor-cores-cuda-9/.Google Scholar
Arioli, M. and Duff, I. S. (2009), Using FGMRES to obtain backward stability in mixed precision, Electron. Trans. Numer. Anal. 33, 3144.Google Scholar
Arioli, M., Duff, I. S., Gratton, S. and Pralet, S. (2007), A note on GMRES preconditioned by a perturbed $\mathrm{LD}{L}^T$ decomposition with static pivoting, SIAM J. Sci. Comput. 29, 20242044.CrossRefGoogle Scholar
ARM (2018), ARM Architecture Reference Manual. ARMv8, for ARMv8-A Architecture Profile, ARM Limited, Cambridge, UK. Version dated 31 October 2018. Original release dated 30 April 2013.Google Scholar
ARM (2019), Arm A64 Instruction Set Architecture Armv8, for Armv8-A Architecture Profile, ARM Limited, Cambridge, UK.Google Scholar
ARM (2020), Arm Architecture Reference Manual. Armv8, for Armv8-A Architecture Profile, ARM Limited, Cambridge, UK. ARM DDI 0487F.b (ID040120).Google Scholar
Baboulin, M., Buttari, A., Dongarra, J., Kurzak, J., Langou, J., Langou, J., Luszczek, P. and Tomov, S. (2009), Accelerating scientific computations with mixed precision algorithms, Comput. Phys. Comm. 180, 25262533.CrossRefGoogle Scholar
Bailey, D. H. (2021), MPFUN2020: A new thread-safe arbitrary precision package (full documentation). Available at https://www.davidhbailey.com/dhbpapers/mpfun2020.pdf.Google Scholar
Bailey, D. H., Hida, Y., Li, X. S. and Thompson, B. (2002), ARPREC: An arbitrary precision computation package. Technical report LBNL-53651, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.Google Scholar
Bauer, P., Dueben, P. D., Hoefler, T., Quintino, T., Schulthess, T. C. and Wedi, N. P. (2021), The digital revolution of earth-system science, Nature Comput. Sci. 1, 104113.CrossRefGoogle Scholar
Bezanson, J., Edelman, A., Karpinski, S. and Shah, V. B. (2017), Julia: A fresh approach to numerical computing, SIAM Rev. 59, 6598.CrossRefGoogle Scholar
Björck, Å. (1967), Iterative refinement of linear least squares solutions I, BIT 7, 257278.CrossRefGoogle Scholar
Björck, Å. (1996), Numerical Methods for Least Squares Problems, SIAM.CrossRefGoogle Scholar
Blanchard, P., Higham, N. J. and Mary, T. (2020a), A class of fast and accurate summation algorithms,, SIAM J. Sci. Comput. 42, A1541A1557.CrossRefGoogle Scholar
Blanchard, P., Higham, N. J., Lopez, F., Mary, T. and Pranesh, S. (2020b), Mixed precision block fused multiply-add: Error analysis and application to GPU tensor cores, SIAM J. Sci. Comput. 42, C124C141.CrossRefGoogle Scholar
Bouras, A. and Frayssé, V. (2005), Inexact matrix-vector products in Krylov methods for solving linear systems: A relaxation strategy, SIAM J. Matrix Anal. Appl. 26, 660678.CrossRefGoogle Scholar
Bouras, A., Frayssé, V. and Giraud, L. (2000), A relaxation strategy for inner–outer linear solvers in domain decomposition methods. Technical report TR/PA/00/17, CERFACS, Toulouse, France.Google Scholar
Brun, E., Defour, D., De Oliveira Castro, P., Iştoan, M., Mancusi, D., Petit, E. and Vaquet, A. (2021), A study of the effects and benefits of custom-precision mathematical libraries for HPC codes, IEEE Trans. Emerg. Topics Comput. 9, 14671478.CrossRefGoogle Scholar
Buttari, A., Dongarra, J., Kurzak, J., Luszczek, P. and Tomov, S. (2008), Using mixed precision for sparse matrix computations to enhance the performance while achieving 64-bit accuracy, ACM Trans. Math. Software 34, 17.CrossRefGoogle Scholar
Buttari, A., Dongarra, J., Langou, J., Langou, J., Luszczek, P. and Kurzak, J. (2007), Mixed precision iterative refinement techniques for the solution of dense linear systems, Int. J. High Perform. Comput. Appl. 21, 457466.CrossRefGoogle Scholar
Carson, E. and Higham, N. J. (2017), A new analysis of iterative refinement and its application to accurate solution of ill-conditioned sparse linear systems, SIAM J. Sci. Comput. 39, A2834A2856.CrossRefGoogle Scholar
Carson, E. and Higham, N. J. (2018), Accelerating the solution of linear systems by iterative refinement in three precisions, SIAM J. Sci. Comput. 40, A817A847.CrossRefGoogle Scholar
Carson, E., Gergelits, T. and Yamazaki, I. (2022a), Mixed precision $s$ -step Lanczos and conjugate gradient algorithms, Numer. Linear Algebra Appl. 29, e2425.CrossRefGoogle Scholar
Carson, E., Higham, N. J. and Pranesh, S. (2020), Three-precision GMRES-based iterative refinement for least squares problems, SIAM J. Sci. Comput. 42, A4063A4083.CrossRefGoogle Scholar
Carson, E., Lund, K., Rozložník, M. and Thomas, S. (2022b), Block Gram–Schmidt algorithms and their stability properties, Linear Algebra Appl. 638, 150195.CrossRefGoogle Scholar
Charara, A., Gates, M., Kurzak, J., YarKhan, A. and Dongarra, J. (2020), SLATE developers’ guide. SLATE Working Note 11, Innovative Computing Laboratory, The University of Tennessee, Knoxville, TN, US.Google Scholar
Choquette, J., Gandhi, W., Giroux, O., Stam, N. and Krashinsky, R. (2021), NVIDIA A100 tensor core GPU: Performance and innovation, IEEE Micro 41, 2935.CrossRefGoogle Scholar
Clark, M. A., Babich, R., Barros, K., Brower, R. C. and Rebbi, C. (2010), Solving lattice QCD systems of equations using mixed precision solvers on GPUs, Comput. Phys. Comm. 181, 15171528.CrossRefGoogle Scholar
Connolly, M. P. and Higham, N. J. (2022), Probabilistic rounding error analysis of Householder QR factorization. MIMS EPrint 2022.5, Manchester Institute for Mathematical Sciences, The University of Manchester, UK.Google Scholar
Connolly, M. P., Higham, N. J. and Mary, T. (2021), Stochastic rounding and its probabilistic backward error analysis, SIAM J. Sci. Comput. 43, A566A585.CrossRefGoogle Scholar
Courbariaux, M., Bengio, Y. and David, J.-P. (2015), Training deep neural networks with low precision multiplications. Available at arXiv:1412.7024v5.Google Scholar
Croarken, M. G. (1985), The centralization of scientific computation in Britain 1925–1955. PhD thesis, University of Warwick, Coventry, UK.Google Scholar
Croci, M., Fasi, M., Higham, N. J., Mary, T. and Mikaitis, M. (2022), Stochastic rounding: Implementation, error analysis, and applications, Roy. Soc. Open Sci. 9, 125.Google Scholar
Davies, P. I., Higham, N. J. and Tisseur, F. (2001), Analysis of the Cholesky method with iterative refinement for solving the symmetric definite generalized eigenproblem, SIAM J. Matrix Anal. Appl. 23, 472493.CrossRefGoogle Scholar
Davis, T. A. and Hu, Y. (2011), The University of Florida Sparse Matrix Collection, ACM Trans. Math. Software 38, 1.Google Scholar
Dawson, A. and Düben, P. D. (2017), rpe v5: An emulator for reduced floating-point precision in large numerical simulations, Geosci. Model Dev. 10, 22212230.CrossRefGoogle Scholar
Dawson, A., Düben, P. D., MacLeod, D. A. and Palmer, T. N. (2018), Reliable low precision simulations in land surface models, Climate Dynam. 51, 26572666.CrossRefGoogle Scholar
Dean, J. (2020), The deep learning revolution and its implications for computer architecture and chip design, in 2020 IEEE International Solid-State Circuits Conference (ISSCC), IEEE, pp. 814.CrossRefGoogle Scholar
Demmel, J. and Hida, Y. (2004), Accurate and efficient floating point summation, SIAM J. Sci. Comput. 25, 12141248.CrossRefGoogle Scholar
Demmel, J. and Li, X. (1994), Faster numerical algorithms via exception handling, IEEE Trans. Comput. 43, 983992.CrossRefGoogle Scholar
Demmel, J., Hida, Y., Riedy, E. J. and Li, X. S. (2009), Extra-precise iterative refinement for overdetermined least squares problems, ACM Trans. Math. Software 35, 28.CrossRefGoogle Scholar
Dennis, J. E. Jr and Schnabel, R. B. (1983), Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice Hall. Reprinted by SIAM, 1996.Google Scholar
Di, S. and Cappello, F. (2016), Fast error-bounded lossy HPC data compression with SZ, in 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE, pp. 730739.CrossRefGoogle Scholar
Diffenderfer, J., Osei-Kuffuor, D. and Menon, H. (2021), QDOT: Quantized dot product kernel for approximate high-performance computing. Available at arXiv:2105.00115.Google Scholar
Dongarra, J. J. (1980), Improving the accuracy of computed matrix eigenvalues. Preprint ANL-80-84, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA.Google Scholar
Dongarra, J. J. (1982), Algorithm 589 SICEDR: A FORTRAN subroutine for improving the accuracy of computed matrix eigenvalues, ACM Trans. Math. Software 8, 371375.CrossRefGoogle Scholar
Dongarra, J. J. (1983), Improving the accuracy of computed singular values, SIAM J. Sci. Statist. Comput. 4, 712719.CrossRefGoogle Scholar
Dongarra, J. J. (2020), Report on the Fujitsu Fugaku system. Technical report ICL-UT-20-06, Innovative Computing Laboratory, The University of Tennessee, Knoxville, TN, USA.Google Scholar
Dongarra, J. J., Bunch, J. R., Moler, C. B. and Stewart, G. W. (1979), LINPACK Users’ Guide, SIAM.CrossRefGoogle Scholar
Dongarra, J. J., Moler, C. B. and Wilkinson, J. H. (1983), Improving the accuracy of computed eigenvalues and eigenvectors, SIAM J. Numer. Anal. 20, 2345.CrossRefGoogle Scholar
Doucet, N., Ltaief, H., Gratadour, D. and Keyes, D. (2019), Mixed-precision tomographic reconstructor computations on hardware accelerators, in 2019 IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms (IA3), IEEE, pp. 3138.Google Scholar
Düben, P. D., Subramanian, A., Dawson, A. and Palmer, T. N. (2017), A study of reduced numerical precision to make superparameterization more competitive using a hardware emulator in the OpenIFS model, J. Adv. Model. Earth Syst. 9, 566584.CrossRefGoogle Scholar
Duff, I. S. and Pralet, S. (2007), Towards stable mixed pivoting strategies for the sequential and parallel solution of sparse symmetric indefinite systems, SIAM J. Matrix Anal. Appl. 29, 10071024.CrossRefGoogle Scholar
Duff, I. S., Erisman, A. M. and Reid, J. K. (2017), Direct Methods for Sparse Matrices, second edition, Oxford University Press.CrossRefGoogle Scholar
Emans, M. and van der Meer, A. (2012), Mixed-precision AMG as linear equation solver for definite systems, Procedia Comput. Sci. 1, 175183.CrossRefGoogle Scholar
Fasi, M. and Higham, N. J. (2018), Multiprecision algorithms for computing the matrix logarithm, SIAM J. Matrix Anal. Appl. 39, 472491.Google Scholar
Fasi, M. and Higham, N. J. (2019), An arbitrary precision scaling and squaring algorithm for the matrix exponential, SIAM J. Matrix Anal. Appl. 40, 12331256.CrossRefGoogle Scholar
Fasi, M. and Higham, N. J. (2021), Matrices with tunable infinity-norm condition number and no need for pivoting in LU factorization, SIAM J. Matrix Anal. Appl. 42, 417435.CrossRefGoogle Scholar
Fasi, M. and Mikaitis, M. (2020), CPFloat: A C library for emulating low-precision arithmetic. MIMS EPrint 2020.22, Manchester Institute for Mathematical Sciences, The University of Manchester, UK.Google Scholar
Fasi, M., Higham, N. J., Lopez, F., Mary, T. and Mikaitis, M. (2022), Matrix multiplication in multiword arithmetic: Error analysis and application to GPU tensor cores. MIMS EPrint 2022.3, Manchester Institute for Mathematical Sciences, The University of Manchester, UK.Google Scholar
Fasi, M., Higham, N. J., Mikaitis, M. and Pranesh, S. (2021), Numerical behavior of NVIDIA tensor cores, PeerJ Comput. Sci. 7, e330.CrossRefGoogle ScholarPubMed
Flegar, G., Anzt, H., Cojean, T. and Quintana-Ortí, E. S. (2021), Adaptive precision block-Jacobi for high performance preconditioning in the Ginkgo linear algebra software, ACM Trans. Math. Software 47, 128.CrossRefGoogle Scholar
Fousse, L., Hanrot, G., Lefèvre, V., Pélissier, P. and Zimmermann, P. (2007), MPFR: A multiple-precision binary floating-point library with correct rounding, ACM Trans. Math. Software 33, 13.CrossRefGoogle Scholar
Fox, L., Huskey, H. D. and Wilkinson, J. H. (1948), The solution of algebraic linear simultaneous equations by punched card methods. Report, Mathematics Division, Department of Scientific and Industrial Research, National Physical Laboratory, Teddington, UK.CrossRefGoogle Scholar
Fukaya, T., Kannan, R., Nakatsukasa, Y., Yamamoto, Y. and Yanagisawa, Y. (2020), Shifted Cholesky QR for computing the QR factorization of ill-conditioned matrices, SIAM J. Sci. Comput. 42, A477A503.CrossRefGoogle Scholar
Gao, J., Zheng, F., Qi, F., Ding, Y., Li, H., Lu, H., He, W., Wei, H., Jin, L., Liu, X., Gong, D., Wang, F., Zheng, Y., Sun, H., Zhou, Z., Liu, Y. and You, H. (2021), Sunway supercomputer architecture towards exascale computing: Analysis and practice, Sci. China Inform. Sci. 64, 141101.CrossRefGoogle Scholar
Gill, P. E., Saunders, M. A. and Shinnerl, J. R. (1996), On the stability of Cholesky factorization for symmetric quasidefinite systems, SIAM J. Matrix Anal. Appl. 17, 3546.CrossRefGoogle Scholar
Giraud, L., Gratton, S. and Langou, J. (2007), Convergence in backward error of relaxed GMRES, SIAM J. Sci. Comput. 29, 710728.CrossRefGoogle Scholar
Giraud, L., Haidar, A. and Watson, L. T. (2008), Mixed-precision preconditioners in parallel domain decomposition solvers, in Domain Decomposition Methods in Science and Engineering XVII (Langer, U. et al., eds), Vol. 60 of Lecture Notes in Computational Science and Engineering, Springer, pp. 357364.CrossRefGoogle Scholar
Giraud, L., Langou, J., Rozložník, M. and van den Eshof, J. (2005), Rounding error analysis of the classical Gram–Schmidt orthogonalization process, Numer. Math. 101, 87100.CrossRefGoogle Scholar
Göbel, F., Grützmacher, T., Ribizel, T. and Anzt, H. (2021), Mixed precision incomplete and factorized sparse approximate inverse preconditioning on GPUs, in Euro-Par 2021: Parallel Processing, Vol. 12820 of Lecture Notes in Computer Science, Springer, pp. 550564.CrossRefGoogle Scholar
Goddeke, D. and Strzodka, R. (2011), Cyclic reduction tridiagonal solvers on GPUs applied to mixed-precision multigrid, IEEE Trans. Parallel Distrib. Syst. 22, 2232.CrossRefGoogle Scholar
Göddeke, D., Strzodka, R. and Turek, S. (2007), Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations, Int. J. Parallel Emergent Distrib. Syst. 22, 221256.CrossRefGoogle Scholar
Govaerts, W. and Pryce, J. D. (1990), Block elimination with one iterative refinement solves bordered linear systems accurately, BIT 30, 490507.CrossRefGoogle Scholar
Graillat, S., Jézéquel, F., Mary, T. and Molina, R. (2022), Adaptive precision matrix–vector product. Available at hal-03561193.Google Scholar
Gratton, S., Simon, E., Titley-Peloquin, D. and Toint, P. (2019), Exploiting variable precision in GMRES. Available at arXiv:1907.10550.Google Scholar
Greenbaum, A. (1997), Estimating the attainable accuracy of recursively computed residual methods, SIAM J. Matrix Anal. Appl. 18, 535551.CrossRefGoogle Scholar
Groote, J. F., Morel, R., Schmaltz, J. and Watkins, A. (2021), Logic Gates, Circuits, Processors, Compilers and Computers, Springer.CrossRefGoogle Scholar
Grützmacher, T., Anzt, H. and Quintana-Ortí, E. S. (2021), Using Ginkgo’s memory accessor for improving the accuracy of memory-bound low precision BLAS, Software Pract. Exper. Available at doi:10.1002/spe.3041.CrossRefGoogle Scholar
Gulliksson, M. (1994), Iterative refinement for constrained and weighted linear least squares, BIT 34, 239253.CrossRefGoogle Scholar
Gupta, S., Agrawal, A., Gopalakrishnan, K. and Narayanan, P. (2015), Deep learning with limited numerical precision, in Proceedings of the 32nd International Conference on Machine Learning (Bach, F. and Blei, D., eds), Vol. 37 of Proceedings of Machine Learning Research, PMLR, pp. 17371746.Google Scholar
Haidar, A., Abdelfattah, A., Zounon, M., Wu, P., Pranesh, S., Tomov, S. and Dongarra, J. (2018a), The design of fast and energy-efficient linear solvers: On the potential of half-precision arithmetic and iterative refinement techniques, in Computational Science – ICCS 2018 (Shi, Y. et al., eds), Vol. 10860 of Lecture Notes in Computer Science, Springer, pp. 586600.CrossRefGoogle Scholar
Haidar, A., Bayraktar, H., Tomov, S., Dongarra, J. and Higham, N. J. (2020), Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems, Proc . Roy. Soc. London A 476 (2243), 20200110.Google Scholar
Haidar, A., Tomov, S., Dongarra, J. and Higham, N. J. (2018b), Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers, in Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), IEEE, article 47.Google Scholar
Haidar, A., Wu, P., Tomov, S. and Dongarra, J. (2017), Investigating half precision arithmetic to accelerate dense linear system solvers, in Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA ’17), ACM Press, article 10.CrossRefGoogle Scholar
Harvey, R. and Verseghy, D. L. (2015), The reliability of single precision computations in the simulation of deep soil heat diffusion in a land surface model, Climate Dynam. 16, 38653882.Google Scholar
Henry, G., Tang, P. T. P. and Heinecke, A. (2019), Leveraging the bfloat16 artificial intelligence datatype for higher-precision computations, in 2019 IEEE 26th Symposium on Computer Arithmetic (ARITH), IEEE, pp. 6976.Google Scholar
Higham, D. J., Higham, N. J. and Pranesh, S. (2021), Random matrices generating large growth in LU factorization with pivoting, SIAM J. Matrix Anal. Appl. 42, 185201.CrossRefGoogle Scholar
Higham, N. J. (1986), Computing the polar decomposition: With applications, SIAM J. Sci. Statist. Comput. 7, 11601174.CrossRefGoogle Scholar
Higham, N. J. (1988), Fast solution of Vandermonde-like systems involving orthogonal polynomials, IMA J. Numer. Anal. 8, 473486.CrossRefGoogle Scholar
Higham, N. J. (1991), Iterative refinement enhances the stability of $\mathrm{QR}$ factorization methods for solving linear equations, BIT 31, 447468.CrossRefGoogle Scholar
Higham, N. J. (1997), Iterative refinement for linear systems and LAPACK, IMA J. Numer. Anal. 17, 495509.CrossRefGoogle Scholar
Higham, N. J. (2002), Accuracy and Stability of Numerical Algorithms, second edition, SIAM.CrossRefGoogle Scholar
Higham, N. J. (2008), Functions of Matrices: Theory and Computation, SIAM.CrossRefGoogle Scholar
Higham, N. J. (2021), Numerical stability of algorithms at extreme scale and low precisions. MIMS EPrint 2021.14, Manchester Institute for Mathematical Sciences, The University of Manchester, UK. To appear in Proc. Int. Cong. Math. Google Scholar
Higham, N. J. and Liu, X. (2021), A multiprecision derivative-free Schur–Parlett algorithm for computing matrix functions, SIAM J. Matrix Anal. Appl. 42, 14011422.CrossRefGoogle Scholar
Higham, N. J. and Mary, T. (2019a), A new approach to probabilistic rounding error analysis, SIAM J. Sci. Comput. 41, A2815A2835.CrossRefGoogle Scholar
Higham, N. J. and Mary, T. (2019b), A new preconditioner that exploits low-rank approximations to factorization error, SIAM J. Sci. Comput. 41, A59A82.CrossRefGoogle Scholar
Higham, N. J. and Mary, T. (2020), Sharper probabilistic backward error analysis for basic linear algebra kernels with random data, SIAM J. Sci. Comput. 42, A3427A3446.CrossRefGoogle Scholar
Higham, N. J. and Mary, T. (2021), Solving block low-rank linear systems by LU factorization is numerically stable, IMA J. Numer. Anal. Available at doi:10.1093/imanum/drab020.Google Scholar
Higham, N. J. and Pranesh, S. (2019), Simulating low precision floating-point arithmetic, SIAM J. Sci. Comput. 41, C585C602.CrossRefGoogle Scholar
Higham, N. J. and Pranesh, S. (2021), Exploiting lower precision arithmetic in solving symmetric positive definite linear systems and least squares problems, SIAM J. Sci. Comput. 43, A258A277.CrossRefGoogle Scholar
Higham, N. J., Pranesh, S. and Zounon, M. (2019), Squeezing a matrix into half precision, with an application to solving linear systems, SIAM J. Sci. Comput. 41, A2536A2551.CrossRefGoogle Scholar
Ho, N.-M., De Silva, H. and Wong, W.-F. (2021), GRAM: A framework for dynamically mixing precisions in GPU applications, ACM Trans. Archit. Code Optim. 18, 124.CrossRefGoogle Scholar
Hogg, J. D. and Scott, J. A. (2010), A fast and robust mixed-precision solver for the solution of sparse symmetric linear systems, ACM Trans. Math. Software 37, 17.CrossRefGoogle Scholar
Idomura, Y., Ina, T., Ali, Y. and Imamura, T. (2020), Acceleration of fusion plasma turbulence simulations using the mixed-precision communication-avoiding Krylov method, in International Conference for High Performance Computing, Networking, Storage and Analysis (SC20), IEEE, pp. 113.Google Scholar
IEEE (1985), IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-1985, Institute of Electrical and Electronics Engineers.Google Scholar
IEEE (2008), IEEE Standard for Floating-Point Arithmetic, IEEE Std 754-2008 (Revision of IEEE 754-1985), Institute of Electrical and Electronics Engineers.Google Scholar
Intel Corporation (2018), BFLOAT16: Hardware Numerics Definition. White paper. Document number 338302-001US.Google Scholar
Ipsen, I. C. F. and Zhou, H. (2020), Probabilistic error analysis for inner products, SIAM J. Matrix Anal. Appl. 41, 17261741.CrossRefGoogle ScholarPubMed
Iwashita, T., Suzuki, K. and Fukaya, T. (2020), An integer arithmetic-based sparse linear solver using a GMRES method and iterative refinement, in 2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), IEEE, pp. 18.Google Scholar
Jankowski, M. and Woźniakowski, H. (1977), Iterative refinement implies numerical stability, BIT 17, 303311.CrossRefGoogle Scholar
Johansson, F. et al. (2013), Mpmath: A Python library for arbitrary-precision floating-point arithmetic. Available at http://mpmath.org.Google Scholar
Joldes, M., Muller, J.-M. and Popescu, V. (2017), Tight and rigorous error bounds for basic building blocks of double-word arithmetic, ACM Trans. Math. Software 44, 15res.Google Scholar
Jouppi, N. P., Yoon, D. H., Ashcraft, M., Gottscho, M., Jablin, T. B., Kurian, G., Laudon, J., Li, S., Ma, P., Ma, X., Norrie, T., Patil, N., Prasad, S., Young, C., Zhou, Z. and Patterson, D. (2021), Ten lessons from three generations shaped Google’s TPUv4i: Industrial product, in 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), IEEE, pp. 114.Google Scholar
Jouppi, N. P., Yoon, D. H., Kurian, G., Li, S., Patil, N., Laudon, J., Young, C. and Patterson, D. (2020), A domain-specific supercomputer for training deep neural networks, Comm. Assoc. Comput. Mach. 63, 67–78.
Kahan, W. (1981), Why do we need a floating-point arithmetic standard? Technical report, University of California, Berkeley, CA, USA.
Kelley, C. T. (1995), Iterative Methods for Linear and Nonlinear Equations, SIAM.
Kelley, C. T. (2022), Newton’s method in mixed precision, SIAM Rev. 64, 191–211.
Kiełbasiński, A. (1981), Iterative refinement for linear systems in variable-precision arithmetic, BIT 21, 97–103.
Knight, P. A., Ruiz, D. and Uçar, B. (2014), A symmetry preserving algorithm for matrix scaling, SIAM J. Matrix Anal. Appl. 35, 931–955.
Kronbichler, M. and Ljungkvist, K. (2019), Multigrid for matrix-free high-order finite element computations on graphics processors, ACM Trans. Parallel Comput. 6, 2.
Kudo, S., Nitadori, K., Ina, T. and Imamura, T. (2020a), Implementation and numerical techniques for one EFlop/s HPL-AI benchmark on Fugaku, in Proceedings of the 11th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale, Vol. 1, IEEE, pp. 69–76.
Kudo, S., Nitadori, K., Ina, T. and Imamura, T. (2020b), Prompt report on exa-scale HPL-AI benchmark, in 2020 IEEE International Conference on Cluster Computing (CLUSTER), IEEE, pp. 418–419.
Kurzak, J. and Dongarra, J. (2007), Implementation of mixed precision in solving systems of linear equations on the Cell processor, Concurrency Comput. Pract. Exper. 19, 1371–1385.
Langou, J., Langou, J., Luszczek, P., Kurzak, J., Buttari, A. and Dongarra, J. (2006), Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy (revisiting iterative refinement for linear systems), in Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC ’06), IEEE.
Lefèvre, V. and Zimmermann, P. (2017), Optimized binary64 and binary128 arithmetic with GNU MPFR, in 2017 IEEE 24th Symposium on Computer Arithmetic (ARITH), IEEE, pp. 18–26.
Li, X. S. and Demmel, J. W. (1998), Making sparse Gaussian elimination scalable by static pivoting, in Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, IEEE, pp. 1–17.
Li, X. S. and Demmel, J. W. (2003), SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems, ACM Trans. Math. Software 29, 110–140.
Li, X. S., Demmel, J. W., Bailey, D. H., Henry, G., Hida, Y., Iskandar, J., Kahan, W., Kang, S. Y., Kapur, A., Martin, M. C., Thompson, B. J., Tung, T. and Yoo, D. J. (2002), Design, implementation and testing of extended and mixed precision BLAS, ACM Trans. Math. Software 28, 152–205.
Lichtenau, C., Carlough, S. and Mueller, S. M. (2016), Quad precision floating point on the IBM z13, in 2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH), IEEE, pp. 87–94.
Lindquist, N., Luszczek, P. and Dongarra, J. (2020), Improving the performance of the GMRES method using mixed-precision techniques, in Communications in Computer and Information Science (Nichols, J. et al., eds), Springer, pp. 51–66.
Lindquist, N., Luszczek, P. and Dongarra, J. (2022), Accelerating restarted GMRES with mixed precision arithmetic, IEEE Trans. Parallel Distrib. Syst. 33, 1027–1037.
Loe, J. A., Glusa, C. A., Yamazaki, I., Boman, E. G. and Rajamanickam, S. (2021a), Experimental evaluation of multiprecision strategies for GMRES on GPUs, in 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE, pp. 469–478.
Loe, J. A., Glusa, C. A., Yamazaki, I., Boman, E. G. and Rajamanickam, S. (2021b), A study of mixed precision strategies for GMRES on GPUs. Available at arXiv:2109.01232.
Lopez, F. and Mary, T. (2020), Mixed precision LU factorization on GPU tensor cores: Reducing data movement and memory footprint. MIMS EPrint 2020.20, Manchester Institute for Mathematical Sciences, The University of Manchester, UK.
Luszczek, P., Yamazaki, I. and Dongarra, J. (2019), Increasing accuracy of iterative refinement in limited floating-point arithmetic on half-precision accelerators, in 2019 IEEE High Performance Extreme Computing Conference (HPEC), IEEE, pp. 1–6.
Markidis, S., Wei Der Chien, S., Laure, E., Peng, I. B. and Vetter, J. S. (2018), NVIDIA tensor core programmability, performance & precision, in 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE, pp. 522–531.
Maynard, C. M. and Walters, D. N. (2019), Mixed-precision arithmetic in the ENDGame dynamical core of the unified model, a numerical weather prediction and climate model code, Comput. Phys. Comm. 244, 69–75.
McCormick, S. F., Benzaken, J. and Tamstorf, R. (2021), Algebraic error analysis for mixed-precision multigrid solvers, SIAM J. Sci. Comput. 43, S392–S419.
Meurer, A., Smith, C. P., Paprocki, M., Čertík, O., Kirpichev, S. B., Rocklin, M., Kumar, A., Ivanov, S., Moore, J. K., Singh, S., Rathnayake, T., Vig, S., Granger, B. E., Muller, R. P., Bonazzi, F., Gupta, H., Vats, S., Johansson, F., Pedregosa, F., Curry, M. J., Terrel, A. R., Roučka, Š., Saboo, A., Fernando, I., Kulal, S., Cimrman, R. and Scopatz, A. (2017), SymPy: Symbolic computing in Python, PeerJ Comput. Sci. 3, e103.
Moler, C. B. (1967), Iterative refinement in floating point, J. Assoc. Comput. Mach. 14, 316–321.
Moler, C. B. (2017), ‘Half precision’ 16-bit floating point arithmetic. Available at http://blogs.mathworks.com/cleve/2017/05/08/half-precision-16-bit-floating-point-arithmetic/.
Moler, C. B. (2019), Variable format half precision floating point arithmetic. Available at https://blogs.mathworks.com/cleve/2019/01/16/variable-format-half-precision-floating-point-arithmetic/.
Mukunoki, D., Ozaki, K., Ogita, T. and Imamura, T. (2020), DGEMM using tensor cores, and its accurate and reproducible versions, in High Performance Computing (Sadayappan, P. et al., eds), Springer, pp. 230–248.
Muller, J.-M., Brunie, N., de Dinechin, F., Jeannerod, C.-P., Joldes, M., Lefèvre, V., Melquiond, G., Revol, N. and Torres, S. (2018), Handbook of Floating-Point Arithmetic, second edition, Birkhäuser.
Nakata, M. (2021), MPLAPACK version 1.0.0 user manual. Available at arXiv:2109.13406.
Norrie, T., Patil, N., Yoon, D. H., Kurian, G., Li, S., Laudon, J., Young, C., Jouppi, N. and Patterson, D. (2021), The design process for Google’s training chips: TPUv2 and TPUv3, IEEE Micro 41, 56–63.
NVIDIA Corporation (2020), NVIDIA A100 Tensor Core GPU Architecture, v1.0.
Ogita, T. and Aishima, K. (2018), Iterative refinement for symmetric eigenvalue decomposition, Japan J. Indust. Appl. Math. 35, 1007–1035.
Ogita, T. and Aishima, K. (2019), Iterative refinement for symmetric eigenvalue decomposition II: Clustered eigenvalues, Japan J. Indust. Appl. Math. 36, 435–459.
Ogita, T. and Aishima, K. (2020), Iterative refinement for singular value decomposition based on matrix multiplication, J. Comput. Appl. Math. 369, 112512.
Oktay, E. and Carson, E. (2022), Multistage mixed precision iterative refinement, Numer. Linear Algebra Appl. Available at doi:10.1002/nla.2434.
Oo, K. L. and Vogel, A. (2020), Accelerating geometric multigrid preconditioning with half-precision arithmetic on GPUs. Available at arXiv:2007.07539.
Ooi, R., Iwashita, T., Fukaya, T., Ida, A. and Yokota, R. (2020), Effect of mixed precision computing on H-matrix vector multiplication in BEM analysis, in Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, ACM Press.
O’uchi, S.-i, Fuketa, H., Ikegami, T., Nogami, W., Matsukawa, T., Kudoh, T. and Takano, R. (2018), Image-classifier deep convolutional neural network training by 9-bit dedicated hardware to realize validation accuracy and energy efficiency superior to the half precision floating point format, in 2018 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, pp. 1–5.
Paige, C. C., Rozložník, M. and Strakoš, Z. (2006), Modified Gram–Schmidt (MGS), least squares, and backward stability of MGS-GMRES, SIAM J. Matrix Anal. Appl. 28, 264–284.
Palmer, T. N. (2014), More reliable forecasts with less precise computations: A fast-track route to cloud-resolved weather and climate simulators?, Phil. Trans. R. Soc. A 372 (2018), 1–14.
Palmer, T. N. (2020), The physics of numerical analysis: A climate modelling case study, Phil. Trans. R. Soc. A 378 (2166), 1–6.
Petschow, M., Quintana-Ortí, E. and Bientinesi, P. (2014), Improved accuracy and parallelism for MRRR-based eigensolvers: A mixed precision approach, SIAM J. Sci. Comput. 36, C240–C263.
Pisha, L. and Ligowski, L. (2021), Accelerating non-power-of-2 size Fourier transforms with GPU tensor cores, in 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE, pp. 507–516.
Ralha, R. (2018), Mixed precision bisection, Math. Comput. Sci. 12, 173–181.
Rubio-González, C., Nguyen, C., Nguyen, H. D., Demmel, J., Kahan, W., Sen, K., Bailey, D. H., Iancu, C. and Hough, D. (2013), Precimonious: Tuning assistant for floating-point precision, in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC ’13), ACM Press, article 27.
San Juan, P., Rodríguez-Sánchez, R., Igual, F. D., Alonso-Jordá, P. and Quintana-Ortí, E. S. (2021), Low precision matrix multiplication for efficient deep learning in NVIDIA carmel processors, J. Supercomput. 77, 11257–11269.
Sato, M., Ishikawa, Y., Tomita, H., Kodama, Y., Odajima, T., Tsuji, M., Yashiro, H., Aoki, M., Shida, N., Miyoshi, I., Hirai, K., Furuya, A., Asato, A., Morita, K. and Shimizu, T. (2020), Co-design for A64FX manycore processor and ‘Fugaku’, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’20), IEEE.
Scheinberg, K. (2016), Evolution of randomness in optimization methods for supervised machine learning, SIAG/OPT Views and News 24, 1–8.
Schenk, O., Gärtner, K., Fichtner, W. and Stricker, A. (2001), PARDISO: A high-performance serial and parallel sparse linear solver in semiconductor device simulation, Future Gener. Comput. Syst. 18, 69–78.
Simoncini, V. and Szyld, D. B. (2003), Theory of inexact Krylov subspace methods and applications to scientific computing, SIAM J. Sci. Comput. 25, 454–477.
Skeel, R. D. (1980), Iterative refinement implies numerical stability for Gaussian elimination, Math. Comp. 35, 817–832.
Smoktunowicz, A. and Sokolnicka, J. (1984), Binary cascades iterative refinement in doubled-mantissa arithmetics, BIT 24, 123–127.
Sorna, A., Cheng, X., D’Azevedo, E., Won, K. and Tomov, S. (2018), Optimizing the fast Fourier transform using mixed precision on tensor core hardware, in 2018 IEEE 25th International Conference on High Performance Computing Workshops (HiPCW), IEEE, pp. 3–7.
Stathopoulos, A. and Wu, K. (2002), A block orthogonalization procedure with constant synchronization requirements, SIAM J. Sci. Comput. 23, 2165–2182.
Stewart, G. W. (1973), Introduction to Matrix Computations, Academic Press.
Stor, N. J., Slapničar, I. and Barlow, J. L. (2015), Accurate eigenvalue decomposition of real symmetric arrowhead matrices and applications, Linear Algebra Appl. 464, 62–89.
Sumiyoshi, Y., Fujii, A., Nukada, A. and Tanaka, T. (2014), Mixed-precision AMG method for many core accelerators, in Proceedings of the 21st European MPI Users’ Group Meeting (EuroMPI/ASIA ’14), ACM Press, pp. 127–132.
Sun, J., Peterson, G. D. and Storaasli, O. O. (2008), High-performance mixed-precision linear solver for FPGAs, IEEE Trans. Comput. 57, 1614–1623.
Tagliavini, G., Mach, S., Rossi, D., Marongiu, A. and Benin, L. (2018), A transprecision floating-point platform for ultra-low power computing, in 2018 Design, Automation and Test in Europe Conference and Exhibition (DATE), pp. 1051–1056.
Tamstorf, R., Benzaken, J. and McCormick, S. F. (2021), Discretization-error-accurate mixed-precision multigrid solvers, SIAM J. Sci. Comput. 43, S420–S447.
Tintó Prims, O., Acosta, M. C., Moore, A. M., Castrillo, M., Serradell, K., Cortés, A. and Doblas-Reyes, F. J. (2019), How to use mixed precision in ocean models: Exploring a potential reduction of numerical precision in NEMO 4.0 and ROMS 3.6, Geosci. Model Dev. 12, 3135–3148.
Tisseur, F. (2001), Newton’s method in floating point arithmetic and iterative refinement of generalized eigenvalue problems, SIAM J. Matrix Anal. Appl. 22, 1038–1057.
Trader, T. (2016), IBM advances against x86 with Power9. Available at https://www.hpcwire.com/2016/08/30/ibm-unveils-power9-details/.
Tsai, Y. M., Luszczek, P. and Dongarra, J. (2021), Mixed-precision algorithm for finding selected eigenvalues and eigenvectors of symmetric and Hermitian matrices. Technical report ICL-UT-21-05, Innovative Computing Laboratory, The University of Tennessee, Knoxville, TN, USA.
Tsuchida, E. and Choe, Y.-K. (2012), Iterative diagonalization of symmetric matrices in mixed precision and its application to electronic structure calculations, Comput. Phys. Comm. 183, 980–985.
Turner, K. and Walker, H. F. (1992), Efficient high accuracy solutions with GMRES($m$), SIAM J. Sci. Statist. Comput. 12, 815–825.
van den Eshof, J. and Sleijpen, G. L. G. (2004), Inexact Krylov subspace methods for linear systems, SIAM J. Matrix Anal. Appl. 26, 125–153.
Váňa, F., Düben, P., Lang, S., Palmer, T., Leutbecher, M., Salmond, D. and Carver, G. (2017), Single precision in weather forecasting models: An evaluation with the IFS, Mon. Weather Rev. 145, 495–502.
von Neumann, J. and Goldstine, H. H. (1947), Numerical inverting of matrices of high order, Bull. Amer. Math. Soc. 53, 1021–1099.
Wang, E., Davis, J. J., Zhao, R., Ng, H.-C., Niu, X., Luk, W., Cheung, P. Y. K. and Constantinides, G. A. (2019), Deep neural network approximation for custom hardware, ACM Comput. Surv. 52, 1–39.
Wang, N., Choi, J., Brand, D., Chen, C.-Y. and Gopalakrishnan, K. (2018), Training deep neural networks with 8-bit floating point numbers, in Advances in Neural Information Processing Systems 31 (Bengio, S. et al., eds), Curran Associates, pp. 7686–7695.
Wang, S. and Kanwar, P. (2019), BFloat16: The secret to high performance on cloud TPUs. Available at https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus.
Wilkinson, J. H. (1948), Progress report on the Automatic Computing Engine. Report MA/17/1024, Mathematics Division, Department of Scientific and Industrial Research, National Physical Laboratory, Teddington, UK.
Wilkinson, J. H. (1961), Error analysis of direct methods of matrix inversion, J. Assoc. Comput. Mach. 8, 281–330.
Wilkinson, J. H. (1963), Rounding Errors in Algebraic Processes, Notes on Applied Science No. 32, Her Majesty’s Stationery Office. Also published by Prentice Hall, USA. Reprinted by Dover, 1994.
Wilkinson, J. H. (1977), The use of the single-precision residual in the solution of linear systems. Unpublished manuscript.
Yamazaki, I., Tomov, S. and Dongarra, J. (2015a), Mixed-precision Cholesky QR factorization and its case studies on multicore CPU with multiple GPUs, SIAM J. Sci. Comput. 37, C307–C330.
Yamazaki, I., Tomov, S. and Dongarra, J. (2016), Stability and performance of various singular value QR implementations on multicore CPU with a GPU, ACM Trans. Math. Software 43, 10.
Yamazaki, I., Tomov, S., Dong, T. and Dongarra, J. (2015b), Mixed-precision orthogonalization scheme and adaptive step size for improving the stability and performance of CA-GMRES on GPUs, in High Performance Computing for Computational Science (VECPAR 2014) (Daydé, M. et al., eds), Vol. 8969 of Lecture Notes in Computer Science, Springer, pp. 17–30.
Yamazaki, I., Tomov, S., Kurzak, J., Dongarra, J. and Barlow, J. (2015c), Mixed-precision block Gram Schmidt orthogonalization, in Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA ’15), ACM Press.
Yang, K., Chen, Y.-F., Roumpos, G., Colby, C. and Anderson, J. (2019), High performance Monte Carlo simulation of Ising model on TPU clusters, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’19), ACM Press.
Yang, L. M., Fox, A. and Sanders, G. (2021), Rounding error analysis of mixed precision block Householder QR algorithms, SIAM J. Sci. Comput. 43, A1723–A1753.
Zhang, S., Baharlouei, E. and Wu, P. (2020), High accuracy matrix computations on neural engines: A study of QR factorization and its applications, in Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, ACM Press.
Zhu, Y.-K. and Hayes, W. B. (2009), Correct rounding and a hybrid approach to exact floating-point summation, SIAM J. Sci. Comput. 31, 2981–3001.
Zlatev, Z. (1982), Use of iterative refinement in the solution of sparse linear systems, SIAM J. Numer. Anal. 19, 381–399.
Zounon, M., Higham, N. J., Lucas, C. and Tisseur, F. (2022), Performance impact of precision reduction in sparse linear systems solvers, PeerJ Comput. Sci. 8, e778.