Optimization of the Multishift QR Algorithm with Coprocessors for Non-Hermitian Eigenvalue Problems

Published online by Cambridge University Press: 28 May 2015

Takafumi Miyata*
Affiliation:
Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
Yusaku Yamamoto*
Affiliation:
Graduate School of System Informatics, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
Takashi Uneyama*
Affiliation:
Institute for Chemical Research, Kyoto University, Gokasho, Uji 611-0011, Japan
Yoshimasa Nakamura*
Affiliation:
Graduate School of Informatics, Kyoto University, 36-1 Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501, Japan
Shao-Liang Zhang*
Affiliation:
Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
*Corresponding author.

Abstract

The multishift QR algorithm is an efficient method for computing all the eigenvalues of a large, dense, non-Hermitian matrix. Most of its computation can be performed as matrix-matrix multiplications, which makes it well suited to modern processors with hierarchical memory. A recently proposed variant of the algorithm performs an even larger part of the computation as matrix-matrix multiplications and is therefore especially appropriate for recent coprocessors with many processing elements, such as the CSX600. However, the performance of this algorithm depends strongly on parameters such as the number of shifts and the number of divisions, and the optimal settings vary with the matrix size and the computational environment. In this paper, we construct a performance model that predicts the parameter setting minimizing the execution time of the algorithm. Experimental results on the CSX600 coprocessor show that the model can be used to find the optimal setting.
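To make the idea of a multishift QR step concrete, the following Python/NumPy sketch performs one explicit multishift step on a small upper Hessenberg matrix. It is an illustration only, not the authors' implementation. Assumptions: the m shifts are taken as the eigenvalues of the trailing m-by-m block (the strategy of Bai and Demmel), and the helper names `trailing_block_shifts` and `explicit_multishift_step` are hypothetical. Practical codes apply the same transformation implicitly by bulge chasing and typically accumulate the transformations so that most of the work becomes matrix-matrix multiplication; the dense polynomial product formed here costs O(m n^3) flops and only shows what one step computes.

```python
import numpy as np


def trailing_block_shifts(H, m):
    """Return m shifts: the eigenvalues of the trailing m-by-m block of H.

    Illustrative shift strategy (Bai-Demmel style); the paper's exact
    strategy is not given in the abstract.
    """
    return np.linalg.eigvals(H[-m:, -m:])


def explicit_multishift_step(H, m):
    """One explicit multishift QR step on an upper Hessenberg matrix H.

    Forms p(H) = (H - s_1 I)(H - s_2 I)...(H - s_m I), factors p(H) = QR,
    and returns Q^H H Q. Complex arithmetic is used so that complex
    shifts are handled directly.
    """
    n = H.shape[0]
    H = H.astype(complex)
    P = np.eye(n, dtype=complex)
    for s in trailing_block_shifts(H, m):
        # The factors commute, so the order of the shifts does not matter.
        P = P @ (H - s * np.eye(n))
    Q, _ = np.linalg.qr(P)
    # Unitary similarity: eigenvalues are preserved.
    return Q.conj().T @ H @ Q


# Usage: a random 8-by-8 upper Hessenberg matrix and m = 2 shifts.
rng = np.random.default_rng(0)
n, m = 8, 2
H0 = np.triu(rng.standard_normal((n, n)), -1)
H1 = explicit_multishift_step(H0, m)
```

Because Q is unitary, H1 has the same eigenvalues as H0. Since p(H) commutes with H, Q^H H Q equals R H R^{-1}, so the result is again upper Hessenberg up to rounding. Repeating the step typically drives the bottom subdiagonal entries toward zero, which is what allows deflation.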

Type
Research Article
Copyright
Copyright © Global-Science Press 2011
