Superscalar Processors

Jean-Loup Baer

doi:10.1017/CBO9780511811258.004

3 - Superscalar Processors

Published online by Cambridge University Press: 05 June 2012

Affiliation: University of Washington

Summary

From Scalar to Superscalar Processors

In the previous chapter we introduced a five-stage pipeline. The basic concept was that the instruction execution cycle could be decomposed into nonoverlapping stages, with one instruction passing through each stage at every cycle. This so-called scalar processor had an ideal throughput of one instruction per cycle; that is, its ideal IPC (instructions per cycle) was 1.

If we return to the formula giving the execution time, namely,

$$EX_{CPU} = \text{Number of instructions} \times \text{CPI} \times \text{cycle time}$$

we see that in order to reduce $EX_{CPU}$ in a processor with the same ISA (that is, without changing the number of instructions, N) we must either reduce CPI (equivalently, increase IPC, since IPC = 1/CPI) or reduce the cycle time, or both. Let us look at the two options.
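
As a rough illustration of the execution-time formula above, the following Python sketch evaluates it for a fixed instruction count while varying CPI and cycle time. The function name `execution_time` and all numeric values are assumptions chosen for illustration; they do not come from the chapter.

```python
def execution_time(num_instructions: int, cpi: float, cycle_time_ns: float) -> float:
    """Execution time in nanoseconds: instructions x CPI x cycle time."""
    return num_instructions * cpi * cycle_time_ns


# Hypothetical workload: the instruction count N is fixed because the ISA is unchanged.
N = 1_000_000

baseline = execution_time(N, cpi=1.0, cycle_time_ns=1.0)      # ideal scalar pipeline
lower_cpi = execution_time(N, cpi=0.5, cycle_time_ns=1.0)     # IPC raised from 1 to 2
faster_clock = execution_time(N, cpi=1.0, cycle_time_ns=0.5)  # cycle time halved
both = execution_time(N, cpi=0.5, cycle_time_ns=0.5)          # both options combined

for label, t in [("baseline", baseline), ("lower CPI", lower_cpi),
                 ("faster clock", faster_clock), ("both", both)]:
    print(f"{label:>12}: {t:,.0f} ns")
```

With these assumed numbers, halving CPI and halving the cycle time each halve the execution time, and applying both reduces it to a quarter of the baseline.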

The only way to raise the ideal IPC above 1 is to radically modify the structure of the pipeline so that more than one instruction can be in each stage at a given time. In doing so, we make a transition from a scalar processor to a superscalar one. From the microarchitecture viewpoint, we make the pipeline wider, in the sense that its representation is no longer linear. The most evident effect is that we shall need several functional units but, as we shall see, each stage of the pipeline will be affected.
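
The effect of widening the pipeline can be sketched with a simple idealized model. It assumes no hazards or stalls, and that a pipeline of width `width` moves up to `width` instructions through each stage per cycle. The helper names `ideal_cycles` and `ideal_ipc` and the parameter values are hypothetical, not taken from the chapter.

```python
import math


def ideal_cycles(num_instructions: int, stages: int, width: int) -> int:
    """Cycles needed by a stall-free pipeline issuing `width` instructions per cycle."""
    groups = math.ceil(num_instructions / width)  # issue groups entering the pipeline
    return stages + groups - 1                    # fill time plus one cycle per extra group


def ideal_ipc(num_instructions: int, stages: int, width: int) -> float:
    """Instructions per cycle achieved under the idealized model above."""
    return num_instructions / ideal_cycles(num_instructions, stages, width)


# Compare a scalar five-stage pipeline (width 1) with wider, superscalar ones.
for width in (1, 2, 4):
    print(width, ideal_cycles(1000, 5, width), round(ideal_ipc(1000, 5, width), 3))
```

Under these assumptions the ideal IPC approaches the pipeline width as the instruction count grows, which is why a scalar pipeline (width 1) cannot exceed an IPC of 1.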

Type: Chapter
Information: Microprocessor Architecture: From Simple Pipelines to Chip Multiprocessors, pp. 75–128

DOI: https://doi.org/10.1017/CBO9780511811258.004

Publisher: Cambridge University Press

Print publication year: 2009
