Published online by Cambridge University Press: 26 February 2011
Stacking multiple device strata can improve system performance of a microprocessor (μP) by reducing interconnect length. This enables latency improvement, power reduction, and improved memory bandwidth. In this paper we review some of our recent design analysis and process results which quantitatively show the benefits of stacking applied to μPs.
We report on two applications for stacking which take advantage of reduced wire length- “logic+logic” stacking and “logic+memory” stacking. In addition to optimizing minimum wire length, we considered carefully the thermal ramifications of the new designs. For the logic+memory application, we considered the case of reducing off-die wiring by stacking a DRAM cache (32 to 64MB) onto a high performance μP. Simulations showed 3x reduced off-die bandwidth, Cycles Per Memory Access (CPMA) reduction of 13%, and a 66% average bus power reduction. For logic+logic applications, we considered a high performance μP where the unit blocks were repartitioned into two strata. For this case, simulations showed that stacking can simultaneously reduce power by 15% while increasing performance by 15% with a minor 14° C increase in peak temperature compared to the planar design. Using voltage scaling, this translates to 34% power reduction and 8% performance improvement with no temperature increase. We found that these results can be further improved by a secondary splitting of the individual blocks. As an example, we split a 32KB first level data cache resulting in 25% power reduction, 10% latency reduction, and 20% area reduction.
We also discuss the fabrication of stacked structures with two complimentary process flows. In one case, we developed a 300mm wafer stacking process using Cu-Cu bonding, wafer thinning, and through-silicon vias (TSVs). This technology provides reliable bonding with non-detectable bonding-interface resistance and inter-strata via pitch below 8μm. We investigated the impact of this wafer stacking process to the transistor and interconnect layers built using a 65nm strained-Si/Cu-Low-K process technology and found no impact to either discrete N- and P-MOS devices or to thin 4Mb SRAMs. We verified fully functional SRAMs on thinned wafers with thicknesses down to 5μm. Although wafer stacking leads itself well to tight-pitch same-die-size stacking, die stacking enables integration of different size dies and includes opportunity to improve yield by stacking known good dies. We demonstrated a die stack process flow with 75μm thinned die, TSV, and inter-strata via pitch below 100μm. We also found negligible impact to transistors using this process flow. Multiple stacks of up to seven 75μm thin dies with TSVs were fabricated and tested. Prospects for high volume integration of 3D into μPs are discussed.