A Reorder Buffer Design for High Performance Processors

García Ordaz, José R; Ramírez Salinas, Marco A; Villa Vargas, Luis A; Molina Lozano, Herón; Peredo Macías, Cuauhtémoc

Computación y Sistemas

On-line version ISSN 2007-9737
Print version ISSN 1405-5546

Comp. y Sist. vol.16 n.1 Ciudad de México Jan./Mar. 2012

Articles

Diseño de un búfer de reordenamiento para procesadores de alto desempeño

José R. García Ordaz, Marco A. Ramírez Salinas, Luis A. Villa Vargas, Herón Molina Lozano, and Cuauhtémoc Peredo Macías

Microtechnology and Embedded System Laboratory, Centro de Investigación en Computación, Instituto Politécnico Nacional, Av. Juan de Dios Bátiz, s/n, Zacatenco, 07738, México DF, Mexico. Email: jgarcia@cic.ipn.mx, mars@cic.ipn.mx, lvilla@cic.ipn.mx, hmolina@cic.ipn.mx, cperedo@cic.ipn.mx.

Article received on 01/02/2010.
Accepted on 15/04/2011.

Abstract

Modern reorder buffers (ROBs) were conceived to improve processor performance by allowing instructions to execute out of the original program order and to run ahead of the sequential instruction stream, exploiting existing instruction-level parallelism (ILP). The ROB is a functional structure of a processor's execution engine that supports speculative execution, physical register recycling, and precise exception recovery. Traditionally, the ROB is treated as a monolithic circular buffer: instructions enter at the tail pointer after the decode stage and leave at the head pointer after the commit stage. The commit stage verifies instructions that have been dispatched, issued, and executed, and that have not completed speculatively. This paper presents a distributed reorder buffer microarchitecture built from small structures placed near the building blocks they work with; all structures use the same tail and head pointer values for synchronization. The resulting reduction in area, and therefore in power and delay, makes this design suitable for both embedded and high-performance microprocessors.

Keywords: Superscalar processors, reorder buffer, instruction window, low power consumption.
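
The traditional monolithic organization described in the abstract can be illustrated with a minimal software sketch. This is not the authors' hardware design: the class `ReorderBuffer`, the `Entry` fields, and the method names are hypothetical and chosen only to show in-order allocation at the tail, out-of-order completion, and in-order commit at the head.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """One in-flight instruction (illustrative fields, not the paper's layout)."""
    op: str
    done: bool = False


class ReorderBuffer:
    """Monolithic circular ROB: allocate at the tail, commit at the head."""

    def __init__(self, size: int) -> None:
        self.slots: List[Optional[Entry]] = [None] * size
        self.head = 0   # index of the oldest in-flight instruction
        self.tail = 0   # index of the next free slot
        self.count = 0  # number of occupied slots

    def allocate(self, op: str) -> Optional[int]:
        """Insert a decoded instruction in program order; None if the ROB is full."""
        if self.count == len(self.slots):
            return None
        index = self.tail
        self.slots[index] = Entry(op)
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return index

    def complete(self, index: int) -> None:
        """Mark an instruction as executed; this may happen out of order."""
        entry = self.slots[index]
        if entry is None:
            raise ValueError("no in-flight instruction at this index")
        entry.done = True

    def commit(self) -> List[str]:
        """Retire finished instructions from the head, strictly in program order."""
        retired = []
        while self.count > 0 and self.slots[self.head].done:
            retired.append(self.slots[self.head].op)
            self.slots[self.head] = None
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1
        return retired


rob = ReorderBuffer(4)
load, add, mul = rob.allocate("load"), rob.allocate("add"), rob.allocate("mul")
rob.complete(mul)          # youngest instruction finishes first
print(rob.commit())        # nothing retires: the head ("load") is not done
rob.complete(load)
print(rob.commit())        # only "load" retires; "add" still blocks "mul"
```

In the distributed organization the abstract proposes, the per-instruction fields would presumably live in several small structures that are all indexed by the same head and tail values; how the paper partitions those fields is not stated in this section.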

Resumen

El búfer de reordenamiento de instrucciones (ROB) fue conceptualizado para mejorar el desempeño de los procesadores al permitir ejecutar instrucciones fuera del orden original del programa y en avance al instante preciso de la ejecución secuencial, explotando el paralelismo que existe a nivel de las instrucciones ILP. El ROB es una estructura funcional de la máquina de ejecución de los procesadores para dar soporte a la ejecución especulativa, al reciclado de los registros físicos y a la recuperación precisa de excepciones. Tradicionalmente el ROB es considerado un búfer circular monolítico en donde las instrucciones entran en la dirección especificada por un apuntador de cola después de la etapa de decodificación y son terminadas en la dirección especificada por un apuntador de cabecera después de la etapa de finalización. El artículo presenta el diseño de un búfer de reordenamiento de instrucciones distribuido en pequeñas estructuras cercanas a los bloques funcionales con los cuales interactúan, usando los mismos valores de apuntadores de cola y cabecera por sincronía. La reducción de área y por consecuencia la reducción de consumo de energía y retardo hacen de este diseño apropiado para procesadores embebidos y procesadores de alto desempeño.

Palabras Clave: Procesadores súper escalares, búfer de reordenamiento, ventana de instrucciones, consumo de baja potencia.

Acknowledgments

This work has been partially supported by grants under agreements SIP-20101320 and SIP-20101154 of the Graduate Studies and Research Department of the National Polytechnic Institute (IPN), Mexico, and by grants under agreements 124104 and 115976 of the National Council for Science and Technology (CONACyT), Mexico.
