HPCA Table of Contents

Introduction

What is Computer Architecture
Moore's Law
Memory Wall
Processor Speed, Power, Cost
Fabrication Cost

Metrics and Evaluation

Performance
Speedup
Benchmarks
Iron Law of Performance
Amdahl's Law
Lhadma's Law

Pipelining

Pipelining
Cycles per Instruction (CPI)
Stalls
Stalls - Control Dependencies
Stalls - Data Dependencies
Hazards
Number of Stages Trade-off

Branches

Branch Prediction
Necessity of Better Branch Prediction
Better Branch Prediction
Branch Target Buffer (BTB)
Direction Predictor - 1 Bit Predictor
2 Bit Predictor
History Based Predictors
History with Shared Counters
PShare and GShare
Tournament Predictors
Hierarchical Predictors
Return Address Stack (RAS)

Predication

Predication
If-Conversion to Assembly Code
Conditional Assembly Code Instructions
Hardware Support

Instruction Level Parallelism (ILP)

Ideal ILP
RAW Dependencies
WAW Dependencies
Removing False Dependencies
Register Allocation Table (RAT)
Instruction Level Parallelism (ILP)
ILP vs IPC

Instruction Scheduling

Tomasulo's Algorithm
Tomasulo's Algorithm - Issue
Tomasulo's Algorithm - Dispatch
Tomasulo's Algorithm - Broadcast
Load and Store Instructions
Tomasulo's Algorithm - Timing Example

Reorder Buffer (ROB)

Exceptions in Out-of-Order Execution
Branch Misprediction in OOO Processors
Tomasulo's with ROB
Dispatch with ROB
Broadcast with ROB
Commit with ROB
ROB and Branch Prediction
ROB and Exceptions
ROB and RAT
ROB Timing Example
Unified Reservation Stations
Specifications of a Superscalar Processor

Memory Ordering

Load and Store Queue (LSQ)
The LSQ, ROB, and RS Relationship

Compiler ILP

Tree Height Reduction
Compiler Instruction Scheduling
Compiler If-Conversion
Loop Unrolling
Function Inlining

Very Long Instruction Word (VLIW)

VLIW
VLIW Pros and Cons
VLIW Instructions