Keio University
Academic Year 2016, Spring Semester

Computer Architecture

Location: SFC
Format: Lecture
Instructor: Rodney Van Meter
E-mail: rdv@sfc.keio.ac.jp

Lecture 6, June 23: Processors: Basics of Pipelining

Hennessy and Patterson Appendix A slides!

Outline of This Lecture

Assembly Programming
Stages of Instruction Execution
Pipelining
Final Thoughts
Homework

Assembly Programming Using the MIPS Simulator

Note: The MIPS simulator is available on SourceForge. The original spim page is still on the web.

You will need vecadd.s, a short MIPS assembly program.

Here are screen shots from spim (Linux), xspim (Linux), PCspim (Windows) and spim and QtSpim (Mac) versions of the tool.

PCspim screenshot, with output console (Windows)

Things to note in the images above, as well as your own execution runs:

Watch the advance of the PC (program counter) as you step through the program.
By convention, most of the registers have two names: Rxx and a more mnemonic one indicating the use (for example, R29 is also sp, the stack pointer).
The floating point registers are either single-precision, or a pair of them (e.g., F0 and F1) are used as one double-precision FP register.
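Notice especially the precision and accuracy of the FP numbers in the output compared to the values we intended! Single-precision floating point cannot represent most decimal values exactly, so what is stored and printed is only the nearest representable value.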
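
The rounding noted in the last point can be reproduced outside the simulator. The following is a minimal sketch in Python, not part of the course materials: the helper name to_float32 is hypothetical, and it uses the standard struct module to round a double-precision value to single precision, as an assembler would for a .float directive. The sample values are taken from the first row of array1 in the homework.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Values from the first row of array1 in the homework.
intended = [3.14159265, 2.71828183, 1.0, -0.10]
for value in intended:
    stored = to_float32(value)
    # The error is zero only when the value is exactly representable (e.g., 1.0).
    print(f"intended {value!r:>12}  stored {stored!r:<22}  error {stored - value:+.3e}")
```

Values such as 1.0 survive unchanged, while values such as -0.10 do not, which is why the simulator output differs slightly from what was typed in the source file.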

Stages of Instruction Execution

This model of how an instruction is executed is tilted slightly toward the MIPS architecture, of which Hennessy and Patterson were two of the instigators. However, the actions in any CPU would be similar.

Instruction Fetch cycle (IF)
Fetch the current instruction from memory, using the program counter (PC) as the address, add 4 to the PC, and store the PC (actually, in MIPS, store the tentative new PC into an internal register called NPC, Next PC).
Instruction Decode/register fetch cycle (ID)
Determine which instruction we are holding, fetch the register values (two, always, in this instruction set), compare the two registers and set the EQUAL flag if equal.
Execution/effective address cycle (EX)
Depending on the instruction type:
- Memory reference: The ALU adds the base register and the offset to form the effective address.
- Register-Register ALU instruction: perform the operation (e.g., addition, multiplication, logic operation) on the register values fetched by the ID stage.
- Register-Immediate ALU instruction: perform the operation on the first register read and the immediate value in the instruction.
Memory access (MEM)
If the instruction is a LOAD, read memory at the effective address computed in EX; if it is a STORE, write the register value to that address; otherwise do nothing. (In MIPS, update the PC using either NPC or the output of the ALU operation.)
Write-Back cycle (WB)
If the instruction was a LOAD, write the value fetched from memory into the destination register; if it was an ALU operation, write the result to the destination register.
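
The five stages above can be sketched as a small interpreter that runs one instruction at a time, with no overlap between instructions. This is only an illustration under simplifying assumptions: the Instr fields, the four-operation subset (add, addi, lw, sw), and the dictionary used for memory are all hypothetical, and branches are left out, so the PC always takes the value of NPC.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    op: str        # "add", "addi", "lw", or "sw" (toy subset, no branches)
    dst: int = 0   # destination register number (unused by sw)
    src1: int = 0  # first source register (base register for lw/sw)
    src2: int = 0  # second source register (value to store for sw)
    imm: int = 0   # immediate value or address offset

def run(program, regs, mem):
    """Execute each instruction one stage at a time, without any overlap."""
    pc = 0
    while pc // 4 < len(program):
        # IF: fetch the instruction at PC and compute the tentative next PC.
        ir = program[pc // 4]
        npc = pc + 4
        # ID: always read two register values, whether or not both are needed.
        a, b = regs[ir.src1], regs[ir.src2]
        # EX: effective address, register-register, or register-immediate op.
        if ir.op in ("lw", "sw", "addi"):
            alu_out = a + ir.imm
        elif ir.op == "add":
            alu_out = a + b
        else:
            raise ValueError(f"unsupported op: {ir.op}")
        # MEM: only loads and stores touch memory; the PC is updated here.
        loaded = None
        if ir.op == "lw":
            loaded = mem[alu_out]
        elif ir.op == "sw":
            mem[alu_out] = b
        pc = npc
        # WB: loads write the fetched value, ALU ops write the ALU result.
        if ir.op == "lw":
            regs[ir.dst] = loaded
        elif ir.op in ("add", "addi"):
            regs[ir.dst] = alu_out
    return regs, mem

program = [
    Instr("addi", dst=1, src1=0, imm=8),   # r1 = r0 + 8
    Instr("lw", dst=2, src1=1, imm=0),     # r2 = mem[r1 + 0]
    Instr("add", dst=3, src1=2, src2=1),   # r3 = r2 + r1
    Instr("sw", src1=1, src2=3, imm=4),    # mem[r1 + 4] = r3
]
regs, mem = run(program, [0] * 8, {8: 5})
print(regs[1:4], mem)
```

Each pass through the loop body walks the stages in order. A pipelined processor performs the same five steps but starts a new instruction every clock cycle instead of waiting for the previous one to finish.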

The MIPS Pipeline
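
As a rough illustration of how the five stages overlap, the sketch below prints an ideal pipeline schedule in which no hazards occur. The function names ideal_schedule and format_schedule are hypothetical. The only assumption is that each instruction enters IF one clock cycle after its predecessor and spends exactly one cycle in each stage.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def ideal_schedule(n_instructions):
    """Return rows where rows[i][c] is the stage instruction i occupies in cycle c."""
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            # Instruction i starts in cycle i and advances one stage per cycle.
            row[i + s] = name
        rows.append(row)
    return rows

def format_schedule(rows):
    """Render the schedule as text: one row per instruction, one column per cycle."""
    header = "      " + " ".join(f"{c + 1:>3}" for c in range(len(rows[0])))
    lines = [header]
    for i, row in enumerate(rows):
        lines.append(f"i{i + 1:<4} " + " ".join(f"{cell:>3}" for cell in row))
    return "\n".join(lines)

print(format_schedule(ideal_schedule(4)))
```

Under these assumptions, n instructions finish in n + 4 cycles: four cycles to fill the pipeline, then one instruction completing per cycle.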

Pipeline Hazards

Sometimes conflicts occur between the different stages of the pipeline. Such a condition is called a pipeline hazard. There are three types of hazards:

Structural hazard: When there is only a single resource, such as a single port to access main memory, and two instructions try to use it at the same time, one must wait for the other.
Data hazard: An instruction attempts to use the result of an earlier instruction's ALU operation (addition, etc.) before that result is available. It must wait, sometimes for more than one clock cycle.
Control hazard: In the simplest pipeline implementations, every time a branch occurs, one or more in-progress instructions must be aborted without changing the state of the system, and new instructions must be fetched.

Hazards result in pipeline stalls or pipeline bubbles.
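
The cost of stalls can be estimated with simple arithmetic. The sketch below is an assumption-laden model, not a description of any real processor: it assumes a pipeline that completes one instruction per cycle once full, with every stall adding exactly one bubble cycle. The function names and the sample numbers (100 instructions, 20 stall cycles) are hypothetical.

```python
def total_cycles(n_instructions, stall_cycles=0, n_stages=5):
    """Cycles to run n instructions: fill the pipeline, then one per cycle, plus stalls."""
    return (n_stages - 1) + n_instructions + stall_cycles

def cpi(n_instructions, stall_cycles=0, n_stages=5):
    """Average clock cycles per instruction for the whole run."""
    return total_cycles(n_instructions, stall_cycles, n_stages) / n_instructions

# Hypothetical program: 100 instructions, first with no stalls, then with 20 stall cycles.
for stalls in (0, 20):
    print(stalls, total_cycles(100, stalls), cpi(100, stalls))
```

The n_stages parameter makes it easy to see how the fill time grows with pipeline depth, while the stall count captures the combined effect of structural, data, and control hazards.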

Final Thoughts

The five-stage pipeline we have discussed is far from the only way to divide the work in a pipeline. The Intel Prescott microprocessor (Feb. 2004) had a pipeline of about thirty stages! Filling a pipeline that deep takes many clock cycles, so every branch is a problem. The most famous pipeline of all:

Ford Model T assembly line, 1913, via Wikipedia

Homework

This week's homework (submit via SFS):

Modify vecadd.s to multiply two four-by-four matrices (matrix multiplication) and print the results. Call it arraymult.s. Include a printout of the output. Use these arrays (same as above):
array1: .float 3.14159265, 2.71828183, 1.0, -0.10
        .float 1.0, 0.0, 1.0, 0.0
        .float 0.0, 1.0, 0.0, 1.0
        .float -1.0, 1.0, -1.0, 1.0
array2: .float 2.71828183, 1.0, 3.14159265, 1.0
        .float 1.0, 0.0, 1.0, 0.0
        .float -1.0, 1.0, -1.0, 1.0
        .float 3.0, 2.0, 1.0, 0.0
Take your assembly-language matrix multiplication program and count the following:
1. Floating-point additions actually executed over all loops
2. Floating-point multiplications actually executed over all loops
3. Integer additions/subtractions actually executed over all loops
4. Branches actually executed over all loops
5. The number of instructions between branch instructions
Calculate the ideal throughput for your assembly program, assuming one instruction per clock cycle. How many clock cycles will your program take? How many seconds is that?
Find and describe a real-world pipeline. Include:
1. The number of stages
2. Functionality of each stage
3. Interlocking between stages
4. Any hazards
5. How balance in execution time is maintained
Pipeline hazards equate to arrows flowing right to left on the figure above. Identify the arrows on the diagram above by type and indicate the maximum delay that the hazard can cause.
The three pipeline programs we "executed" during class today are linked to below. Calculate the following for each:
1. The number of instructions that must be executed. Don't forget to account for the loop in program 3. (n.b.: the #-28 in the branch is decimal!)
2. The number of clock cycles the entire program takes, accounting for data and control hazards.
3. The average clock cycles per instruction (CPI) for the program.

Next Lecture

Next lecture: Lecture 7: Memory: Caching and Memory Hierarchy

Readings for next time:

Follow-up from this lecture:
- H-P: Appendix A.1 and A.2
- P-H:
For next time:
- H-P: Appendix C.1, C.2 and C.3
- P-H:
