Remaining lectures:
5. MIPS assembly (HW: MIPS matmul)
6. Pipelining (HW: none)
7. Memory, cache (HW: cache)
8. VM
9. Data parallelism: OpenMP, CUDA in-class exercise (HW: OpenMP, submit CUDA)
10. Distributed-memory parallel systems: Interconnects, Fugaku
11. I/O Systems, Error Correction, RAID (HW: last year's exam)
12. Putting it all together: a look back at the x86 and MIPS assembly

-----

Homeworks for Computer Architecture 2020
1. a. Calculate array addresses by hand
   b. Multiply two matrices by hand
   c. Multiply in pseudocode
3. Matrix multiply in assembler (MIPS)
4. Cache
   a. bitfields
   b. just simple calculations on efficiency
5. Parallel
   a. CUDA matrix multiply (cookbook on colab, just submit as proof you did it)
   b. OpenMP parallel (same as previous years)
6. Test (last year's final exam as a homework)

*** Final deadline for all homeworks is August 1, 2020! ***
We have seen this diagram before:
You already know that data is stored in memory, and the actual computation is done by the CPU. Starting today, we are looking at the inside of that CPU: what work it performs, and how, in order to complete a computation.
The computer executes instructions. Those instructions have been translated by a compiler from a human-readable program. For example:
    LOAD  R1, A
    ADD   R1, R3
    STORE R1, A
This example shows three instructions, to be executed sequentially. The first instruction LOADs a value into register R1 from memory (we will come back to how the value that is loaded into R1 is found in a minute). The second instruction ADDs the contents of register R3 into register R1, then the third instruction STOREs the result into the original memory location.
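To make the three-instruction sequence concrete, here is a minimal Python sketch of a toy machine that executes it. The dictionaries `regs` and `mem`, the function name `run`, and the initial values are all illustrative assumptions, not part of any real instruction set; the memory location A is modeled simply as a named cell.

```python
def run(program, regs, mem):
    """Execute a list of (opcode, register, operand) tuples on a toy machine."""
    for op, a, b in program:
        if op == "LOAD":        # LOAD Rx, ADDR : Rx <- Mem[ADDR]
            regs[a] = mem[b]
        elif op == "ADD":       # ADD Rx, Ry : Rx <- Rx + Ry
            regs[a] = regs[a] + regs[b]
        elif op == "STORE":     # STORE Rx, ADDR : Mem[ADDR] <- Rx
            mem[b] = regs[a]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs, mem


program = [("LOAD", "R1", "A"), ("ADD", "R1", "R3"), ("STORE", "R1", "A")]
regs = {"R1": 0, "R3": 5}   # assumed initial register contents
mem = {"A": 10}             # assumed initial memory contents

run(program, regs, mem)
print(mem["A"])             # 10 + 5 = 15
```

Note that only one of the three instructions does arithmetic; the other two just move data between memory and a register.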
This figure is a little on the abstract side, but:
Briefly, the CPU contains components with the following functions:
It is the job of the compiler to decide how to use the registers, stack, and heap most efficiently. Note that these functions apply both to user programs (applications) and to the operating system kernel.
It's easy to think of an algorithm in terms of the arithmetic that must be performed. In fact, you could argue that the only important work is that arithmetic. However, you should be aware that much of the work actually done by the CPU is not that arithmetic directly, but instead is various kinds of supporting work to enable that arithmetic: moving data around, and deciding what work should be done next. We can categorize the most common instructions used in an algorithm into three groups:
Besides these three groups, there are instructions that control the state of the processor itself, turning on and off various features of the CPU, some of which we will talk about when we talk about virtual memory. Additional instructions include those necessary to support operating system calls and device I/O, such as interrupts.
There are many different ways to build a complete instruction set for a CPU. The broadest classification divides them by how the operands for an arithmetic operation are brought into the ALU. Can they come directly from memory, or must they first be loaded into registers? A common taxonomy is:
Addressing mode | Example | Meaning
Immediate | ADD R4,#3 | Regs[R4] ← Regs[R4] + 3
Register | ADD R4,R3 | Regs[R4] ← Regs[R4] + Regs[R3]
Register Indirect | ADD R4,(R1) | Regs[R4] ← Regs[R4] + Mem[Regs[R1]]
Displacement | ADD R4,100(R1) | Regs[R4] ← Regs[R4] + Mem[100+Regs[R1]]
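The four addressing modes differ only in where the second operand of the ADD comes from. The following Python sketch spells that out; the helper name `operand`, the mode strings, and the register and memory contents are hypothetical choices made for illustration.

```python
def operand(mode, regs, mem, reg=None, imm=0):
    """Return the second operand of ADD for the given addressing mode."""
    if mode == "immediate":            # ADD R4,#3      -> the constant itself
        return imm
    if mode == "register":             # ADD R4,R3      -> Regs[R3]
        return regs[reg]
    if mode == "register_indirect":    # ADD R4,(R1)    -> Mem[Regs[R1]]
        return mem[regs[reg]]
    if mode == "displacement":         # ADD R4,100(R1) -> Mem[100 + Regs[R1]]
        return mem[imm + regs[reg]]
    raise ValueError(f"unknown addressing mode: {mode}")


regs = {"R1": 200, "R3": 7, "R4": 1}   # assumed register contents
mem = {200: 40, 300: 50}               # assumed memory contents (address -> value)

examples = [
    ("immediate", None, 3),
    ("register", "R3", 0),
    ("register_indirect", "R1", 0),
    ("displacement", "R1", 100),
]

# Compute what R4 would become in each case, starting from the same R4 each time.
results = {}
for mode, reg, imm in examples:
    results[mode] = regs["R4"] + operand(mode, regs, mem, reg, imm)

print(results)
```

The immediate and register modes never touch memory, while the last two each require a memory read before the addition can happen.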
In general, the arithmetic instructions are either two address or three address. Two-address operations modify one of the operands, e.g.
    ADD R1, R3 ; R1 = R1 + R3

whereas three-address operations specify a separate result register, e.g.

    ADD R1, R2, R3 ; R3 = R1 + R2

(n.b.: in some assembly languages, the target is specified first; in others, it is specified last.)
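The practical difference between the two forms is whether a source operand survives the operation. A small Python sketch, using the hypothetical helpers `add2` and `add3` and the same target-last operand order as the example above:

```python
def add2(regs, dst, src):
    """Two-address ADD: dst is both a source and the destination."""
    regs[dst] = regs[dst] + regs[src]


def add3(regs, src1, src2, dst):
    """Three-address ADD: the result goes to a separate register."""
    regs[dst] = regs[src1] + regs[src2]


two = {"R1": 4, "R3": 6}              # assumed initial values
add2(two, "R1", "R3")                 # ADD R1, R3     ; R1 = R1 + R3
print(two)                            # the old value of R1 has been overwritten

three = {"R1": 4, "R2": 6, "R3": 0}   # assumed initial values
add3(three, "R1", "R2", "R3")         # ADD R1, R2, R3 ; R3 = R1 + R2
print(three)                          # R1 and R2 are both preserved
```

With the two-address form, a compiler that still needs the old value of R1 must copy it to another register first, which costs an extra instruction.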
The MIPS architecture, which grew out of Professor Hennessy's research at Stanford and is used as the running example in Patterson and Hennessy's textbooks, is relatively easy to understand. Its instructions are always 32 bits, of which 6 bits are the opcode (giving a maximum of 64 opcodes). rs and rt are the source and target registers, respectively. (Those fields are five bits; how many registers can the architecture support?) Each instruction takes one of three forms:
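As a rough illustration of how those 32 bits are carved up, the Python sketch below extracts the fields using the standard MIPS32 layout: register (R) format is op(6) rs(5) rt(5) rd(5) shamt(5) funct(6), immediate (I) format is op(6) rs(5) rt(5) imm(16), and jump (J) format is op(6) target(26). The function name `decode` and the two sample words are chosen for this example; the sketch returns every field and leaves it to the caller to pick the ones that apply to the opcode.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its bit fields."""
    return {
        "op":     (word >> 26) & 0x3F,      # bits 31..26, all formats
        "rs":     (word >> 21) & 0x1F,      # bits 25..21, R and I formats
        "rt":     (word >> 16) & 0x1F,      # bits 20..16, R and I formats
        "rd":     (word >> 11) & 0x1F,      # bits 15..11, R format
        "shamt":  (word >> 6) & 0x1F,       # bits 10..6,  R format
        "funct":  word & 0x3F,              # bits 5..0,   R format
        "imm":    word & 0xFFFF,            # bits 15..0,  I format (not sign-extended here)
        "target": word & 0x3FFFFFF,         # bits 25..0,  J format
    }


r_type = decode(0x012A4020)   # add  $t0, $t1, $t2
i_type = decode(0x21280005)   # addi $t0, $t1, 5

print(r_type["op"], r_type["rs"], r_type["rt"], r_type["rd"], hex(r_type["funct"]))
print(i_type["op"], i_type["rs"], i_type["rt"], i_type["imm"])
```

Because every instruction is the same width and the opcode and register fields sit in fixed positions, the hardware can extract them with simple wiring rather than a sequential parse.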
Note: The MIPS simulator is available on SourceForge. The original spim page is still on the web.
You will need vecadd.s, a short MIPS assembly program.
Here are screen shots from spim (Linux), xspim (Linux), PCspim (Windows) and spim and QtSpim (Mac) versions of the tool.
Things to note in the images above, as well as your own execution runs:
Next lecture:
Lecture 6: Processors: Basics of Pipelining
Readings for next time: