Keio University
Fall Semester, 2008
Computer Architecture
Lecture 2, October 7:
Fundamentals of Computer Design
Outline of This Lecture
- Basic Architecture
- System Diagram
- Outline of a CPU
- Instructions: the Basic Idea
- Quantitative Principles of Design
- Take Advantage of Parallelism
- Principle of Locality
- Focus on the Common Case
- Amdahl's Law
- The Processor Performance Equation
- (Local Bonus: Achieve Balance)
- Homework
System Diagram
Instructions: the Basic Idea
Computers execute instructions, which are usually produced by
a compiler, a piece of software that translates human-readable
(usually ASCII) source code into machine-readable binary. For example:
LOAD R1, A
ADD R1, R3
STORE R1, A
This example shows three instructions, to be executed sequentially.
The first instruction LOADs a value from memory location A into
register R1 (we will come back shortly to how the address of that
value is found). The second instruction ADDs the contents of register
R3 to register R1, leaving the sum in R1. The third instruction STOREs
the result back into the original memory location.
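The effect of the three instructions can be traced with a minimal sketch. This is a toy model, not a real instruction set: the dictionaries `memory` and `registers`, the helper functions, and the starting values of A and R3 are all assumptions made for illustration.

```python
# Toy model of the three-instruction example; not a real instruction set.
# The starting values of A and R3 are invented for illustration.
memory = {"A": 10}
registers = {"R1": 0, "R3": 5}


def load(reg, addr):
    """LOAD reg, addr: copy a value from memory into a register."""
    registers[reg] = memory[addr]


def add(dst, src):
    """ADD dst, src: add the contents of register src into register dst."""
    registers[dst] += registers[src]


def store(reg, addr):
    """STORE reg, addr: copy a register's value back into memory."""
    memory[addr] = registers[reg]


load("R1", "A")    # R1 = 10
add("R1", "R3")    # R1 = 10 + 5
store("R1", "A")   # A = 15
print(memory["A"])  # 15
```

Note that the value in memory changes only at the STORE; the arithmetic itself happens entirely in registers.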
Depending on the instruction, the data may be one of several sizes
(using common modern terminology):
- A byte (8 bits, today)
- A half word (16 bits)
- A word (32 bits)
- A double word (64 bits)
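To get a feel for these sizes, the following sketch computes the range of integers each one can hold. The bit widths are the ones listed above; the signed ranges assume two's complement representation, which is an assumption added here rather than something stated in these notes.

```python
# Bit widths as listed in the notes ("common modern terminology").
SIZES_IN_BITS = {"byte": 8, "half word": 16, "word": 32, "double word": 64}


def unsigned_range(bits):
    """Smallest and largest unsigned integer that fit in `bits` bits."""
    return 0, 2**bits - 1


def signed_range(bits):
    """Smallest and largest signed integer, assuming two's complement."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, bits in SIZES_IN_BITS.items():
    print(name, bits // 8, "byte(s)", unsigned_range(bits), signed_range(bits))
```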
CPU: the Central Processing Unit
A somewhat abstract picture:
Briefly, the CPU has the following functional components:
- The instruction fetcher reads instructions from memory.
- The instruction decoder decides what type of instruction is
being executed, and fetches data from memory if necessary.
- The memory interface reads data from memory and writes data to
memory on behalf of the instructions.
- The registers are the on-chip memory.
- The ALU, or Arithmetic and Logic Unit, executes, as the name
says, arithmetic and logical instructions.
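How these components cooperate can be sketched as a toy fetch-decode-execute loop. The class `ToyCPU`, its method names, and the tuple encoding of instructions are hypothetical and chosen only to mirror the list above; a real CPU implements these steps in hardware.

```python
class ToyCPU:
    """Toy CPU whose methods mirror the components listed above."""

    def __init__(self, program, data, registers):
        self.program = program      # instruction memory: list of tuples
        self.data = data            # data memory: name -> value
        self.registers = registers  # on-chip memory: name -> value
        self.pc = 0                 # index of the next instruction

    def fetch(self):
        """Instruction fetcher: read the next instruction from memory."""
        instruction = self.program[self.pc]
        self.pc += 1
        return instruction

    def decode(self, instruction):
        """Instruction decoder: find the operation and whether it uses memory."""
        opcode, reg, operand = instruction
        needs_memory = opcode in ("LOAD", "STORE")
        return opcode, reg, operand, needs_memory

    def mem_read(self, addr):
        """Memory interface: read a value from data memory."""
        return self.data[addr]

    def mem_write(self, addr, value):
        """Memory interface: write a value to data memory."""
        self.data[addr] = value

    def alu(self, opcode, a, b):
        """ALU: execute an arithmetic instruction (only ADD in this sketch)."""
        if opcode == "ADD":
            return a + b
        raise ValueError("unknown opcode: " + opcode)

    def step(self):
        opcode, reg, operand, needs_memory = self.decode(self.fetch())
        if needs_memory:
            if opcode == "LOAD":
                self.registers[reg] = self.mem_read(operand)
            else:  # STORE
                self.mem_write(operand, self.registers[reg])
        else:
            self.registers[reg] = self.alu(
                opcode, self.registers[reg], self.registers[operand]
            )

    def run(self):
        while self.pc < len(self.program):
            self.step()


# Starting values are invented for illustration.
program = [("LOAD", "R1", "A"), ("ADD", "R1", "R3"), ("STORE", "R1", "A")]
cpu = ToyCPU(program, data={"A": 10}, registers={"R1": 0, "R3": 5})
cpu.run()
print(cpu.data["A"])  # 15
```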
Memory: Registers, Stacks, and Heaps
- Registers: Special memory inside the CPU chip. There are
typically only a few registers in a CPU. They are fast, but
expensive.
- Main Memory: Random Access Memory (RAM) is the
largest amount of memory in your system; you may have 512 megabytes
or more in your laptop. RAM is typically used in several ways:
- Stack: Also called a push-down stack, this area
of memory is used to keep values used as local variables
by functions in the program.
- Heap: Memory allocated dynamically while the program runs
(for example, with malloc in C). Global variables usually live
in a separate data segment.
- Binary/text segment: The program itself.
It is the job of the compiler to decide how to use the registers,
stack, and heap most efficiently. Note that these roles apply to
both user programs (applications) and
the operating system kernel.
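The push-down behavior of the stack can be illustrated with a small sketch. It models the call stack as a Python list of frames; the names `call`, `ret`, and the example functions and variables are hypothetical, and a real stack is a region of RAM managed through a stack pointer rather than a list.

```python
# Toy model: the call stack as a list of frames, one per active function.
call_stack = []


def call(function_name, local_variables):
    """Push a new frame holding the called function's local variables."""
    call_stack.append({"function": function_name, "locals": dict(local_variables)})


def ret():
    """Pop the top frame; its local variables disappear with it."""
    return call_stack.pop()


call("main", {"x": 1})
call("helper", {"tmp": 42})
depth_inside_helper = len(call_stack)    # main and helper are both active
finished = ret()                         # helper returns
depth_after_return = len(call_stack)     # only main remains
print(depth_inside_helper, depth_after_return, finished["function"])
```

The last function called is always the first to return, which is why a last-in, first-out structure fits local variables so well.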
Quantitative Principles of Design
Last time, we talked about Hennessy & Patterson's Five Principles:
- Take Advantage of Parallelism
- Principle of Locality
- Focus on the Common Case
- Amdahl's Law
- The Processor Performance Equation
I would add to this one imperative: Achieve Balance.
Take Advantage of Parallelism
Parallelism can be found by using multiple processors on different
parts of the problem, by using multiple functional units (floating
point units, disk drives, etc.), or by pipelining: dividing an
individual computer instruction into several stages and executing
stages of different instructions at the same time in different parts
of the CPU.
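The benefit of pipelining can be estimated with a short calculation. This is a sketch under idealized assumptions that are not from the notes: every stage takes exactly one clock cycle, and there are no stalls or hazards. The function names and the example numbers are illustrative.

```python
def cycles_unpipelined(n_instructions, n_stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages):
    """Ideal pipeline: after the first instruction fills the pipeline,
    one instruction completes every cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


n, k = 100, 5  # 100 instructions on a 5-stage pipeline (made-up numbers)
print(cycles_unpipelined(n, k))  # 500
print(cycles_pipelined(n, k))    # 104
```

For a long instruction stream the speedup approaches the number of stages, but never quite reaches it because of the cycles spent filling the pipeline.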
Principle of Locality
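Programs tend to reuse data and instructions that they have used
recently. There are two forms of locality: spatial (addresses near
a recently used address are likely to be used soon) and temporal
(a recently used item is likely to be used again soon). Locality is
what allows a cache memory to work.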
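A tiny cache simulation shows why locality matters. This is a minimal sketch, assuming a fully associative cache with least-recently-used replacement; the cache size, block size, and the three access patterns are invented for illustration and do not describe any particular machine.

```python
from collections import OrderedDict


def count_hits(addresses, n_lines=4, block_size=4):
    """Count hits in a tiny fully associative cache with LRU replacement."""
    cache = OrderedDict()  # block number -> None, ordered by recency of use
    hits = 0
    for addr in addresses:
        block = addr // block_size
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = None            # miss: bring the block in
            if len(cache) > n_lines:
                cache.popitem(last=False)  # evict the least recently used
    return hits


sequential = list(range(64))             # spatial locality: neighbors in a block
loop = [0, 100, 200, 300] * 16           # temporal locality: same items reused
scattered = [i * 64 for i in range(64)]  # neither: every access is a new block

print(count_hits(sequential))  # 48 hits out of 64
print(count_hits(loop))        # 60 hits out of 64
print(count_hits(scattered))   # 0 hits out of 64
```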
Focus on the Common Case
The things that are done a lot should be fast; the things that are
rare may be slow.
Amdahl's Law
Amdahl's Law tells us how much improvement is possible by
making the common case fast, or by parallelizing part of the
algorithm. In the lecture's example diagram, 3/5 of the algorithm can
be parallelized, so applying three times as much hardware to the
problem reduces the run time only from five time units to three: the
two serial units are unchanged, and the three parallel units shrink
to one.
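The arithmetic can be checked with the usual statement of Amdahl's Law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can be parallelized and n is the number of processing units applied to it. The function name `amdahl_speedup` is illustrative; the numbers are the ones from the example.

```python
def amdahl_speedup(parallel_fraction, n_units):
    """Overall speedup when `parallel_fraction` of the work is spread
    over `n_units` units and the rest stays serial."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


original_time = 5.0                  # five time units
speedup = amdahl_speedup(3 / 5, 3)   # 3/5 parallelizable, 3x the hardware
new_time = original_time / speedup

print(round(speedup, 3))   # 1.667
print(round(new_time, 3))  # 3.0
```

Even with unlimited hardware, the serial 2/5 remains, so the speedup in this example can never exceed 1 / (2/5) = 2.5.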
Some problems, most famously graphics, are known as "embarrassingly
parallel" problems, in which extracting parallelism is trivial, and
performance is primarily determined by input/output bandwidth and the
number of processing elements available. More generally, the
parallelism achievable is determined by the dependency graph.
Creating that graph and scheduling operations to maximize the
parallelism and enforce correctness is generally the shared
responsibility of the hardware architecture and the compiler.
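As a small illustration of how a dependency graph limits parallelism, the sketch below assigns each operation to the earliest step at which all of its inputs are ready. The four operations and the names `DEPENDS_ON` and `earliest_step` are hypothetical, and the sketch assumes every operation takes one step and that enough hardware is available.

```python
from functools import lru_cache

# Hypothetical computation: t3 = (a + b) * (c + d), then t4 = t3 + 1.
# Each operation maps to the list of operations it depends on.
DEPENDS_ON = {
    "t1": [],            # t1 = a + b
    "t2": [],            # t2 = c + d
    "t3": ["t1", "t2"],  # t3 = t1 * t2
    "t4": ["t3"],        # t4 = t3 + 1
}


@lru_cache(maxsize=None)
def earliest_step(op):
    """Earliest step at which `op` can run: one step after its last input."""
    return 1 + max((earliest_step(dep) for dep in DEPENDS_ON[op]), default=0)


schedule = {}
for op in DEPENDS_ON:
    schedule.setdefault(earliest_step(op), []).append(op)

print(schedule)       # {1: ['t1', 't2'], 2: ['t3'], 3: ['t4']}
print(max(schedule))  # 3 steps, versus 4 if run one at a time
```

Only t1 and t2 can run at the same time; the chain t1, t3, t4 sets a lower bound of three steps no matter how much hardware is added.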
The Processor Performance Equation
$$
\text{CPU time} = \frac{\text{Seconds}}{\text{Program}}
= \frac{\text{Instructions}}{\text{Program}}
\times \frac{\text{Clock cycles}}{\text{Instruction}}
\times \frac{\text{Seconds}}{\text{Clock cycle}}
$$
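A worked example makes the three factors concrete. The function name `cpu_time` and the numbers (one billion instructions, 1.5 clock cycles per instruction, a 2 GHz clock) are made up for illustration.

```python
def cpu_time(instruction_count, cycles_per_instruction, clock_rate_hz):
    """CPU time = instructions/program x cycles/instruction x seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cycles_per_instruction * seconds_per_cycle


# Made-up program: 1e9 instructions, 1.5 cycles each, on a 2 GHz clock.
t = cpu_time(1e9, 1.5, 2e9)
print(round(t, 3))  # 0.75 seconds
```

Halving any one factor (fewer instructions, fewer cycles per instruction, or a shorter clock cycle) halves the CPU time, provided the other two factors stay the same.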
Homework
This week's homework (submit via SFS):
- Take your "hello, world" program from last time, compile it to
assembly code, and submit the assembly code. Also, answer the
following questions:
- Where does your program start?
- How many instructions are in the main body of the program
(between that starting point and the exit or return)?
- Draw a diagram similar to the Amdahl's Law diagram above, for the
parallelism exercise we did in class.
- Read the text for next week.
Next Lecture
Next lecture:
Lecture 3, October 14: Processors: Arithmetic
Readings for next time:
- Follow-up from this lecture:
- For next time:
Additional Information