Keio University
Fall Semester 2007

Computer Architecture

Fall Semester 2007, Mondays, 3rd period
Course code: XXX / 2 credits
Category:
Location: SFC
Format: Lecture
Instructor: Rodney Van Meter
E-mail: rdv@sfc.keio.ac.jp

Lecture 10, January 7: Systems: Chip Multiprocessors
http://www.realworldtech.com/page.cfm?ArticleID=rwt090406012516&p=3

Outline of This Lecture

Review: What's Important?

I have written the final exam. As a review, these concepts are important.

Additional topics:

Review: Amdahl's Law

Example of Amdahl's Law, parallel and serial portions.
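
As a quick refresher on the law itself, here is a minimal sketch in Python. It assumes the usual textbook form, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can be parallelized and n is the number of processors. The function name `amdahl_speedup` and the sample values of p are only for illustration; they do not come from the figure.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Speedup predicted by Amdahl's Law (textbook form).

    parallel_fraction: share of the original run time that can be parallelized (0..1).
    n_processors: number of processors working on the parallel share (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial share is unchanged; the parallel share shrinks by a factor of n.
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


# Illustrative values only: how much do 80 cores help for different p?
for p in (0.5, 0.9, 0.99):
    print(f"p = {p:.2f}: speedup on 80 processors = {amdahl_speedup(p, 80):.2f}")
```

Even with 80 processors, the serial portion dominates quickly: the speedup can never exceed 1 / (1 - p), no matter how many cores are added.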

Intel's 80-Core Processor

Last year, Intel announced a demonstration 80-core, single-chip multiprocessor capable of 1 teraFLOPS (10^12 32-bit floating point operations per second).

A photomicrograph of the chip, and the basic floor plan:

Intel's 80-core processor, photo and floor plan

Block diagram of a single processing element (PE). Note the many read/write ports on the register file. This means that the pipeline exhibits no structural hazards.

Block diagram of a node

The pipeline is 8 stages:

8-stage pipeline

The chip, mounted on a board:

Chip, in situ

Note that each PE has a 3KB instruction memory, and a 2KB data memory. Data can be transferred to the network only from registers, not from memory. Their existing demonstration doesn't include a larger memory, but plans are for 3D packaging, with the memory chip stacked directly on top of the processor chip.

Programming this puppy requires a lot of very careful work to schedule the operations, including transfers to other processors on the network. Obviously, it is a message passing, or distributed memory, multiprocessor in its current form. The network is a 2D mesh.
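
To get a feel for what scheduling transfers on a 2D mesh involves, here is a small Python sketch that counts hops between two PEs. Everything specific in it is an assumption made for illustration: the 10-column by 8-row layout, the row-major numbering of cores, and the use of textbook dimension-order (X first, then Y) routing, which is not necessarily the routing scheme the real chip uses. The names `core_to_xy` and `xy_route` are hypothetical.

```python
# Assumed layout (not stated in these notes): 80 PEs as 10 columns x 8 rows,
# numbered row by row starting from 0.
MESH_COLS = 10
MESH_ROWS = 8


def core_to_xy(core_id: int) -> tuple[int, int]:
    """Map a core number to (x, y) mesh coordinates under the assumed numbering."""
    if not 0 <= core_id < MESH_COLS * MESH_ROWS:
        raise ValueError("core_id out of range")
    return core_id % MESH_COLS, core_id // MESH_COLS


def xy_route(src: int, dst: int) -> list[tuple[int, int]]:
    """List the nodes visited from src to dst, moving along X first, then Y."""
    x, y = core_to_xy(src)
    dst_x, dst_y = core_to_xy(dst)
    path = [(x, y)]
    while x != dst_x:  # travel along the row first
        x += 1 if dst_x > x else -1
        path.append((x, y))
    while y != dst_y:  # then travel along the column
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


route = xy_route(0, 79)  # corner to opposite corner
print("hops:", len(route) - 1)
```

Under these assumptions a message between opposite corners crosses 16 links, so the programmer has to account for very different latencies depending on which PEs are talking to each other.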

Sun Niagara

Contrast the above architecture with Sun's Niagara.
Floor plan of Sun Niagara
Niagara memory hierarchy

Sony/Toshiba/IBM Cell Processor

Certainly the most famous chip multiprocessor at the moment is the Cell, used in the Sony PS3.

Cell floor plan
Cell chip

Homework

This week's homework (submit via email):

Intel says their 80-core processor runs at 1TFLOPS when the clock speed is 3.13GHz. Determine:

  1. FLOPS per processor, assuming all 80 cores are working
  2. Average floating point ops/clock cycle
  3. If each pipeline stall costs the full 8 cycles, what percentage of instructions can stall and still meet that performance?
  4. If each pipeline stall costs only one clock cycle, what percentage of instructions can stall and still meet that performance?
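
One way to set up the arithmetic is sketched below in Python. It is a sketch under stated assumptions, not the official solution. Two things are not given in the problem and are assumed here: that each PE can complete at most 4 floating point operations per clock cycle (`ASSUMED_PEAK_FLOPS_PER_CYCLE`), and a simple stall model in which every instruction normally takes one cycle at peak rate and a stalled instruction costs a fixed number of extra cycles. The function names are hypothetical.

```python
TOTAL_FLOPS = 1e12   # 1 TFLOPS, from the problem statement
N_CORES = 80         # from the problem statement
CLOCK_HZ = 3.13e9    # 3.13 GHz, from the problem statement

# ASSUMPTION (not in the problem statement): peak FLOPs per cycle per PE.
ASSUMED_PEAK_FLOPS_PER_CYCLE = 4.0


def flops_per_core(total_flops: float = TOTAL_FLOPS, cores: int = N_CORES) -> float:
    """Question 1: share of the total FLOPS delivered by each core."""
    return total_flops / cores


def flops_per_cycle(per_core_flops: float, clock_hz: float = CLOCK_HZ) -> float:
    """Question 2: average floating point operations per clock cycle per core."""
    return per_core_flops / clock_hz


def max_stall_fraction(required_per_cycle: float, peak_per_cycle: float,
                       stall_cycles: int) -> float:
    """Questions 3 and 4 under the assumed stall model.

    With a fraction s of instructions stalling for stall_cycles extra cycles,
    average cycles per instruction = 1 + s * stall_cycles, so the achieved rate
    is peak / (1 + s * stall_cycles). Solve for the largest s that still meets
    the required rate.
    """
    if required_per_cycle > peak_per_cycle:
        raise ValueError("required rate exceeds the assumed peak")
    return (peak_per_cycle / required_per_cycle - 1.0) / stall_cycles


per_core = flops_per_core()
per_cycle = flops_per_cycle(per_core)
print(f"FLOPS per core:       {per_core:.3e}")
print(f"FLOPs per cycle:      {per_cycle:.4f}")
for penalty in (8, 1):
    s = max_stall_fraction(per_cycle, ASSUMED_PEAK_FLOPS_PER_CYCLE, penalty)
    print(f"{penalty}-cycle stalls: up to {s:.4%} of instructions may stall")
```

If your reading of the block diagram gives a different peak rate, change `ASSUMED_PEAK_FLOPS_PER_CYCLE`; the answers to questions 3 and 4 depend entirely on that number.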

Next Lecture

Next week, we will have the first of two lectures on I/O systems.

Next lecture:

Lecture 11, January 18 (n.b.: Friday!): Basics of I/O and Storage Systems and Designing for Networks

Additional Information

Other