Keio University
2012 Fall Semester

Computer Architecture

Fall 2012, Tuesdays, Period 3
Course code: 35010 / 2 credits
Location: SFC
Format: Lecture
Instructor: Rodney Van Meter
E-mail: rdv@sfc.keio.ac.jp

Lecture 3, October 14: Fastest!

[Photo: Dennis Ritchie, from Wikipedia]

Outline of This Lecture

Amdahl's Law and Dependency Graphs Revisited

Amdahl's Law

[Figure: Example of Amdahl's Law, parallel and serial portions]

The parallelism achievable is determined by the dependency graph. Creating that graph and scheduling operations to maximize the parallelism and enforce correctness is generally the shared responsibility of the hardware architecture and the compiler.

[Figure: Dependency graph for the above figure]
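As a tiny illustration (the variables here are hypothetical, just to show the idea), consider three C statements:

a = b + c;   /* no operands in common with the next line...    */
d = e - f;   /* ...so these two statements can run in parallel */
g = a * d;   /* reads a and d, so it must wait for both above  */

The first two statements form one level of the dependency graph and the third forms the next; the depth of the graph, not its size, bounds how much parallelism is available.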

Let's look at it mathematically:

\[
\text{Speedup} = \frac{1}{(1 - P) + P/N} = \frac{N}{(1 - P)N + P}
\]

where P is the fraction of the work that can be parallelized and N is the number of processors.

Question: What is the limit of this as N goes to infinity?

See the description of Amdahl's Law on Wikipedia.
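For reference, the limit works out to the familiar Amdahl bound:

\[
\lim_{N \to \infty} \frac{1}{(1 - P) + P/N} = \frac{1}{1 - P}
\]

So if 95% of the work parallelizes (P = 0.95), no number of processors can deliver more than a 20x speedup.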

Amdahl's Law also applies outside parallel computing: speeding up only part of any task limits the overall gain. An example adapted from Wikipedia:

If your car travels at 50 km/h and you want to go 100 km, how long will the trip take?

Now suppose that after the first hour, your car speeds up to 100 km/h. What is your average speed over the whole trip? If the car becomes infinitely fast after that first hour, what is the average speed then? More importantly, what is the minimum time for the complete trip?
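Working the numbers: the first hour covers 50 km, leaving 50 km to go.

\[
t = 1\,\mathrm{h} + \frac{50\,\mathrm{km}}{100\,\mathrm{km/h}} = 1.5\,\mathrm{h},
\qquad
\bar{v} = \frac{100\,\mathrm{km}}{1.5\,\mathrm{h}} \approx 66.7\,\mathrm{km/h}
\]

Even with an infinitely fast car, the first hour has already been spent, so the trip can never take less than 1 hour and the average speed can never exceed 100 km/h. The serial first hour plays the role of Amdahl's serial fraction.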

Gustafson-Barsis Law

Now go back to the example above. In practice, when your car gets faster, you can go farther.

For the first hour, your car runs at 50 km/h; after that it speeds up to 100 km/h. What is the limit of your average speed as you lengthen the trip?
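For a trip of length d (in km), the time is 1 h for the first 50 km plus (d − 50)/100 h for the rest, so

\[
\bar{v}(d) = \frac{d}{1 + (d - 50)/100} \longrightarrow 100\,\mathrm{km/h}
\quad \text{as } d \to \infty.
\]

By growing the problem (the trip), the fixed serial hour shrinks to an ever smaller fraction of the total.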

Gustafson's Law (or the Gustafson-Barsis Law) basically says that parallelism gives you the freedom to make your problem bigger. Twenty-five years ago, we thought that 100,000 or 1,000,000 processors was ridiculous, because Amdahl's Law limited their use. Today, systems in that size range are increasingly common, and Gustafson-Barsis is why.

See Gustafson's Law on Wikipedia.

The fundamental observation is this:

"[I]n practice, the problem size scales with the number of processors."

[Figure: Gustafson-Barsis Law]
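Stated as a formula: if P is the parallel fraction of the (scaled) workload and N is the number of processors, the scaled speedup is

\[
S(N) = (1 - P) + P \cdot N,
\]

which grows without bound as N grows, because the problem grows with the machine while the running time stays fixed.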

Reduction

[Figure: Reduction]

In OpenMP, a reduction can be requested with a clause on the parallel for directive:

/* Each thread sums its share of the iterations into a private
   copy of result; OpenMP combines the copies when the loop ends. */
double result = 0.0;
#pragma omp parallel for reduction(+:result)
for ( i = 0 ; i < n ; i++ ) {
  result += array[i];
}
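Without the reduction clause, all the threads would update result concurrently, which is a data race. With gcc, compile OpenMP programs with the -fopenmp flag, and control the thread count with the OMP_NUM_THREADS environment variable.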

Synchronization Barriers

[Figure: Synchronization barrier]
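A minimal OpenMP sketch: every thread must finish phase one before any thread begins phase two.

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        printf("thread %d: phase one\n", id);
        /* No thread proceeds until all threads reach this point. */
        #pragma omp barrier
        printf("thread %d: phase two\n", id);
    }
    return 0;
}

However the threads are scheduled, all of the "phase one" lines print before any "phase two" line.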

Locks
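A lock (mutex) makes threads take turns through a critical section. A minimal sketch using OpenMP's explicit lock API, protecting a shared counter:

#include <stdio.h>
#include <omp.h>

int main(void) {
    omp_lock_t lock;
    int counter = 0;
    int i;

    omp_init_lock(&lock);
    #pragma omp parallel for
    for (i = 0; i < 1000; i++) {
        omp_set_lock(&lock);     /* only one thread at a time from here... */
        counter++;               /* ...so this shared update is safe...    */
        omp_unset_lock(&lock);   /* ...to here                             */
    }
    omp_destroy_lock(&lock);
    printf("counter = %d\n", counter);   /* always 1000 */
    return 0;
}

For a simple counter like this, a reduction (or #pragma omp atomic) would be cheaper; explicit locks earn their keep when the protected state is more complex.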

Homework

The only assignment for this week is to finish the homework for last week. Any questions?

The specification for OpenMP, and a "summary card" for C and C++, are available here. The latest version is 3.1, but a Japanese translation of the 3.0 specification is available.

All of these problems involve variants of the particles program, available on the Berkeley Parallel Bootcamp exercise page. For each problem, run with n = 500, 1000, and 2000 particles and plot the execution time. Run each configuration five times and report the mean and standard deviation.
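If the version you're measuring doesn't already report its run time, OpenMP's portable wall-clock timer is one option; a minimal sketch (the measured region here is just a placeholder comment):

#include <stdio.h>
#include <omp.h>

int main(void) {
    double t0 = omp_get_wtime();
    /* ... run the code being measured here, e.g. the particle
       simulation's main loop ... */
    double t1 = omp_get_wtime();
    printf("elapsed: %.6f s\n", t1 - t0);
    return 0;
}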

Chalk sketches of the graphs I want are here and here.

The simplest option is probably to do the work on armstrong and use the OpenMP version of the program, but you can do the last exercise with either pthreads or MPI, if you want, and you can use any machine(s) where you have the proper tools available.

  1. First, run the serial version.
  2. Second, run the existing version of the pthreads program.
    1. First, with -p 1 (one thread). Compare to the serial version.
    2. Next, with 2, 3, 4, 6, 8, and 16 threads.
  3. Third, run the existing version of the OpenMP program.
    1. First, with one thread. Compare to the serial version. (You may have to modify the code to let you select the number of threads.)
    2. Next, with 2, 3, 4, 6, 8, and 16 threads.
  4. Pick one of the parallel programs: pthreads, OpenMP, or MPI. Solve the problem stated at the Berkeley Parallel Bootcamp exercise page (one possible approach is sketched after this list):
    The existing programs all perform poorly because too much information gets shared around; each per-particle loop looks at all of the other particles, which is unnecessary. Your job is to make the program scale better with the number of particles and processes or threads, by reducing the number of particles each one examines.
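One common approach (a hint, not a required design) is spatial binning: since the forces in this problem act only at short range, divide the simulation area into a grid of bins no smaller than the interaction cutoff, so each particle only needs to examine its own bin and the eight neighboring bins. A minimal sketch; the names (NBINS_PER_SIDE, bin_of) are hypothetical and not part of the Berkeley code:

#define NBINS_PER_SIDE 16   /* hypothetical grid resolution */

/* Map a position in [0, size) x [0, size) to the bin containing it. */
static int bin_of(double x, double y, double size) {
    int bx = (int)(x / size * NBINS_PER_SIDE);
    int by = (int)(y / size * NBINS_PER_SIDE);
    if (bx >= NBINS_PER_SIDE) bx = NBINS_PER_SIDE - 1;   /* clamp edges */
    if (by >= NBINS_PER_SIDE) by = NBINS_PER_SIDE - 1;
    return by * NBINS_PER_SIDE + bx;
}

Rebuild the bins each time step, then compute forces only between particles in the same or adjacent bins; that turns the O(n^2) all-pairs force loop into roughly O(n) work.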

Next Lecture

Lecture 4, October 21: Experimental Parallelism

Additional Information
