I will be out of town on October 19, so that day's lecture will be rescheduled. Since it is a fairly standalone lecture, the order of the other lectures won't change; we will hold it on another date, TBD. Please do come when it is scheduled, since it is one of the most fun lectures of the semester!
Last week, we discussed some performance graphs that plot (a) wall-clock time, (b) speedup, and (c) efficiency against the number of threads used on a particular problem. This week, each of you will collect that kind of data yourself and recreate the graphs. You need R, an account on a Unix-like machine with more than one core and an OpenMP-capable compiler, and the code available from the link in the Homework section below.
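For reference, the usual definitions behind graphs (b) and (c), writing T(n) for the wall-clock time with n threads (this notation is chosen here and may differ from the lecture slides):

```latex
S(n) = \frac{T(1)}{T(n)}, \qquad
E(n) = \frac{S(n)}{n} = \frac{T(1)}{n\,T(n)}
```

Ideal (linear) scaling means S(n) = n and E(n) = 1. For example, assuming the second column of the armstrong-one-run.dat data below is the wall-clock time, its 1-thread and 4-thread rows give S(4) = 440/122 ≈ 3.61 and E(4) ≈ 0.90.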
Note on compilers: On the Mac, the compiler Apple supplies is called gcc, but it is really Clang, not the true GNU gcc. Normally that would be fine, since the two are mostly compatible, but each has some features the other lacks. In particular, for this exercise you will need OpenMP, the parallel extensions for C, and Apple's Clang does not provide them. You can install the real gcc 5 (or gcc 4) on your Mac, or you can run Linux in a VirtualBox virtual machine managed by Vagrant; I have done both. If you want to use gcc-5 on your Mac, remember to change the first line of both Makefiles to:
CC=gcc-5
That should allow you to compile the OpenMP program properly.
#define LOOP 1000
#define REGISTER_SIZE 20
Edit the file and fix those parameters, if they are different.
1 440 435 0.2
2 229 451 0.36
3 161 475 0.42
4 122 482 0.4
> ARMDATA1 <- matrix(scan("armstrong-one-run.dat"),ncol=4,byrow=T)
Read 48 items
> plot(ARMDATA1[,1],ARMDATA1[,2])
> x <- seq(1,16)
> y = 440/x
> points(x,y,type="l")
> help(plot)
> plot(ARMDATA1[,1],ARMDATA1[,2],log="y")
> points(x,y,type="l")
> plot(ARMDATA1[,1],ARMDATA1[,2],log="xy")
> points(x,y,type="l")
Here is some data I took this morning:
Van-Meter-Rodneys-MacBook-Pro:vagrant-files rdv$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Clearing any previously set forwarded ports...
==> default: Clearing any previously set network interfaces...
==> default: Preparing network interfaces based on configuration...
default: Adapter 1: nat
==> default: Forwarding ports...
default: 22 => 2222 (adapter 1)
==> default: Running 'pre-boot' VM customizations...
==> default: Booting VM...
==> default: Waiting for machine to boot. This may take a few minutes...
default: SSH address: 127.0.0.1:2222
default: SSH username: vagrant
default: SSH auth method: private key
default: Warning: Connection timeout. Retrying...
default: Warning: Remote connection disconnect. Retrying...
==> default: Machine booted and ready!
==> default: Checking for guest additions in VM...
==> default: Mounting shared folders...
default: /vagrant => /Users/rdv/new/quantum/vagrant-files
==> default: Machine already provisioned. Run `vagrant provision` or use the `--provision`
==> default: to force provisioning. Provisioners marked to run always will still run.
Van-Meter-Rodneys-MacBook-Pro:vagrant-files rdv$ vagrant ssh
Last login: Mon Oct 5 00:17:27 2015 from 10.0.2.2
-bash: warning: setlocale: LC_CTYPE: cannot change locale (UTF-8): No such file or directory
[vagrant@localhost ~]$ more /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 70
model name : Intel(R) Core(TM) i7-4750HQ CPU @ 2.00GHz
stepping : 1
cpu MHz : 1997.136
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good pni ssse3 lahf_lm
bogomips : 3994.27
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 70
model name : Intel(R) Core(TM) i7-4750HQ CPU @ 2.00GHz
stepping : 1
cpu MHz : 1997.136
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good pni ssse3 lahf_lm
bogomips : 3994.27
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 70
model name : Intel(R) Core(TM) i7-4750HQ CPU @ 2.00GHz
stepping : 1
cpu MHz : 1997.136
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good pni ssse3 lahf_lm
bogomips : 3994.27
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 70
model name : Intel(R) Core(TM) i7-4750HQ CPU @ 2.00GHz
stepping : 1
cpu MHz : 1997.136
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good pni ssse3 lahf_lm
bogomips : 3994.27
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
[vagrant@localhost ~]$
[vagrant@localhost ~]$ cd architecture/
[vagrant@localhost architecture]$ ls
architecture-hw1-111007.tgz data src
[vagrant@localhost architecture]$ cd src
[vagrant@localhost src]$ ls
AAAREADME.txt hw1 qulib
[vagrant@localhost src]$ cd hw1/
[vagrant@localhost hw1]$ ls
Makefile hw1 main.c main.o
[vagrant@localhost hw1]$ time ./hw1
real 0m7.044s
user 0m27.129s
sys 0m1.013s
[vagrant@localhost hw1]$ export OMP_NUM_THREADS=1
[vagrant@localhost hw1]$ time ./hw1
real 0m10.428s
user 0m10.386s
sys 0m0.038s
[vagrant@localhost hw1]$ export OMP_NUM_THREADS=2
[vagrant@localhost hw1]$ time ./hw1
real 0m7.412s
user 0m14.370s
sys 0m0.447s
[vagrant@localhost hw1]$ export OMP_NUM_THREADS=3
[vagrant@localhost hw1]$ time ./hw1
real 0m6.995s
user 0m20.348s
sys 0m0.625s
[vagrant@localhost hw1]$ export OMP_NUM_THREADS=4
[vagrant@localhost hw1]$ time ./hw1
real 0m6.860s
user 0m26.468s
sys 0m0.943s
[vagrant@localhost hw1]$ export OMP_NUM_THREADS=5
[vagrant@localhost hw1]$ time ./hw1
real 0m7.361s
user 0m21.475s
sys 0m1.526s
[vagrant@localhost hw1]$ export OMP_NUM_THREADS=6
[vagrant@localhost hw1]$ time ./hw1
real 0m7.391s
user 0m21.657s
sys 0m1.494s
[vagrant@localhost hw1]$ export OMP_NUM_THREADS=8
[vagrant@localhost hw1]$ time ./hw1
real 0m6.954s
user 0m20.681s
sys 0m1.600s
[vagrant@localhost hw1]$ export OMP_NUM_THREADS=16
[vagrant@localhost hw1]$ time ./hw1
real 0m6.762s
user 0m20.034s
sys 0m2.429s
[vagrant@localhost hw1]$
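I ran this on CentOS in a Vagrant-managed VirtualBox VM on my Mac. As you can see, the performance improvement was only modest: going from one thread to four cut the wall-clock (real) time from 10.43 s to 6.86 s, a speedup of only about 1.5, and running more threads than the VM's four cores did not help much. Why do you think that is?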
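Let's talk about Hennessy & Patterson's five principles (take advantage of parallelism, the principle of locality, focus on the common case, Amdahl's Law, and the processor performance equation):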

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
(Here's the fun part...)
The source code you need, including script files, is available in a tar file here.
The specification for OpenMP and a "summary card" for C and C++ are available here. The spec version referenced here is 3.1, and a Japanese translation of the 3.0 spec is also available.
This week's homework (submit via SFS, due 10/19):
/* XXX parallelizing this loop is tricky, but it's a "big" loop, so worth doing... */
#pragma omp parallel for schedule(static,8192) private(j,k,z)
#pragma omp parallel for private(j,k,z)
Lecture 3, October 12: Processors: Basics of Instruction Sets
Below, P-H is Computer Organization and Design: The Hardware/Software Interface, 3rd edition, and H-P is Computer Architecture: A Quantitative Approach, 4th edition (both are also available in Japanese translation). Readings for next time: