Keio University
2008 Academic Year, Spring Semester
System Software / Operating Systems
Lecture 5, May 20: Process Scheduling
Outline
- Past Homework
- Dining philosophers, Ethernet, etc.
- fork()
- Memory copy
- Projects!
- Scheduling
- Basic priority scheduling (return to Mars)
- Goals of scheduling
- Batch scheduling: FCFS, SJF
- CPU scheduling: round robin, etc.
- I/O priority boost
- Fairness: by user or process?
- Thread scheduling
- Multiprocessor scheduling
- Realtime: deadline scheduling
- Bonus: scheduling in multithreaded architectures
- Current scheduling research
Past Homework
There are a couple of homework blogs for which I do not have the
right mapping from real name/user ID to blog. If I have not posted
comments on your blog about your homework and given you a grade on
it, please send me email.
Dining Philosophers, Ethernet, etc.
The principal way in which Ethernet's CSMA/CD resource control is
similar to the dining philosophers is:
- Both are random, time-based competitions for resources. In
both, users compete for access to a desired resource, and when they
do not get it, they must wait; there is no guarantee that any
individual will get access in a particular time period.
The principal ways in which CSMA/CD differs from the philosophers
are:
- CSMA/CD always allows forward progress. Except for the
possibility of a user dying while holding the resource (extremely
rare, but possible with Ethernet, and usually indicative of a
hardware problem or very low-level firmware problem), as long as
each user uses the resource for only a finite amount of
time, someone will make forward progress, and therefore
eventually everyone will be satisfied.
- Ethernet is a single resource; unlike the philosophers,
only one user can work at a time.
Fork()
Several people had trouble with the fork() exercise, and no one
measured the performance truly satisfactorily.
Various problems showed up (a sketch of one correct structure follows
the list):
- Several of you neglected to do a wait() on a child
process. Without a wait, the program is likely to run nearly forever,
as the parents keep dying and leaving behind only the most minimal of
resources to keep the tree alive. The goal was to have a large number
of processes alive simultaneously in order to determine the limit,
and not all of your programs achieved that.
- Some of you did not recurse properly, instead doing the looping in
the parent, which again made finding the limit on actual processes
difficult. It also made measuring the process creation time
difficult, as the process deletion time is included.
- Many of you did not use perror(). Learn to use it
effectively.
- A few people did describe their computing environment, which is
important; most of you did not.
- Only a few people commented their code; starting with next week's
homework, comments count toward your grade!
- No one repeated the experiment enough times to gather any statistics
and determine the actual performance. Next week we will talk
about performance measurement in more detail.
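For reference, a minimal sketch of one structure that avoids the
first three problems (recursive fork(), a wait() in each parent,
perror() on failure); argument handling and the timing are left out:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Recursively fork children to the requested depth, so the
       whole chain of processes is alive simultaneously. */
    static void spawn(int depth)
    {
        pid_t pid;

        if (depth <= 0)
            return;
        pid = fork();
        if (pid < 0) {
            perror("fork");        /* reports errno at the limit */
            exit(EXIT_FAILURE);
        }
        if (pid == 0) {            /* child: recurse one level deeper */
            spawn(depth - 1);
            exit(EXIT_SUCCESS);
        }
        if (waitpid(pid, NULL, 0) < 0)  /* parent: keep the tree alive */
            perror("waitpid");
    }

    int main(void)
    {
        spawn(100);                /* an arbitrary depth */
        return 0;
    }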
Memory Copy
This week's homework included writing a program to copy memory and
measure its performance. How did you fare? There are many ways to do
the copy, and many potential pitfalls, especially in the performance
measurement (a sketch that sidesteps two of these follows the list):
- Granularity of the clock
- Lazy allocation of memory?
- Overhead of system call or library call
- Cache effects
- Virtual memory (VM) effects
- Random interference from the system
- Misaligned memory copies
- Are you measuring wall-clock time or CPU time? User or system?
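As a point of reference, here is a minimal sketch of a timed copy
that sidesteps two of these pitfalls: it touches the pages before
timing, so lazy allocation is not charged to the copy, and it repeats
the copy to swamp the clock granularity. The sizes are arbitrary, and
it measures wall-clock time only:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/time.h>

    #define SIZE (10 * 1024 * 1024)   /* 10 MB */
    #define REPS 100

    int main(void)
    {
        char *src = malloc(SIZE);
        char *dst = malloc(SIZE);
        struct timeval start, end;
        double elapsed;
        int i;

        if (src == NULL || dst == NULL) {
            perror("malloc");
            return 1;
        }
        /* Touch every page first so the initial page faults are
           not charged to the copy itself. */
        memset(src, 1, SIZE);
        memset(dst, 0, SIZE);

        gettimeofday(&start, NULL);
        for (i = 0; i < REPS; i++)
            memcpy(dst, src, SIZE);
        gettimeofday(&end, NULL);

        elapsed = (end.tv_sec - start.tv_sec)
                + (end.tv_usec - start.tv_usec) / 1e6;
        printf("%d copies of %d bytes in %.3f s\n", REPS, SIZE, elapsed);
        return 0;
    }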
Projects
I am still not seeing much movement on your projects. I expect you
to schedule a meeting with me (voice or in person) to discuss your
project by the end of next week.
Basic Priority Scheduling
Last week we saw basic priority scheduling, in the example of VxWorks
on Mars.
Goals of Scheduling
- Throughput
- Fairness
- Responsiveness
- Effective utilization of all resources
Batch Scheduling
- Charging
- First Come, First Served (FCFS)
- Shortest Job First (SJF)
- CPU-bound vs. I/O-bound Jobs
Scheduling for large batch machine servers, such as those that process
databases, concentrates on throughput, measured in jobs per
hour. Charging in these systems is generally done in dollars
per CPU hour, so it is important to keep the CPU as busy as possible
in order to make as many dollars as possible.
The simplest approach of all is first come, first served
(FCFS). In FCFS, jobs are simply executed in the order in which
they arrive. This approach has the advantage of being fair;
all jobs get the processing they need in a relatively predictable
time.
Better still, in some ways, is Shortest Job First (SJF). SJF
is provably optimal for minimizing the average wait time among a
fixed set of jobs. However, in order to maintain fairness, one has to
be careful about continuing to allow new jobs to join the processing
queue ahead of older, longer jobs. Moreover, actually determining
which jobs will be short is often a manual process, and error-prone,
at that. When I was a VMS systems administrator, we achieved an
equivalent effect by having a high-priority batch queue and a
low-priority batch queue. The high-priority one was used only rarely,
when someone suddenly needed a particular job done quickly, and
usually for shorter jobs than the low-priority batch queue.
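A small worked example, with a made-up workload: suppose jobs needing
2, 4, and 8 hours arrive together. Shortest-first, the waits are 0,
2, and 6 hours, an average of 8/3; longest-first, they are 0, 8, and
12 hours, an average of 20/3. No ordering beats shortest-first.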
If CPU is the only interesting resource, FCFS does well. But in
reality, computers are complex machines with multiple resources that
we would like to keep busy, and different jobs have different
characteristics. What if one job would like to do a lot of disk I/O,
and another is only using the CPU? We call these I/O-bound and
CPU-bound jobs, respectively. FCFS would have the disk busy
for the first one, then the CPU busy for the second one. Is there a
way we can keep both busy at the same time, and improve overall
throughput?
Keep in mind that the time taken by the next instruction executed is
dependent on the current state of the machine. What chunks of main
memory are already stored in the cache? What is the disk head
position? (We will study disk scheduling more when we get to file
systems.)
CPU Scheduling
- Multiprogramming
- Cooperative multitasking
- Preemptive multitasking (most common)
- Time quantum
- Round-robin scheduling
In the discussion of batch scheduling, we were talking about job
scheduling: deciding which large computation is important enough
to run next, but then not really worrying about it until the job
ends. But most jobs do some I/O, and leaving the CPU idle while the
I/O completes is wasteful. Instead, we can use the CPU for another
process while the I/O completes. Such a system is
multiprogrammed. In addition to involuntarily giving up the
CPU to complete some I/O, most systems support voluntarily giving up
the CPU. In the first version of MacOS, such cooperative
multitasking was the only form; now it and almost all other major
OSes use preemptive multitasking, in which the operating system
can take the CPU away from the application. Obviously, cooperative
multitasking makes solving problems such as deadlock easier.
You're already familiar with multitasking operating systems; no
self-respecting OS today allows one program to use all of the
resources until it completes, then picks the next one. Instead, they
all use a quantum of time; when the process that is currently
running uses up a certain amount of time, its quantum is said to
expire, and the CPU scheduler is invoked. The CPU
scheduler may choose to keep running the same process, or may choose
another process to run. This basic approach achieves two major goals:
it allows us to balance I/O-bound and CPU-bound jobs, and it allows
the computer to be responsive, and give the appearance that it
is paying attention to your job.
This basic concept of a multiprogrammed system was developed
for mainframe hardware with multiple terminals attached to the same
computer; fifty people or more might be using the same machine. As we
discussed in the first lecture, the concept was pioneered by the
Compatible Time Sharing System (CTSS), created at MIT by
Fernando Corbató and his collaborators and students.
In this environment, it makes sense to give some priority to
interactive jobs, so that human time is not wasted. Batch jobs still
run, but at a lower priority than interactive ones. But how do you
pick among multiple interactive jobs? The simplest approach is
round-robin scheduling, in which the jobs are simply executed
for their quantum, and when the quantum expires, the next one in the
list is taken and the current one is sent to the back of the list. It
is important to select an appropriate quantum.
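A back-of-the-envelope example with made-up numbers: if a context
switch costs 1 ms, a 4 ms quantum spends 20 percent of the CPU on
switching overhead; a 100 ms quantum cuts the overhead below 1
percent, but with fifty runnable processes the last one in line waits
about five seconds for its turn.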
In round-robin scheduling, if we have five compute-bound tasks, they
will execute in the order
ABCDEABCDEABCDE
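In code, the dispatch step on quantum expiry is just a rotation; a
minimal sketch over a fixed set of tasks (a real scheduler keeps a
linked run queue, but the rotation is the same):

    #define NTASKS 5        /* the five tasks A through E */

    static int current = 0; /* index of the running task */

    /* On quantum expiry, advance to the next task, wrapping
       around at the end of the list. */
    int rr_next(void)
    {
        current = (current + 1) % NTASKS;
        return current;
    }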
We have already seen the basic idea of priority scheduling a couple of
times. Usually, priority scheduling and round-robin scheduling are
combined, and the priority scheduling is strict. If any
process of a higher priority is ready to run, no lower-priority
process gets the CPU. If batch jobs are given lower priority than
those run from interactive terminals, this has the disadvantage of
making it attractive for users to run their compute-bound jobs in a
terminal window, rather than submitting them to a batch queue.
To guarantee that batch jobs make at least some progress, it is
also possible to divide the CPU up so that, say, 80 percent of the CPU
goes to high-priority jobs and 20 percent goes to low-priority jobs.
In practice, this is rarely necessary.
I/O Priority Boost
One important technique in scheduling is to give a priority
boost to tasks that are I/O bound, and possibly decrease the
priority of CPU-bound tasks. This approach increases responsiveness
and helps to keep all of the system resources busy, improving
throughput, but implementing it correctly is tricky.
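The notes above don't prescribe a mechanism; one common scheme is a
multilevel feedback queue, sketched here with made-up names and
levels: a task that burns its whole quantum is demoted, and a task
that blocks on I/O before the quantum expires is boosted.

    #define MIN_PRIO 0      /* highest priority */
    #define MAX_PRIO 7      /* lowest priority */

    struct task {
        int priority;       /* current level, MIN_PRIO..MAX_PRIO */
    };

    /* The task used its entire quantum: it looks CPU-bound, so
       demote it one level. */
    void on_quantum_expired(struct task *t)
    {
        if (t->priority < MAX_PRIO)
            t->priority++;
    }

    /* The task blocked on I/O early: it looks I/O-bound, so
       boost it one level. */
    void on_io_block(struct task *t)
    {
        if (t->priority > MIN_PRIO)
            t->priority--;
    }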
Fairness: by User or by Process?
If, in the above set of tasks, A through D belong to one
user, and E belongs to another, which is the right approach?
AEBECEDEAEBECEDE or
ABCDEABCDE?
Multiprocessor Scheduling
A little basic queueing theory: a single queue for multiple servers
(CPUs) is better than separate queues for each server (CPU). Okay,
then why does Linux put each process on a particular CPU and leave it
there? Two reasons: simplifying locking and improving the performance
of the kernel itself, and improving the behavior of the CPU's memory
cache.
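On Linux this placement is also visible to programs; a minimal,
Linux-specific sketch using sched_setaffinity() to pin the calling
process to CPU 0:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t mask;

        CPU_ZERO(&mask);        /* start with an empty CPU set */
        CPU_SET(0, &mask);      /* allow only CPU 0 */

        /* A pid of 0 means the calling process. */
        if (sched_setaffinity(0, sizeof(mask), &mask) < 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("pinned to CPU 0\n");
        return 0;
    }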
This field was heavily researched in the 1980s, and due to the rapid
increase in multicore systems, will no doubt be important in commodity
operating systems for the next several years, especially the
interaction of thread scheduling and CPU scheduling.
Thread Scheduling
- Threads can be scheduled in the same way.
In a prior lecture, we discussed the implementation of threads at user
level, and at kernel level. If the threads are user level, they are
often more efficient, but usually have to be cooperatively scheduled,
and the kernel can't help put them on separate CPUs. For
kernel-implemented threads, the OS can, potentially, share them out to
separate CPUs. In practice, if each CPU has its own cache, this is
difficult to do correctly, and the performance penalty is large.
Bonus: Instruction and Thread Scheduling in
Multithreaded Architectures
- Multiple issue microprocessors
- Multithreaded architectures
There are a number of fascinating things happening in computer
architecture that affect scheduling. Modern CPUs are multiple
issue; more than one instruction is executed in each clock cycle.
The most extreme form of this is the TRIPS architecture from
the University of Texas at Austin, where the goal is to issue one
thousand instructions in every clock cycle!
At the other end, one important experiment is in multithreaded
architectures, in which the CPU has enough hardware to support
more than one thread, under limited circumstances. The most extreme
form of this was the Tera Computer, which had hardware support
for 128 threads and always switched threads on every
clock cycle. This approach allowed the machine to hide the latency to
memory, and work without a cache. It also meant that the overall
throughput for the system was poor unless a large number of processes
or threads were ready to execute all of the time.
Realtime: Deadline Scheduling
- Realtime scheduling is done by establishing deadlines
for certain tasks.
Airplanes falling from the sky, death and destruction all around. We
don't want that, do we? Then don't play Tetris on your flight
avionics hardware...
We should have come to this earlier, but it didn't fit into the flow
above. One important class of scheduling algorithms is
deadline scheduling algorithms for realtime systems.
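The classic example, not named in the notes above, is Earliest
Deadline First (EDF): always run the ready task whose deadline is
nearest. A minimal sketch of the selection step, with a made-up task
structure:

    #include <stddef.h>

    struct rt_task {
        long deadline;      /* absolute deadline, e.g., in ticks */
        int  ready;         /* nonzero if runnable */
    };

    /* Of the ready tasks, return the one whose deadline is
       soonest, or NULL if none is ready. */
    struct rt_task *edf_pick(struct rt_task *tasks, int n)
    {
        struct rt_task *best = NULL;
        int i;

        for (i = 0; i < n; i++) {
            if (!tasks[i].ready)
                continue;
            if (best == NULL || tasks[i].deadline < best->deadline)
                best = &tasks[i];
        }
        return best;
    }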
Unix Batch Systems
nice, batch, at, and cron are Unix tools
for managing priorities and submitting and controlling batch jobs,
including those to be executed at later times. All of them are lousy
compared to the equivalent tools for VMS and mainframes, and I don't
know why. OpenPBS is a batch system that can be installed;
I've never used it.
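For concreteness (bigjob is a placeholder name): nice -n 10 ./bigjob
runs a job at lowered priority; echo ./bigjob | at 02:00 queues it to
run once at 2 AM; and the crontab entry 0 2 * * * ./bigjob runs it
every night at 2 AM.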
Current Scheduling Research
- Kono Lab built tools to allow background jobs to more
effectively assess their impact on the system.
Homework
The goal of this week's homework is to explore scheduling and
understand its impact on system performance and especially the
performance of individual processes. As such, the goal is to load
your computer heavily with compute-bound processes:
- Take last week's memory copy program, and modify it to
fork() to a certain depth, then have each one of the processes
time the copy of a certain amount of memory. Your program
should take three arguments: the depth, the amount of memory to copy,
and the number of times to copy the memory. Ideally, the amount of
memory to copy should be large enough to require several seconds, but
that's not practical, so have it repeat the copy some number of
times. For example, fork five processes, malloc() ten
megabytes each, and have the processes copy that memory one hundred
times.
- Run the program with a depth of one, and report how long it
takes. Repeat this program twenty-five times and report the
average and the individual times.
- Now run the program with a depth of five. Again, repeat
twenty-five times and report the average. Is the average higher than
five times the depth one case? Why?
- Plot the density function for the execution time. (You
should have twenty-five data points here for copying a gigabyte of
memory each, for the depth one and depth five cases.) Is there more
variability in the depth five case?
- Run one copy of your program at the normal priority, and at the
same time a second copy at lower priority, e.g. by using nice.
Does the first one completely monopolize the CPU until it is finished?
- Find and report the time quantum for your particular system.
- Schedule a meeting with me (voice or in person) to discuss your
project by the end of next week.
Next Lecture
Lecture 6, May 27: Memory Management and Virtual Memory
Next week we will also talk about performance measurement.
Readings for next week and follow-up for this week:
Follow-up:
Additional Information