慶應義塾大学
2007年度 春学期
システム・ソフトウェア
System Software / Operating Systems
第2回 4月15日 システムコール
Lecture 2, April 15: System Calls
It From Bit: John Archibald Wheeler, 1911-2008
John Wheeler, from Princeton's web page
Einstein, Yukawa (湯川 秀樹), Wheeler, and ?, from Princeton's web page
Today's Picture
- Review of last week: definition of an operating system, history of
operating systems
- Thinking Like a Researcher
- Types of operating systems
- The Basic Idea: Kernel and Processes
- System Calls
- How Big is an Operating System?
- Discussion of term projects
- 先週のおさらい: オペレーティングシステムの歴史と定義
- 研究者的に考える
- オペレーティングシステムの種類
- 基本的なアイデア: カーネルとプロセス
- システムコール
- オペレーティングシステムの大きさ
- タームプロジェクトについて
Review of last week: definition of an operating
system, history of operating systems
The four key roles of an operating system are:
- Resource management/リソース管理
- Extension of the machine/拡張マシン
- Naming/ネーミング
- Data movement/データ転送
Recall that mainframes went from bare metal, to batch systems, to
unprotected multiprogramming, to protected multi-user systems. PC
operating systems followed the same pattern, and embedded OSes for
devices such as PDAs are following the same pattern.
最初にメインフレームとして利用されていたプログラマブルコンピューターは、
金属自身で(つまり部品を切り替えて)プログラムをつくっていました.
次にバッチシステムが開発されました。
そのあと、保護されないマルチプログラミング用のOSが出現し,
次に保護のあるOSをできました。
PC用と組み込み用のOSは同じ道をたどってきました.
Finally, remember that we discussed the light cone of information and
its impact on systems, where we must consider that information is
always distributed (and therefore out of date), and that systems are
definitely concurrent.
最後に
光円錐の影響で、今の知っていることが古いでしょう。そのふたつの概念:全
ての情報は分散である、とすべてのことは同時で行う可能性がある。
Thinking Like a Researcher
研究者の考え方
I asked you to read the Levin and Redell paper. That paper will help
you learn how to understand a computer system, and it also addresses
the larger issue of how to analyze existing research and how to
present your own. Thinking about this issue will help with writing
your master's thesis.
前回,
Levin and Redell
paperを読むという宿題を出しました.
この文章はコンピュータシステムへの理解だけではなく,
どのように既存研究を分析するか,
どのように自分の研究を提供するかといった知識を与えてくれるでしょう.
これらの問題に取り組むことはあなたの修士論文の助けになるでしょう.
Everyone always wants to describe what they did, rather
than why they did it, and, ultimately, whether or not what they
did worked, and how they know.
誰もが何をしたのかを話したがり,
何故それが必要かを説明しません.
そして,どういう結果が出て,どうしてそれを知ったのかを説明しないのです.
When you present a project, or an idea, to me, I expect the
following:
皆さんはプロジェクトあるいはアイデアを私に提案するとき,
以下のことを考えます.
- A clear definition of the problem.
- Related work, so that I can see that you know what has been
done before, and why your work might be good.
- The main idea. This is where most people start and end
their analysis.
- A description of how you would evaluate the idea.
- Depending on the stage of development, one of:
- A development plan, including schedule.
- The actual data.
- 明確な問題の定義
- 関連研究.あなたが何をするかわかる前に
- 中心となるアイデア.これは分析のための始点であり終点です.
- どうやってそのアイデアを評価するのかという手順
- 開発段階に応じてどちらかが必須
You will find a slightly updated version of the questions from Levin
and Redell here.
また,皆さんはLevin と Redellからの若干のアップデートされた質問を
ここ.
からみつけることができでしょう.
Types of Operating Systems
オペレーティングシステムの種類
Very briefly, let us note that there are a number of types of
operating systems. During this class, we will generally focus on
multi-purpose PC operating systems, but many of the principles in
today's systems are inherited from older systems, and many small
operating systems will eventually include many of these features.
非常に簡単にではありますが,いくつかのOSの種類を説明します.
本抗議では,多目的PCのOS(訳注:皆さんが普段使うようなOSです)
について解決しますが,世の中には多くのOSが存在します.
それらの多くの特徴はほとんど変わらず,
新しいOSは従来のOSから大きく変わるものではないです.
- Mainframe(メインフレーム)
- Server(サーバ)
- (Tanenbaum lists multiprocessor, but all OSes today must be
MP-capable/タネンバウムはマルチプロセッサをあげていますが,最近のOSはマルチプロセッサをサポートしています)
- PC
- Real-Time(リアルタイムOS)
- Embedded(組込み用)
- Smart card/micro-OS(スマートカード/マイクロOS)
The Basic Idea: Kernel and Processes
基本的な考え:カーネルとプロセス
Modern general-purpose operating systems use a structure involving
a kernel and user processes. The kernel executes the
functions we described above, managing the system on behalf of all its
users. The application programs that the users want to execute are
run in processes. The kernel is protected from interference by
applications, so that applications cannot destroy the disk file
system, for example. Applications are also protected from one
another, so that they can keep secrets, and so that each application
can pretend that it owns the entire computer, which dramatically
simplifies creation of application programs.
System Calls
System calls provide the defined interface between the
operating system and user application programs. The system calls
generally provide several types of services:
システムコールはOSとユーザランドプログラムのインタフェースを
定義します.
基本的なシステムコールは以下のように分類されます.
- process management
- file management and I/O
- directory and file system management
- network and device I/O (including graphic display)
- プロセス管理
- ファイル管理, ファイルI/O
- ディレクトリ/ファイルシステムの管理
- ネットワークI/OとデバイスI/O(グラフィックもこの一部です)
In Unix and operating systems influenced by Unix, the last category is
usually implemented using the file I/O interface, but there are often
additional system calls that must be made to, for example, correctly
open a network connection.
Unix,あるいはその影響を強く受けているOS上で,
最後のカテゴリは通常,ファイルI/Oです.
しかし,通常いくつかのシステムコールがファイルI/Oには必要となります.
たとえば,ネットワーク接続が正常に開いているのか,などです.
We discussed resource management, data movement and
naming as critical functions of an OS. These system calls are
an application's means of requesting that the OS perform one of these
functions.
前回の授業では,
リソース管理,
データ転送と
名前空間
という重要なOSの要素について扱ってきました.
これらのシステムコースは,アプリケーションの要求する
OSの機能を提供する要素です.
System calls are different from library functions. Many OS
environments (whether the library is or is not a part of the OS itself
is arguable) provide libraries of functions, often standardized, that
application programmers may wish to use. What's the difference?
Library functions start by running in user space, though they may also
make system calls on behalf of the user process. Library functions
perform actions like string formatting, calculating math functions,
etc. System calls generally involve access to things that must be
protected: disk drives, files on those disk drives, process control
structures, etc.
システムコールは通常のライブラリとは違う機能を持ちます.
一般的な多くのOS環境はアプリケーションプログラマが使用したがる
標準化されているライブラリの機能を提供します.
(ライブラリがOSの機能かどうかについては議論の余地があるけれども)
では,システムコールとライブラリは何が違うのでしょうか?
ライブラリはシステムコールを呼ぶものであってもユーザランドからスタートします.
ライブラリファンクションは,
文字列の成型や,数式の計算などを処理します.
システムコールは一般的に,ディスクドライバ,その上のファイル,保護された領域など
にアクセスするために呼ばれます.
Last week we discussed naming as a critical function of an OS.
Humans use a readable form of the name of a system call, such as
write(). However, the operating system itself does not
actually use the human-readable names. In this case, the C compiler
uses header files as a means to translate the human-readable name into
a machine-readable one: a small integer, the system call number.
前回,我々はネーミングというOSの重要な機能について議論しました.
通常,人間は可読なシステムコールの名前(たとえば,writeのような)を用います.
しかし,OS自身は人間が可読な名前を持ちません.
つまり,Cコンパイラのヘッダファイルが,マシンが用いる名前と人間の用いる名前を
変換しているのです.
Here is part of the list in a Linux 2.6.19 kernel:
さて,ここにLinux2.6.19のコードの一部を示します.
[rdv@2 ~]$ more /usr/include/asm/unistd.h
#ifndef _ASM_I386_UNISTD_H_
#define _ASM_I386_UNISTD_H_
/*
 * This file contains the system call numbers.
*/
#define __NR_restart_syscall 0
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_open 5
#define __NR_close 6
...
#define __NR_move_pages 317
#define __NR_getcpu 318
#define __NR_epoll_pwait 319
#endif /* _ASM_I386_UNISTD_H_ */
...that's it. In this Linux kernel on i386, system call numbers 0
through 319 do everything.
Linuxには319のシステムコールが存在することがわかります.
The execution of a system call occurs in several phases:
システムコール処理はいくつかのフェーズに分かれます.
- user process: prepare the arguments
- user process: execute trap or interrupt to reach into kernel
- kernel: determine which system call is being requested
- kernel: verify permissions, size, type, etc. of arguments (this
is a great opportunity for a security hole!)
- kernel: execute system call; if I/O wait or other wait is
required, put the process to sleep
- kernel: when I/O completes, clean up request structures (timers,
any allocated memory, etc.), set errno and the return value for
the system call
- kernel: return
- user process: finish application-side processing of the call.
- ユーザランド: 引数の準備
- ユーザランド: カーネルランドで処理するためのトラップあるいは割り込み(コンテキストスイッチ)
- カーネルランド: どのシステムコールが呼ばれているかを確定する
- カーネルランド: 引数として与えられた権限,サイズ,種類などを確認する(これはセキュリティホールの原因となる1つの大きな要因となってきた!)
- カーネルランド: システムコールを実行.I/O待ちなどが生じたらプロセスをsleep状態にする
- カーネルランド: I/O処理が終了するとタイマーや割り当てたメモリ領域を解放し,errnoを設定する.そして,システムコールの戻り値を返す.
- カーネルランド: ユーザランドに戻る(コンテキストスイッチ)
- ユーザランド: アプリケーションサイドのシステムコール処理を終える
The application side may or may not wrap the system call in a library
routine. The compiler will take care of most of that for you.
アプリケーション側では,ライブラリの中でシステムコールに飛ぶかもしれませんし,飛ばないかもしれません.
コンパイルではこのことを非常に注意深く扱います.
Note that this same essential structure is followed for making calls
to remote servers, as well as to local system services. Again, our
principles of distributed and concurrent actions and information (the
light cone) apply.
このような本質的な構造が,
リモートサーバとの接続もローカルサーバのように扱われているのです.
繰り返しますが,我々の原則である分散かつ並列に処理可能である,
ということが実現されているのです.
Most system calls are synchronous; your application program
stops until the OS completes the call and returns (or decides that it
cannot complete, in which case an error is returned).
ほとんどのシステムコールは同期型です.
同期型とは,アプリケーションがシステムコールの動作を完了するまで
停止するということです.
(もちろん場合によってはエラーを返すこともあります)
Looking in a little more detail at the setuid system call:
setuidの詳細をみてみましょう.
_syscall1(int,setuid,uid_t,uid);
which will expand to:
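_setuid:
    subl $4,%esp            # reserve 4 bytes of local stack space
    pushl %ebx              # save %ebx, which is used for the argument
    movzwl 12(%esp),%eax    # load the 16-bit uid argument, zero-extended
    movl %eax,4(%esp)       # store it in the local slot
    movl $23,%eax           # system call number for setuid
    movl 4(%esp),%ebx       # first argument goes in %ebx
    int $0x80               # trap into the kernel
    movl %eax,%edx          # copy the kernel's return value
    testl %edx,%edx
    jge L2                  # non-negative: success
    negl %edx               # failure: negate to get the error code
    movl %edx,_errno        # store it in errno
    movl $-1,%eax           # return -1 to the caller
    popl %ebx
    addl $4,%esp
    ret
L2:
    movl %edx,%eax          # success: return the kernel's value
    popl %ebx
    addl $4,%esp
    ret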
(This code is a little old, but illustrates the necessary points.)
このコードは少し古いですが重要な部分は残っています.
So, How Big is an Operating System?
オペレーティングシステムって、どのぐらい大きい?
Number of lines of C in the Linux kernel distribution, 2.6.19:
LinuxディストリビューションのC言語の行数は,
- C files: 5.6 million lines(5,600,000行)
- header (.h) files: 1.36M lines(1,360,000行)
Printing that out would require on the order of a hundred thousand
pages! The original Unix kernel was 5,000 lines; the current Linux
kernel is more than 17,000 files! How is Linux following the general
dictum to KISS: keep it simple, stupid? Well, partly because
the breakdown is actually like this (C files only):
なんということだ!これを印刷したら10万ページ規模にもなるじゃないか!
最初のUnixカーネルは5000行だったというのに今のLinuxカーネルは17,000ファイルもある!
LinuxはKISS: keep it simple, stupidという基礎的な概念を守っていてもいいのにね.
- arch: 934869
- 25 types of processors supported
- example: i386: 178 .c files, 68868 lines
- block (basic disk drive scheduling): 11551
- crypto: 15514
- drivers: 3029159
- fs (file systems): 640300
- more than fifty different file systems (disk and network
supported)
- example: ext3: 22 .c files, 15597 lines
- init: 2854
- kernel: 66498
- lib: 19195
- mm (memory management): 37655
- net: 438745
- scripts: 15673
- security: 23368
- sound: 394244
More than half of the total volume, and a third of the total number of
files, is in drivers for various types of devices, most of which are
not used on any given system.
全体量の半分以上と3分の一ものだいるが,
さまざまなデバイスのために用意されています.
そして,そのほとんどが実際にシステムでは使われないのです.
The Linux kernel hackers are generally reasonable about maintaining
comments, so consider these numbers to be high by almost a factor of
two. Note also that this does not include any of the
following:
Linux Kernelのハッカーたちは,メンテナンスコメントの中で,
この大きな2つのサイズのコードについて話します.
ただし,以下のものは含まれていないことに注意してください.
- shell (bash, etc.)
- fundamental utilities: file system mount, network management, etc.
- X Window System
- GNOME
The original Unix system also did not include:
また,元のUNIXでは以下のものも含まれないのです.
- virtual memory/仮想メモリ
- networking (of any form)/ネットワーク
- parallel processing/並列処理
In a system of this size, well-defined, high-performance interfaces
with minimal hidden assumptions are critical. However, in
keeping with Lampson's command to throw the first implementation away,
almost all of Linux, including the virtual memory subsystem, has been
rewritten frequently (as has Windows!). We will talk more about how
to engineer such systems (including "The Cathedral and the Bazaar")
later in the term.
このレベルのシステムでは,十分な定義がされた最低限の過程を隠した
高機能なインタフェースが極めて重要です.
しかしながら,最初の実装を捨てるためにLampsonのコマンドを守り,
ほぼすべてバーチャルメモリサブシステムを持つLinuxは,
よくに(windowsのように)書き換えられている.
本抗議では,これらのシステムをどのようにエンジニアリングしていくか
を話す予定です.
Term Projects
The goal of your term project is to learn how one particular part of
an operating system works, without having to hack the kernel and
threaten the stability of your system. You should be able to conduct
your term project on your own PC safely.
Schedule and Grading
The schedule is as follows:
- Project and team proposals due (on your blog): May 1
- Review of proposals returned: May 8
- Revised project proposals due: May 15
- Final approval of projects, implementation begins: May 22
- Weekly reviews begin: May 29
- Mid-term progress review: June 12
- Final evaluation of projects (face-to-face): July 10-21
That gives you seven to eight weeks to actually implement your
project. I would expect that your project will take 30-40 hours
total, including writing and debugging code, taking data, analyzing
the data, and writing up a report of the results. This is actually
not very much time for a project, so they must be sized
appropriately.
Your grade on your project will be 40% of your total grade, split 10%
for the mid-term progress review on June 12, and 30% for the final
evaluation. The things I will look for are those detailed in the Levin
and Redell paper. Because many of your projects will involve
performance measurements, I also expect data with error bars and
carefully designed experiments. One great book on the topic is Jain,
The Art of Computer Systems Performance Analysis, but there are
probably also good books available in Japanese.
Implementation
Some people have asked what language they must write their program(s)
in. I don't care what language you use; I care what you learn about
the operating system. In order to learn about the OS, a low-overhead,
predictable, compiled language is probably preferable. C would be the
obvious choice. Interpreted scripting languages are probably bad
choices.
Likewise, there is no requirement to perform your project on a
particular operating system. Class lectures will focus on principles
highlighted by Unix and Linux examples, as above. The importance of
Unix in the history of operating systems cannot be overstated, and
Linux and MacOS are (arguably) its most vibrant current
implementations; students
must have some familiarity with the basic ideas of Unix.
However, if your OS of choice is Windows, you will learn a great deal
by comparing concepts from lectures and the book with what you see on
your Windows machine. One obvious advantage of Linux is the easy
availability of source code, turning "black box" experiments into
"white box" ones.
The first step in either research or development is to identify a
problem. Most of these projects will help you carefully characterize
a system problem that you might want to attack more thoroughly in
research later.
Project Suggestions
The ideal project for this class is probably a performance measurement
of the system. Examples include:
- Redo, a decade later, the measurements in my USENIX paper, Observing
the Effects of Multi-Zone Disks (although this is more hardware
than software).
- Evaluate the performance of the file system as directories get
broader. When there are a thousand files in a directory,
how long does it take to look up a file name? When there are a
million? (Your file system will likely fail before you get to a
million!)
- Evaluate the performance of the file system as directories get
deeper. Does it take constant time to create a new directory? How
long does it take to look up a file name as the path gets deeper?
Does the file system refuse to go past a certain depth? Is there a
limit on total path name length, or is it a limit on the number of
directories you've descended? You must characterize both cold-cache
and warm-cache performance.
- Shared libraries are valuable due to their memory savings, but
they are complicated. Compare in detail the performance of a
program using a shared library routine such as puts() with
one that uses the same routine, but compiled in directly. Be
careful to separate out the costs of the system call itself, the
library function, and the overhead of finding and loading the shared
library. In addition to measurement, you will probably need to
actually count instructions manually, using a debugger. The basic C
library will certainly already be in memory; can you find an obscure
library that has to be loaded from disk, and determine the overhead
of the load?
- Measure the CPU cost of software RAID (obviously, this requires
access to a machine that uses software RAID!).
- Compare the cost of malloc() and free() when using
multiple threads and using a single thread. Compare when more than
one thread is trying to allocate memory, and when multiple threads
exist but only one is doing allocation. This project is especially
interesting if you do both C and C++; the C++ new() operator
will do some surprising things.
- Does the file system on your machine allow multiple processes to
open the same file for write? If two processes try to write at the
same time, what happens? Is the behavior different if each write is
the size of a single page (4KB, usually) or if it's an odd size? Is
the behavior different over NFS (from the same or two different
machines) than it is locally?
- Measure the performance of TCP networking on your machine.
Determine how many times an incoming or outgoing packet must be
copied, and how many times it will cross the memory bus (note that
these are not necessarily the same number!). Count how many
instructions it takes to process an incoming TCP packet. How many
instructions are in the interrupt handler for your link layer
(presumably Ethernet)?
- Determine how much buffering is needed for smooth audio or video
playback. This is tricky, because the OS will be prefetching and
caching data, as well.
- Choose an important application and compare its performance on
Linux, cygwin, and native Windows.
- IBM has tools to help you make your Linux system boot faster.
Similar tools probably exist for Windows or FreeBSD. If not, porting
the tools to one of those operating systems would be valuable.
- Everybody complains that starting applications is slow. Take a
large app (Evolution, Firefox, etc.) and run strace or the
equivalent on it, tracking all of the files that are opened and
system calls that are made. Provide details on those: how many
bytes are read and written, etc. Can this process be improved using
prefetching?
Some of these projects will be more difficult on Windows systems, due
to the lack of source code. Some of the measurements will also be
difficult if you cannot flush the file system cache, either by
unmounting the file system or rebooting the machine.
I will ask you not just what happened, but why it
happened. In most cases, you should be able to point to some kernel
source code, or a conference paper, design document, book, or web site
that supports your understanding of why the system behaves the way it
does. For most of these projects, I will also ask you to predict what
will happen as technology continues to improve: modest improvements in
clock speed and disk speed, significant increases in number of
processors and disk capacity.
Last Year's Projects
Some of the projects from last year:
- Kanai-san (TA) investigated the behavior of several systems when
faced with many different IPv6 Router Advertisements. He got
probably the most interesting results of any student last year. He
discovered a security hole which resulted in security reports to
several OS vendors, doing a real service to the world.
- One student had noticed that a particular application performed
well on Windows XP but not on Vista. He investigated that
difference, determining how many system calls were made, how long
they took, etc.
- One student had noticed that his web server performed poorly when
the log file size exceeded a certain size, and investigated
why.
- One investigated the performance and accuracy of sleep
timers.
- One investigated the impact of varying the security level of the
system on various system calls, using lmbench.
Others
The OS class at Wisconsin works on a project basis. Here are links to
some prior years' projects:
Readings
Readings to match this week's lecture:
Homework
This week's homework. You probably want to do the second problem
first.
- This week we have talked about system calls. Take your "Hello,
world" program from last week and produce the assembler output from
the compiler. Post the assembler file on your blog.
- Identify the instruction that calls the string output function.
Is it a library function or a system call? How can you tell?
- Identify the arguments to the function. How many are there?
- If your string output function is a library function rather than
a system call, can you find the function and instruction that actually
does the trap into the kernel?
- Unfortunately, even "Hello, world" is a little bit complicated.
Take the following even shorter program and repeat the above
exercise.
#include <unistd.h>
int main() {
char *buf = "123";
write(1, buf, 3);
return 0;
}
(Hint: if you are using Linux, some of the information you need to
complete this exercise is above in the lecture notes.)
- Find a list of all of the system calls on the OS of your
choice, and post it (or a link) on your blog. How many are there?
Next Lecture
Next lecture:
第3回 4月22日 プロセスとスレッド
Lecture 3, April 22: Processes and Threads
Readings for next week:
その他 Additional Information