Keio University
2007 Spring Semester
System Software / Operating Systems
Lecture 2, April 17: System Calls
- Review of last week: definition of an operating system, history of
operating systems
- Types of operating systems
- Discussion of Lampson: What makes a good system?
- System Calls
- Discussion of term projects
Review of last week: definition of an operating system, history of
operating systems
The four key characteristics of an operating system are:
- Resource management/リソース管理
- Extension of the machine/拡張マシン
- Naming/ネーミング
- Data movement/データ転送
Recall that mainframes went from bare metal, to batch systems, to
unprotected multiprogramming, to protected multi-user systems. PC
operating systems followed the same pattern, and embedded OSes for
devices such as PDAs are following it now.
Finally, remember that we discussed the light cone of information and
its impact on systems: information is always distributed (and
therefore somewhat out of date), and systems are inherently
concurrent.
Types of Operating Systems
Very briefly, let us note that there are a number of types of
operating systems. In this class we will generally focus on
multi-purpose PC operating systems, but you should know that other
kinds exist: many of the principles in today's systems are inherited
from older systems, and many small operating systems will eventually
grow to include the same features.
- Mainframe
- Server
- (Tanenbaum lists multiprocessor, but all OSes today must be
MP-capable)
- PC
- Real-Time
- Embedded
- Smart card/micro-OS
Lampson, Hints
(We will discuss some of the slogans in the figure from the paper.)
- "The service must have a fairly predictable cost, and must not
promise more than the implementer knows how to deliver."
- "As long as it is cheap to pass control back and forth, an
interface can combine simplicity, flexibility and high performance by
solving only one problem and leaving the rest to the client." (How
does this compare with our important notion of relativity in systems?)
Number of lines of C in the Linux kernel distribution, 2.6.19:
- C files: 5.6 million lines
- header (.h) files: 1.36 million lines
Printing that out would require a million pages! The original
Unix kernel was about 5,000 lines; the current Linux kernel is more
than 17,000 files! How does Linux follow the general dictum to KISS
(keep it simple, stupid)? Partly because the interfaces between the
kernel's subsystems are carefully respected, and partly because the
breakdown is actually like this (C files only):
- arch: 934869
- 25 types of processors supported
- example: i386: 178 .c files, 68868 lines
- block (basic disk drive scheduling): 11551
- crypto: 15514
- drivers: 3029159
- fs (file systems): 640300
- more than fifty different file systems (disk and network
supported)
- example: ext3: 22 .c files, 15597 lines
- init: 2854
- kernel: 66498
- lib: 19195
- mm (memory management): 37655
- net: 438745
- scripts: 15673
- security: 23368
- sound: 394244
More than half of the total volume, and a third of the total number of
files, is in drivers for various types of devices, most of which are
not used on any given system.
The Linux kernel hackers are generally good about maintaining
comments, so these counts overstate the actual code by almost a
factor of two. Note also that this does not include any of the
following:
- shell (bash, etc.)
- fundamental utilities: file system mount, network management, etc.
- X Window System
- GNOME
The original Unix system also did not include:
- virtual memory
- networking (of any form)
- parallel processing
In a system of this size, well-defined, high-performance interfaces
with minimal hidden assumptions are critical. However, in
keeping with Lampson's advice to throw the first implementation away,
almost all of Linux, including the virtual memory subsystem, has been
rewritten frequently (as has Windows!). We will talk more about how
to engineer such systems (including "The Cathedral and the Bazaar")
later in the term.
- "[A]n asymptotically faster algorithm is not necessarily better."
- "A system cannot be expected to function well if the demand for
any resource exceeds two-thirds of the capacity, unless the load can
be characterized extremely well."
System Calls
System calls provide the defined interface between the
operating system and user application programs. The system calls
generally provide several types of services:
- process management
- file management and I/O
- directory and file system management
- network and device I/O (including graphic display)
In Unix and operating systems influenced by Unix, the last category is
usually implemented using the file I/O interface, but there are often
additional system calls that must be made to, for example, correctly
open a network connection.
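As a small illustration of that unified interface, here is a hedged sketch (the function name `socket_as_file_demo` is invented here, and socketpair() stands in for the full socket()/connect() setup of a real network connection): after the socket-specific setup call, data moves through the connection with the ordinary read() and write() system calls.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a connected pair of sockets, then move a message with the
 * ordinary file I/O system calls.  Returns 0 if the message arrives
 * intact, -1 on any failure. */
int socket_as_file_demo(void) {
    int fds[2];
    /* socketpair() is the socket-specific setup step; afterwards the
     * two descriptors behave like any other open files. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    const char msg[] = "hello";
    char buf[sizeof msg];
    int ok = write(fds[0], msg, sizeof msg) == (ssize_t)sizeof msg
          && read(fds[1], buf, sizeof buf) == (ssize_t)sizeof msg
          && strcmp(buf, msg) == 0;

    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : -1;
}
```

Once the descriptors exist, nothing in the data path is socket-specific; that is the point of the file I/O interface.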
We discussed resource management, data movement and
naming as critical functions of an OS. These system calls are
an application's means of requesting that the OS perform one of these
functions.
System calls are different from library functions. Many OS
environments (whether the library is or is not a part of the OS itself
is arguable) provide libraries of functions, often standardized, that
application programmers may wish to use. What's the difference?
Library functions start by running in user space, though they may also
make system calls on behalf of the user process. Library functions
perform actions like string formatting, calculating math functions,
etc. System calls generally involve access to things that must be
protected: disk drives, files on those disk drives, process control
structures, etc.
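To make the distinction concrete, here is a minimal sketch (the function name `format_and_write` is ours, not a standard API): snprintf() is pure library work in user space, while write() must trap into the kernel because the file descriptor refers to a protected resource.

```c
#include <stdio.h>
#include <unistd.h>

/* Format a message and send it to a file descriptor.  snprintf() is a
 * library function: it runs entirely in user space, touching only our
 * own buffer.  write() is a system call: the descriptor and the file
 * behind it are protected, so control must pass into the kernel. */
ssize_t format_and_write(int fd, int value) {
    char buf[64];
    int len = snprintf(buf, sizeof buf, "value=%d\n", value);
    return write(fd, buf, (size_t)len);
}
```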
Last week we discussed naming as a critical function of an OS.
Humans use a readable form of the name of a system call, such as
write(). However, the operating system itself does not
actually use the human-readable names. In this case, the C compiler
uses header files as a means to translate the human-readable name into
a machine-readable one.
Here is part of the list in a Linux 2.6.19 kernel:
[rdv@2 ~]$ more /usr/include/asm/unistd.h
#ifndef _ASM_I386_UNISTD_H_
#define _ASM_I386_UNISTD_H_
/*
* This file contains the system call numbers.
*/
#define __NR_restart_syscall 0
#define __NR_exit 1
#define __NR_fork 2
#define __NR_read 3
#define __NR_write 4
#define __NR_open 5
#define __NR_close 6
...
#define __NR_move_pages 317
#define __NR_getcpu 318
#define __NR_epoll_pwait 319
#endif /* _ASM_I386_UNISTD_H_ */
...that's it. In Linux, roughly 320 system calls (numbered 0 through
319) do everything.
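Those __NR_ numbers are what actually cross the user/kernel boundary. On Linux, glibc's syscall(2) lets a C program invoke a call by number directly, bypassing the usual wrapper; a sketch (assuming Linux and glibc, with a helper name invented here):

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke write() by its system call number, much as the compiled
 * wrapper does: glibc's syscall(2) loads SYS_write (__NR_write,
 * number 4 on i386) and traps into the kernel. */
long raw_write(int fd, const void *buf, size_t n) {
    return syscall(SYS_write, fd, buf, n);
}
```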
The execution of a system call occurs in several phases:
- user process: prepare the arguments
- user process: execute trap or interrupt to reach into kernel
- kernel: determine which system call is being requested
- kernel: verify permissions, size, type, etc. of arguments (this
is a great opportunity for a security hole!)
- kernel: execute system call; if I/O wait or other wait is
required, put the process to sleep
- kernel: when I/O completes, clean up request structures (timers,
any allocated memory, etc.), set errno and the return value for
the system call
- kernel: return
- user process: finish application-side processing of the call.
The application side may or may not wrap the system call in a library
routine. The compiler will take care of most of that for you.
Note that this same essential structure is followed for making calls
to remote servers as well as to local system services. Again, our
principles of distributed, concurrent actions and information (the
light cone) apply.
Most system calls are synchronous; your application program
stops until the OS completes the call and returns (or decides that it
cannot complete, in which case an error is returned).
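The error path is easy to observe from C. In this hedged sketch (the helper name is invented here), opening a path that does not exist makes the kernel's failure visible in the conventional way: a -1 return with the reason stored in errno.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to open a path; on failure, the kernel's result has already
 * been translated into the C convention: a -1 return from open()
 * with the reason stored in errno.  Returns the errno observed,
 * or 0 if the open unexpectedly succeeded. */
int open_error_demo(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;          /* e.g. ENOENT for a missing path */
    close(fd);
    return 0;
}
```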
Looking in a little more detail at the setuid system call:
_syscall1(int,setuid,uid_t,uid);
which will expand to:
_setuid:
subl $4,%esp
pushl %ebx
movzwl 12(%esp),%eax
movl %eax,4(%esp)
movl $23,%eax
movl 4(%esp),%ebx
int $0x80
movl %eax,%edx
testl %edx,%edx
jge L2
negl %edx
movl %edx,_errno
movl $-1,%eax
popl %ebx
addl $4,%esp
ret
L2:
movl %edx,%eax
popl %ebx
addl $4,%esp
ret
(This code is a little old, but illustrates the necessary points.)
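The interesting part of that stub is the error convention: the kernel returns -errno in %eax, and the wrapper negates it into errno and returns -1. A simplified C rendering of just that step (the function name is ours, and real glibc also range-checks the raw value):

```c
#include <errno.h>

/* The error convention of the stub above, rendered in C: the kernel
 * hands back -errno on failure; the wrapper negates it into the
 * global errno and returns a plain -1 to the application.  (Real
 * glibc also range-checks the raw value; this is a simplification.) */
long unwrap_kernel_result(long raw) {
    if (raw < 0) {
        errno = -raw;          /* e.g. -EPERM becomes errno = EPERM */
        return -1;
    }
    return raw;                /* success: pass the result through */
}
```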
Term Projects
Everyone seems concerned about their term projects. I should have
been prepared to explain them in more detail last week. My
apologies.
The goal of your term project is to learn how one particular part of
an operating system works, without having to hack the kernel and
threaten the stability of your system. You should be able to conduct
your term project on your own PC safely.
Schedule and Grading
The schedule is as follows:
- Project and team proposals due (on your blog): May 1
- Review of proposals returned: May 8
- Revised project proposals due: May 15
- Final approval of projects, implementation begins: May 22
- Weekly reviews begin: May 29
- Mid-term progress review: June 12
- Final evaluation of projects (face-to-face): July 10-21
That gives you seven to eight weeks to actually implement your
project. I would expect that your project will take 30-40 hours
total, including writing and debugging code, taking data, analyzing
the data, and writing up a report of the results. This is actually
not very much time for a project, so they must be sized
appropriately.
Your grade on your project will be 40% of your total grade, split 10%
for the mid-term progress review June 12, and 30% for the final
evaluation. The things I will look for are those detailed in the Levin
and Redell paper. Because many of your projects will involve
performance measurements, I also expect data with error bars and
carefully designed experiments. One great book on the topic is Jain,
The
Art of Computer Systems Performance Analysis, but there are
probably also good books available in Japanese.
Implementation
Some people have asked what language they must write their program(s)
in. I don't care what language you use; I care what you learn about
the operating system. In order to learn about the OS, a low-overhead,
predictable, compiled language is probably preferable. C would be the
obvious choice. Interpreted scripting languages are probably bad
choices.
Likewise, there is no requirement to perform your project on a
particular operating system. Class lectures will focus on principles
highlighted by Unix and Linux examples, as above. The importance of
Unix in the history of operating systems cannot be overstated, and
Linux and MacOS are (arguably) its most vibrant current
implementations; students
must have some familiarity with the basic ideas of Unix.
However, if your OS of choice is Windows, you will learn a great deal
by comparing concepts from lectures and the book with what you see on
your Windows machine. One obvious advantage of Linux is the easy
availability of source code, turning "black box" experiments into
"white box" ones.
The first step in either research or development is to identify a
problem. Most of these projects will help you carefully characterize
a system problem that you might want to attack more thoroughly in
research later.
Project Suggestions
The ideal project for this class is probably a performance measurement
of the system. Examples include:
- Redo, a decade later, the measurements in my USENIX paper, Observing
the Effects of Multi-Zone Disks (although this is more hardware
than software).
- Evaluate the performance of the file system as directories get
broader. When there are a thousand files in a directory,
how long does it take to look up a file name? When there are a
million? (Your file system will likely fail before you get to a
million!)
- Evaluate the performance of the file system as directories get
deeper. Does it take constant time to create a new directory? How
long does it take to look up a file name as the path gets deeper?
Does the file system refuse to go past a certain depth? Is there a
limit on total path-name length, or on the number of directories
you've descended? You must characterize both cold-cache and
warm-cache performance.
- Shared libraries are valuable due to their memory savings, but
they are complicated. Compare in detail the performance of a
program using a shared library routine such as puts() with
one that uses the same routine, but compiled in directly. Be
careful to separate out the costs of the system call itself, the
library function, and the overhead of finding and loading the shared
library. In addition to measurement, you will probably need to
actually count instructions manually, using a debugger. The basic C
library will certainly already be in memory; can you find an obscure
library that has to be loaded from disk, and determine the overhead
of the load?
- Measure the CPU cost of software RAID (obviously, this requires
access to a machine that uses software RAID!).
- Compare the cost of malloc() and free() when using
multiple threads and using a single thread. Compare when more than
one thread is trying to allocate memory, and when multiple threads
exist but only one is doing allocation. This project is especially
interesting if you do both C and C++; the C++ new operator
will do some surprising things.
- Does the file system on your machine allow multiple processes to
open the same file for write? If two processes try to write at the
same time, what happens? Is the behavior different if each write is
the size of a single page (4KB, usually) or if it's an odd size? Is
the behavior different over NFS (from the same or two different
machines) than it is locally?
- Measure the performance of TCP networking on your machine.
Determine how many times an incoming or outgoing packet must be
copied, and how many times it will cross the memory bus (note that
these are not necessarily the same number!). Count how many
instructions it takes to process an incoming TCP packet. How many
instructions are in the interrupt handler for your link layer
(presumably Ethernet)?
- Determine how much buffering is needed for smooth audio or video
playback. This is tricky, because the OS will be prefetching and
caching data, as well.
- Choose an important application and compare its performance on
Linux, cygwin, and native Windows.
- IBM has tools
to help you make your Linux system boot faster. Similar tools
probably exist for Windows or FreeBSD. If not, porting the tools to
one of those operating systems would be valuable.
- Everybody complains that starting applications is slow. Take a
large app (Evolution, Firefox, etc.) and run strace or the
equivalent on it, tracking all of the files that are opened and
system calls that are made. Provide details on those: how many
bytes are read and written, etc. Can this process be improved using
prefetching?
Some of these projects will be more difficult on Windows systems, due
to the lack of source code. Some of the measurements will also be
difficult if you cannot flush the cache, either by unmounting the file
system or rebooting the machine.
I will ask you not just what happened, but why it
happened. In most cases, you should be able to point to some kernel
source code, or a conference paper, design document, book, or web site
that supports your understanding of why the system behaves the way it
does. For most of these projects, I will also ask you to predict what
will happen as technology continues to improve: modest improvements in
clock speed and disk speed, significant increases in number of
processors and disk capacity.
Last week I mentioned the OS class at Wisconsin. Here are links to
some prior years' projects:
Readings
Readings to match this week's lecture:
Homework
This week's homework. You probably want to do the second problem
first.
Next Lecture
Next lecture:
Lecture 3, April 24: Processes and Threads
Readings for next week:
Additional Information