But no one actually found the instruction that traps into the kernel. Everyone stopped at a call instruction, which calls a subroutine but not the kernel itself. Even the apparent calls to a function called write are actually library calls.
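The C source itself does not appear in these notes, but reconstructed from the assembly and the gdb backtrace below (which points at tmp-write.c, line 7), it amounts to a single call to write() on file descriptor 1, roughly:

    /* tmp-write.c -- reconstructed for illustration from the assembly
     * listing below; the original source is not shown in these notes. */
    #include <unistd.h>

    int main(void)
    {
        char *s = "123";
        write(1, s, 3);    /* fd 1 (stdout), buffer, 3 bytes */
        return 0;
    }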
On my Linux Fedora Core 6 box using an i686 kernel with an Intel Celeron M microprocessor, the assembly version of the program looks something like this:
.file "tmp-write.c" .section .rodata .LC0: .string "123" .text .globl main .type main, @function main: leal 4(%esp), %ecx andl $-16, %esp pushl -4(%ecx) pushl %ebp movl %esp, %ebp pushl %ecx subl $36, %esp movl $.LC0, -8(%ebp) movl $3, 8(%esp) movl -8(%ebp), %eax movl %eax, 4(%esp) movl $1, (%esp) call write movl $0, %eax addl $36, %esp popl %ecx popl %ebp leal -4(%ecx), %esp ret .size main, .-main .ident "GCC: (GNU) 4.1.1 20070105 (Red Hat 4.1.1-51)" .section .note.GNU-stack,"",@progbitsThe starting of the actual program proceeds roughly as follows:
    (magic number check finds ELF executable)
    _start
    __libc_start_main@plt
    _dl_runtime_resolve
    _dl_fixup
    (approximately 1400 instructions later...)
    _init
    call_gmon_start   (only for programs using gmon monitoring)
    (approximately 100 instructions later...)
    main

Once we get to main, it's only thirteen instructions to write(), right? Not quite. That call write instruction actually calls a library wrapper routine that does various things before actually making the system call...
    call write
    _dl_runtime_resolve
    _dl_fixup
    _dl_lookup_symbol_x   (calls strcmp, do_lookup_x...)
    (approximately 700 instructions later, hit a breakpoint...)

    (gdb) stepi
    0x0021e018 in write () from /lib/libc.so.6
    1: x/i $pc  0x21e018:  jne    0x21e03c
    (gdb)
    0x0021e01a in __write_nocancel () from /lib/libc.so.6
    1: x/i $pc  0x21e01a <__write_nocancel>:     push   %ebx
    (gdb)
    0x0021e01b in __write_nocancel () from /lib/libc.so.6
    1: x/i $pc  0x21e01b <__write_nocancel+1>:   mov    0x10(%esp),%edx
    (gdb)
    0x0021e01f in __write_nocancel () from /lib/libc.so.6
    1: x/i $pc  0x21e01f <__write_nocancel+5>:   mov    0xc(%esp),%ecx
    (gdb)
    0x0021e023 in __write_nocancel () from /lib/libc.so.6
    1: x/i $pc  0x21e023 <__write_nocancel+9>:   mov    0x8(%esp),%ebx
    (gdb)
    0x0021e027 in __write_nocancel () from /lib/libc.so.6
    1: x/i $pc  0x21e027 <__write_nocancel+13>:  mov    $0x4,%eax
    (gdb)
    0x0021e02c in __write_nocancel () from /lib/libc.so.6
    1: x/i $pc  0x21e02c <__write_nocancel+18>:  call   *%gs:0x10
    (gdb)
    0x0095c400 in __kernel_vsyscall ()
    1: x/i $pc  0x95c400 <__kernel_vsyscall>:    int    $0x80

    Breakpoint 4, 0x001bc400 in __kernel_vsyscall ()
    (gdb) where
    #0  0x001bc400 in __kernel_vsyscall ()
    #1  0x0027b033 in __write_nocancel () from /lib/libc.so.6
    #2  0x08048387 in main () at tmp-write.c:7
    (gdb) x/2i __kernel_vsyscall
    0x1bc400 <__kernel_vsyscall>:    int    $0x80
    0x1bc402 <__kernel_vsyscall+2>:  ret
    (gdb)
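So the instruction that actually traps into the kernel is the int $0x80 inside __kernel_vsyscall, reached through the function pointer at %gs:0x10. Just to make the trap visible, here is a minimal sketch (not the path libc takes, and omitting the wrapper's cancellation and errno handling) that issues the same system call directly on 32-bit x86 Linux, where %eax carries the system call number (4 = write) and %ebx, %ecx, %edx carry the arguments; compile with gcc -m32:

    /* Sketch: invoke the write system call directly with int $0x80 on
     * 32-bit x86 Linux, bypassing the libc wrapper entirely. */
    int main(void)
    {
        static const char msg[] = "123\n";
        long ret;
        __asm__ volatile ("int $0x80"
                          : "=a" (ret)                /* return value in %eax */
                          : "a" (4),                  /* __NR_write on i386   */
                            "b" (1),                  /* fd 1 (stdout)        */
                            "c" (msg),                /* buffer               */
                            "d" (sizeof msg - 1)      /* byte count           */
                          : "memory");
        return 0;
    }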
The following command, run in a Linux kernel source tree,

    [rdv@localhost linux-2.6.19]$ find . -name \*.c -print | xargs grep do_fork | more
    ./kernel/fork.c:long do_fork(unsigned long clone_flags,
    ./kernel/fork.c: * functions used by do_fork() cannot be used here directly
    ./arch/um/kernel/process.c:     pid = do_fork(CLONE_VM | CLONE_UNTRACED | flags, 0,
    ./arch/um/kernel/process.c:     panic("do_fork failed in kernel_thread, errno = %d", pid);
    ./arch/um/kernel/syscall.c:     ret = do_fork(SIGCHLD, UPT_SP(&current->thread.regs.regs),
    ./arch/um/kernel/syscall.c:     ret = do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD,
    ./arch/um/sys-i386/syscalls.c:  ret = do_fork(clone_flags, newsp, &current->thread.regs, 0, parent_tid,
    ./arch/um/sys-x86_64/syscalls.c:        ret = do_fork(clone_flags, newsp, &current->thread.regs, 0, parent_tid,
    ./arch/cris/arch-v10/kernel/process.c:  return do_fork(flags | CLONE_VM | CLONE_UNTRACED, 0, &regs, 0, NULL, NULL);
    ./arch/cris/arch-v10/kernel/process.c:  return do_fork(SIGCHLD, rdusp(), regs, 0, NULL, NULL);
    ./arch/cris/arch-v10/kernel/process.c:  return do_fork(flags, newusp, regs, 0, parent_tid, child_tid);
    ./arch/cris/arch-v10/kernel/process.c:  return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, rdusp(), regs, 0, NULL, NULL);
    ./arch/cris/kernel/process.c: * sys_clone and do_fork have a new argument, user_tid
    --More--

involves three processes connected via two pipes: one each for find, xargs, and more. Oh, and xargs actually repeatedly forks off calls to grep, which share xargs's stdout, so the total number of processes involved is actually larger than three.
In a VMS system, this kind of operation was substantially more tedious; pipes are one of the features that made Unix such a hacker's paradise. (In VMS, such IPC was usually done either with temporary files, or using an explicit construct known as a mailbox, although it was also possible to redirect the input and output files, known as SYS$INPUT and SYS$OUTPUT.)
Pipes work well partly because naming is simple, thanks to the semantics of fork. The pipe system call gives a simple example, showing how the two file descriptors created by pipe are shared across the fork. (There is also a special form of pipe known as a named pipe, which we aren't going to discuss, but which you might want to look up if you are interested.)
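A minimal sketch of that pattern, with error handling omitted: the parent creates the pipe, forks, the child writes into one end, and the parent reads from the other. No names are involved; the child simply inherits copies of the parent's two descriptors.

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fd[2];
        char buf[32];

        pipe(fd);                      /* fd[0] = read end, fd[1] = write end */
        if (fork() == 0) {             /* child: inherits both descriptors    */
            close(fd[0]);
            write(fd[1], "hello", 5);
            close(fd[1]);
            _exit(0);
        }
        close(fd[1]);                  /* parent: read what the child wrote   */
        ssize_t n = read(fd[0], buf, sizeof buf);
        printf("parent read %zd bytes: %.*s\n", n, (int)n, buf);
        wait(NULL);
        return 0;
    }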
Pipes also allow for a simple form of parallel processing and asynchronous operation; the first process in the pipeline can be reading data from the disk and buffering it through the pipe to the second process, without the inherent complexities of asynchronous read operations. The first process automatically blocks when the pipe's buffers are full, and the second process automatically blocks when it tries to read from an empty pipe, and is awakened when data arrives.
The alternative is message passing. Message passing involves copying the data from one process to the other, which is less efficient, but has lots of advantages. Control of buffers is much clearer, the contents can't be modified, and the messages also serve as a natural means of synchronizing and ordering interactions.
There is an excellent simulation at Northwestern University.
A full mathematical analysis is beyond our purposes at the moment, but very roughly, if we treat time as discrete, the probability that all n philosophers are not currently eating at a particular time and all attempt to get forks in the same time slot is roughly ((1-f)p)^n, where f is the fraction of time that a philosopher is eating, and p is the probability that, when not eating, she will attempt to eat. This should also give you some idea of how difficult it is to debug such a problem: reproducing the problem is always the first step, and that will be difficult.
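To put an (entirely illustrative) number on it: with n = 5 philosophers, f = 0.5, and p = 0.1, the bad coincidence happens in any given time slot with probability around 3 x 10^-7, which is why the deadlock can take a very long time to show up, and just as long to reproduce.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double f = 0.5;   /* illustrative: fraction of time spent eating     */
        double p = 0.1;   /* illustrative: chance a non-eating philosopher   */
                          /* reaches for her forks in a given time slot      */
        int    n = 5;     /* number of philosophers                          */
        printf("per-slot deadlock probability ~ %g\n", pow((1 - f) * p, n));
        return 0;
    }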
There are many solutions to the problem; one simple one is to have everyone put down their first fork when they fail to get the second fork, pause for a few moments (randomly), then try again. (What are the problems with this?) Another is priority, either with or without preemption, ordering the philosophers and letting them decide in order whether or not to pick up forks at a particular moment. (What are the problems with this solution?)
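The first of those solutions might look roughly like this with POSIX threads (a sketch only; the number of forks, the fork numbering, and the back-off interval are all invented for illustration):

    #include <pthread.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define NFORKS 5
    static pthread_mutex_t fork_lock[NFORKS];        /* one mutex per fork */

    /* Grab the left fork, try the right one; if the right fork is busy,
     * put the left fork back down, pause for a random interval, retry. */
    static void pick_up_forks(int left, int right)
    {
        for (;;) {
            pthread_mutex_lock(&fork_lock[left]);
            if (pthread_mutex_trylock(&fork_lock[right]) == 0)
                return;                              /* got both forks  */
            pthread_mutex_unlock(&fork_lock[left]);
            usleep((useconds_t)(rand() % 10000));    /* random back-off */
        }
    }

    int main(void)
    {
        for (int i = 0; i < NFORKS; i++)
            pthread_mutex_init(&fork_lock[i], NULL);
        pick_up_forks(0, 1);                         /* philosopher 0 eats */
        pthread_mutex_unlock(&fork_lock[1]);
        pthread_mutex_unlock(&fork_lock[0]);
        return 0;
    }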
One useful technique for finding deadlocks when they occur, or for designing locking systems, is the concept of a resource graph. Entities that currently hold or desire certain locks are nodes in the graph. A directed link is drawn from each node that desires a resource currently held by someone else to the node holding the resource. If there is a cycle in the graph, you have deadlock.
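Detecting such a cycle mechanically is a straightforward depth-first search. Here is a small sketch over a wait-for adjacency matrix; the node count and the example edges are invented for illustration:

    #include <stdbool.h>
    #include <stdio.h>

    #define N 4                     /* nodes: entities holding/wanting locks */
    static bool wait_for[N][N];     /* wait_for[i][j]: i waits for j         */

    static bool cycle_from(int node, bool on_path[N], bool done[N])
    {
        if (on_path[node]) return true;      /* back edge: cycle found */
        if (done[node])    return false;
        on_path[node] = true;
        for (int next = 0; next < N; next++)
            if (wait_for[node][next] && cycle_from(next, on_path, done))
                return true;
        on_path[node] = false;
        done[node] = true;
        return false;
    }

    int main(void)
    {
        wait_for[0][1] = wait_for[1][2] = wait_for[2][0] = true;   /* 0->1->2->0 */
        bool on_path[N] = { false }, done[N] = { false };
        for (int i = 0; i < N; i++)
            if (cycle_from(i, on_path, done)) {
                printf("deadlock: cycle in the resource graph\n");
                return 0;
            }
        printf("no deadlock\n");
        return 0;
    }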
In file systems and databases, it is often necessary to present one consistent version of a data structure. It may not matter whether a reader sees the version before a change to the structure takes place or after, but seeing it during the change would be a problem. The simplest solution is to put a simple lock on the structure and only allow one process to read or write the data structure at a time. This, unfortunately, is not very efficient.
By recognizing that many people can read the data structure without interfering with each other, we divide the lock into two roles: a read lock and a write lock. If a process requests either role, and the structure is idle, the lock is granted. If a reader arrives and another reader already has the structure locked, then both are allowed to read. If a writer arrives and the structure is locked by either a reader or another writer, then the writer blocks and must wait until the lock is freed.
So far, so good. We have allowed multiple readers or a single writer. The tricky part is to prioritize appropriately those who are waiting for the lock. Causing the writer to wait while allowing new readers to continue to enter risks starving the writer. The simplest good solution is to queue new read requests behind the write request. It's not necessarily efficient, depending on the behavior of the readers, but it will usually do.
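As a sketch of that policy, here is a writer-priority read-write lock built from a mutex and two condition variables. The names are invented and error handling is omitted; in practice you would normally just use pthread_rwlock_t.

    #include <pthread.h>

    typedef struct {
        pthread_mutex_t m;
        pthread_cond_t  ok_to_read, ok_to_write;
        int readers;              /* readers currently holding the lock */
        int writing;              /* 1 if a writer holds the lock       */
        int writers_waiting;      /* writers queued for the lock        */
    } rwlock_t;

    void rwlock_init(rwlock_t *rw)
    {
        pthread_mutex_init(&rw->m, NULL);
        pthread_cond_init(&rw->ok_to_read, NULL);
        pthread_cond_init(&rw->ok_to_write, NULL);
        rw->readers = rw->writing = rw->writers_waiting = 0;
    }

    void read_lock(rwlock_t *rw)
    {
        pthread_mutex_lock(&rw->m);
        while (rw->writing || rw->writers_waiting > 0)   /* queue behind writers */
            pthread_cond_wait(&rw->ok_to_read, &rw->m);
        rw->readers++;
        pthread_mutex_unlock(&rw->m);
    }

    void read_unlock(rwlock_t *rw)
    {
        pthread_mutex_lock(&rw->m);
        if (--rw->readers == 0)
            pthread_cond_signal(&rw->ok_to_write);
        pthread_mutex_unlock(&rw->m);
    }

    void write_lock(rwlock_t *rw)
    {
        pthread_mutex_lock(&rw->m);
        rw->writers_waiting++;
        while (rw->writing || rw->readers > 0)
            pthread_cond_wait(&rw->ok_to_write, &rw->m);
        rw->writers_waiting--;
        rw->writing = 1;
        pthread_mutex_unlock(&rw->m);
    }

    void write_unlock(rwlock_t *rw)
    {
        pthread_mutex_lock(&rw->m);
        rw->writing = 0;
        if (rw->writers_waiting > 0)
            pthread_cond_signal(&rw->ok_to_write);   /* writers go first     */
        else
            pthread_cond_broadcast(&rw->ok_to_read); /* then all the readers */
        pthread_mutex_unlock(&rw->m);
    }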
Another alternative is to create versions of the data structure; rather than modifying it in place, copy part of the structure so that it can be modified, and allow new readers to continue to come in and access the old version. Once the modification (writing) is complete, a simple pointer switch can put the new version in place so that subsequent readers will get the new version.
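A rough sketch of the pointer-switch idea; the structure and names are invented, and the genuinely hard part, deciding when the old version can be freed, is deliberately left out:

    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Readers always dereference `current`; a writer copies the current
     * version, edits the copy, and publishes it with one pointer switch. */
    struct table { int entries[16]; };

    static _Atomic(struct table *) current;

    static void update(int slot, int value)
    {
        struct table *old  = atomic_load(&current);
        struct table *next = malloc(sizeof *next);
        memcpy(next, old, sizeof *next);        /* copy the old version     */
        next->entries[slot] = value;            /* modify the copy          */
        atomic_store(&current, next);           /* the pointer switch       */
        /* `old` must not be freed until every reader of it has finished.  */
    }

    int main(void)
    {
        atomic_store(&current, calloc(1, sizeof(struct table)));
        update(3, 42);
        printf("%d\n", atomic_load(&current)->entries[3]);
        return 0;
    }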
In file systems or databases, there are tradeoffs between fine-grained locking and large-grained locking. Fine-grained locking is a lot harder to get right, and there are a lot more locking operations so the efficiency of the locking operation itself must be considered, but it allows more concurrency.
To understand this problem, you need to know the most basic facts about scheduling: most schedulers support a priority mechanism, and no lower-priority task gets to run as long as a higher-priority one wants the CPU. (We will discuss variations on this next week, but the basic idea holds.) The system included numerous tasks, each assigned a different role. There are several that we care about for the purposes of this discussion:
This problem was solved by enabling a mechanism known as priority inheritance. Assume a low-priority task is holding a particular resource. When a higher-priority task requests the resource, and blocks waiting for it, the lower-priority task then inherits the priority of the task waiting for resources it holds.
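POSIX exposes this mechanism directly for mutexes on systems that implement the priority-inheritance protocol option; a minimal sketch:

    #include <pthread.h>

    /* Create a mutex that uses priority inheritance: a low-priority thread
     * holding it temporarily runs at the priority of the highest-priority
     * thread blocked on it. */
    int main(void)
    {
        pthread_mutexattr_t attr;
        pthread_mutex_t lock;

        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        pthread_mutex_init(&lock, &attr);

        pthread_mutex_lock(&lock);
        /* ... touch the shared resource ... */
        pthread_mutex_unlock(&lock);

        pthread_mutex_destroy(&lock);
        pthread_mutexattr_destroy(&attr);
        return 0;
    }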
The general problem of synchronization and resource locking, which Tanenbaum lumps in with IPC, requires a moderate amount of hardware support, but allows both HW and SW solutions. This area is one of the places where you can most directly see the impact of theory on practice.
Two things on the homework list for this week (no submission of homework is required yet):
第5回 5月14日 プロセススケジューリング
Lecture 5, May 14: Process Scheduling
Readings for next week and followup for this week: