Keio University
Spring Semester 2007

System Software / Operating Systems

Spring Semester 2007, Tuesday, 2nd Period
Course code: 60730
Location: SFC
Format: Lecture
Instructor: Rodney Van Meter
E-mail: rdv@sfc.keio.ac.jp

Lecture 9, June 5: File Systems

Outline

Corrections/Clarifications
Why You Should Never, Ever Trust a Computer
What's a File?
Using Files
What's a File System?
Other File Types

Corrections/Clarifications

Capability, not credentials!
- Capabilities can be shared or transferred between users or processes
- A name, if hard to forge or guess, can be a capability
- Because anyone holding a capability can use it, you must keep it secret (e.g., by encrypting it) when you transfer it.
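To make "a name that is hard to guess" concrete, here is a minimal sketch of generating such a name in C. It assumes a Unix-like system that provides /dev/urandom; the function make_capability_name and the constant CAP_TOKEN_BYTES are hypothetical, and 128 bits is an illustrative size, not a standard.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* 16 random bytes = 128 bits: far too many to guess by brute force.
   (Illustrative choice.) */
#define CAP_TOKEN_BYTES 16

/* Fill out[] with a hex-encoded random name of 2*CAP_TOKEN_BYTES
   characters plus a terminating NUL. Returns 0 on success, -1 on failure.
   Assumes /dev/urandom exists (hypothetical helper, Unix-like systems). */
int make_capability_name(char out[2 * CAP_TOKEN_BYTES + 1])
{
    unsigned char raw[CAP_TOKEN_BYTES];
    size_t got = 0;
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0)
        return -1;
    while (got < sizeof raw) {
        ssize_t n = read(fd, raw + got, sizeof raw - got);
        if (n <= 0) {              /* error or unexpected end of file */
            close(fd);
            return -1;
        }
        got += (size_t)n;
    }
    close(fd);
    /* Two hex digits per byte; the last snprintf also writes the NUL. */
    for (size_t i = 0; i < sizeof raw; i++)
        snprintf(out + 2 * i, 3, "%02x", (unsigned)raw[i]);
    return 0;
}
```

Whoever learns such a name holds the capability, which is exactly why it must be kept secret in transit.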
Shapley-Curtis debate and Hubble

Why You Should Never, Ever Trust a Computer

Ken Thompson's compiler hack
- login backdoor
- cc double-hacked

What's a File?

Regular file: non-volatile storage for (user) data, managed by the OS

A regular file (the only kind we will deal with for the moment) is a collection of data stored on some non-volatile storage managed by the operating system.

The most common file model today is the simple byte stream, but file I/O originally involved a much more hardware-oriented view and/or more sophisticated, database-oriented system services. There have been numerous types of basic files:

Block files
Byte stream (assumes 8-bit bytes!)
Indexed files (record-oriented)
Files with forks

Using Files

File Access APIs: Access can be
- buffered or unbuffered
- structured or unstructured
- synchronous or asynchronous
File Session APIs
File Management APIs

File Access APIs

The file access APIs most commonly used today were developed on Unix, but they are far from the only ones. Even on Unix, there are several varieties, some system calls and some library routines, all oriented toward C:

read() and write(): the system calls
fread() and fwrite(): stdio library routines for buffered, item-structured I/O
fscanf() and fprintf(): formatted ASCII
mmap(): memory-mapped I/O
(rooted in TOPS-20 or earlier)
aio_read() and aio_write(): asynchronous I/O
(VMS had an implementation, but the roots go back further)
Additional routines for character I/O
readv() and writev(): scatter-gather I/O
sync()
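As a sketch of the lowest layer in the list above, the loop below copies one file descriptor to another using only the read() and write() system calls. The name copy_fd and the 4096-byte buffer are illustrative assumptions. Note that write() may accept fewer bytes than requested, so a correct program loops until everything is written.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <unistd.h>

/* Copy everything from descriptor in to descriptor out with read()/write().
   Returns 0 on success, -1 on error. (Hypothetical helper.) */
int copy_fd(int in, int out)
{
    char buf[4096];                /* buffer size is an arbitrary choice */
    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            return 0;              /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        /* write() may be partial, so keep going until all n bytes are out. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
}
```

The stdio routines (fread(), fwrite(), fprintf()) are built on top of calls like these, adding a user-space buffer so that many small requests become fewer system calls.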

On OpenVMS, there were several block and record-oriented services:

Block oriented: SYS$READ, SYS$WRITE
Record oriented: RMS$GET, RMS$FIND

One extremely important feature of read() and write() is that they allow arbitrary alignment of the application's buffer. In the VMS block-oriented calls, applications had to align write data on 512-byte boundaries.
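read() and write() accept a buffer at any address. When an interface does demand alignment, as the VMS block-oriented calls did, a C program can ask for an aligned buffer with the POSIX posix_memalign() call. This is a minimal sketch; alloc_sector_aligned, is_sector_aligned, and SECTOR_SIZE are hypothetical names, with 512 chosen to match the VMS block boundary.

```c
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <stdlib.h>

#define SECTOR_SIZE 512   /* matches the VMS 512-byte block boundary */

/* Allocate len bytes starting on a SECTOR_SIZE boundary.
   Returns NULL on failure; release the buffer with free(). */
void *alloc_sector_aligned(size_t len)
{
    void *p = NULL;
    /* posix_memalign() returns an error number instead of setting errno. */
    if (posix_memalign(&p, SECTOR_SIZE, len) != 0)
        return NULL;
    return p;
}

/* Nonzero if p lies on a SECTOR_SIZE boundary. */
int is_sector_aligned(const void *p)
{
    return ((uintptr_t)p % SECTOR_SIZE) == 0;
}
```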

Regular files represent non-volatile data stored on disk. Using the standard APIs, however, the data may not be committed to disk when the write() call completes; it may still be buffered somewhere in the system (e.g., in a special place in kernel memory). Again our concept of relativity comes into play: the information we have has not yet flowed to its final resting place. Completing close() does not by itself guarantee that the data has reached the disk. Calling fsync() on the file descriptor waits until that file's data has been committed, and sync() flushes all buffered data system-wide (although POSIX allows sync() to return before the writes actually finish). Note that data does not necessarily land on disk in the order in which you wrote it! If you care, call fsync() every time you need guaranteed ordering of the writes. Also note that the semantics allow fsync() or close() to fail even after the write() has succeeded!
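A minimal sketch of a write that is meant to survive a crash, checking every step. The name write_durably is hypothetical, and the file is simply overwritten; a real application would often write to a temporary file and rename() it into place.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write len bytes to path and ask the OS to commit them to stable storage.
   Returns 0 only if every write(), the fsync() and the close() succeeded. */
int write_durably(const char *path, const void *data, size_t len)
{
    const char *p = data;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        p += n;                    /* write() may be partial */
        len -= (size_t)n;
    }
    /* A successful write() only means the kernel has the data;
       fsync() waits until it reaches the device, or reports an error. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    /* close() can also report a deferred error, so check it too. */
    return close(fd) == 0 ? 0 : -1;
}
```

For a newly created file, the directory entry is separate metadata: on Linux, for example, the fsync(2) manual page notes that an explicit fsync() on the containing directory is also needed to make the new name itself durable.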

File Session APIs

...What's missing from the above list?

open() and close()
file locking

Most file systems use the concept of a file session, spanning from the time you first ask to access the file to the time you are done with it. You ask to access a file by calling open(), which requires a file name, and you end the session with close().
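One complete file session, including the file locking mentioned above, might look like the sketch below. It assumes POSIX advisory record locks via fcntl(); advisory means the lock only restrains other processes that also ask for it. The function append_locked is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Open by name, take an advisory write lock on the whole file, append a
   line, release the lock and close. Returns 0 on success, -1 on error. */
int append_locked(const char *path, const char *line)
{
    struct flock lk;
    size_t len = strlen(line);
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;                     /* the session never started */

    memset(&lk, 0, sizeof lk);
    lk.l_type = F_WRLCK;               /* exclusive (write) lock */
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;                      /* 0 means "through end of file" */
    if (fcntl(fd, F_SETLKW, &lk) == -1) {  /* block until granted */
        close(fd);
        return -1;
    }

    ssize_t n = write(fd, line, len);

    lk.l_type = F_UNLCK;               /* release before ending the session */
    fcntl(fd, F_SETLK, &lk);
    if (close(fd) != 0 || n != (ssize_t)len)
        return -1;
    return 0;
}
```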

Most file APIs use the concept of a file pointer. The file pointer represents the current position in the file where the application is reading or writing. For regular files, it can be adjusted to the beginning, end, or any arbitrary position in the file. For byte stream files, the pointer is just an integer representing the byte offset; for record-oriented files, it is a more complex structure. The file pointer is part of the kernel's structure that it uses to track which files a process is accessing. The pointer assists the kernel in managing the reading and writing of file data.
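On a byte-stream file, record-style access reduces to arithmetic on the file pointer. A minimal sketch, assuming a hypothetical layout in which every record is RECORD_SIZE bytes; open_record_file, write_record, and read_record are illustrative names, and lseek() moves the kernel's file pointer before each transfer.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record layout: every record is RECORD_SIZE bytes. */
#define RECORD_SIZE 32

/* Open (or create) a file of fixed-size records for reading and writing. */
int open_record_file(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0644);
}

/* Move the file pointer to record number index and write one record.
   Returns 0 on success, -1 on error or short write. */
int write_record(int fd, off_t index, const char rec[RECORD_SIZE])
{
    /* The "record address" is simply a byte offset. */
    if (lseek(fd, index * RECORD_SIZE, SEEK_SET) == (off_t)-1)
        return -1;
    return write(fd, rec, RECORD_SIZE) == RECORD_SIZE ? 0 : -1;
}

/* Move the file pointer to record number index and read one record.
   Returns 0 on success, -1 on error or short read. */
int read_record(int fd, off_t index, char rec[RECORD_SIZE])
{
    if (lseek(fd, index * RECORD_SIZE, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, rec, RECORD_SIZE) == RECORD_SIZE ? 0 : -1;
}
```

Writing record 2 into an empty file leaves records 0 and 1 as a hole that reads back as zero bytes; a true indexed file system such as VMS RMS maintains keys and structure that this sketch does not.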

File Management APIs

creat()
unlink()
link() (hard) and symlink() (soft)
chmod()
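A short sketch exercising three of the calls above in sequence. The function create_chmod_unlink is a hypothetical name. Because the mode given to creat() is filtered by the process umask, the sketch uses chmod() afterward to set the exact permission bits.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create an empty file with creat(), make it read-only for its owner
   with chmod(), then remove its name with unlink().
   Returns 0 on success, -1 if any step fails. */
int create_chmod_unlink(const char *path)
{
    struct stat st;
    /* creat(path, mode) behaves like
       open(path, O_WRONLY | O_CREAT | O_TRUNC, mode). */
    int fd = creat(path, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    if (chmod(path, S_IRUSR) != 0)         /* mode 0400: r-------- */
        return -1;
    if (stat(path, &st) != 0 || (st.st_mode & 0777) != S_IRUSR)
        return -1;

    /* unlink() removes a name; the file itself disappears once its last
       link is gone and no process still has it open. */
    return unlink(path);
}
```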

What's a File System?

Key services
File Naming: Directories, Devices and Mount Points
Security
Other Metadata

Key Services

We have discussed individual files, but what's a file system? A file system is the overall structure that holds the user files (and sometimes other things, which we won't discuss today). A file system provides a number of services:

non-volatile storage of user files, including space management
naming services for files
enforcement of security
on some systems, such as VMS, versioning of files
support for backup
sometimes, locking

We will talk about space management next week.

File Naming: Directories, Devices and Mount Points

Current operating systems all store individual files in directories, or folders. A directory provides the mapping from a name to the actual file. When we speak of the files that are referenced by the directory, we say that a directory contains those files, but it's not literally true; the directory actually contains only pointers to the files.

Directories can contain other directories, known as subdirectories, creating a hierarchical namespace, or a directory tree. A complete path name may look like /home/rdv/keio/file1.
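On Unix, the "pointer" stored in a directory entry is an inode number, and the POSIX opendir()/readdir() calls expose those name-to-inode pairs. A minimal sketch; list_directory is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Print each name in a directory with the inode number it points to.
   Returns the number of entries other than "." and "..", or -1 on error. */
int list_directory(const char *path)
{
    DIR *d = opendir(path);
    struct dirent *e;
    int count = 0;
    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        /* Each entry is only a (name, inode number) pair; the file's data
           and other metadata live in the inode, not in the directory. */
        printf("%10lu  %s\n", (unsigned long)e->d_ino, e->d_name);
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0)
            count++;
    }
    closedir(d);
    return count;
}
```

The "." and ".." entries are themselves ordinary name-to-inode mappings pointing at the directory and its parent, which is how the tree is linked together.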

In Unix terminology, the normal mapping from name to file is a hard link. A regular file can have more than one hard link, or more than one name. A file with more than one hard link is not really deleted until the last link is deleted. Files can also have soft links, which are just name to name mappings that are held in the file system, but which do not participate in the actual management of the file. If the file is deleted but the soft link is not, the soft link is referred to as a dangling reference. One reason for the existence of soft links is to allow linking to directories without violating the requirement that each directory has a single parent. Another is to allow linking across partition boundaries or mount points.
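The behavior described above can be checked directly with link(), symlink(), stat(), and lstat(). A minimal sketch; link_demo and the ".orig", ".hard", ".soft" suffixes are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Walk through hard and soft links using three names built from prefix.
   Returns 0 if everything behaved as described, -1 otherwise. */
int link_demo(const char *prefix)
{
    char orig[256], hard[256], soft[256];
    struct stat st;
    snprintf(orig, sizeof orig, "%s.orig", prefix);
    snprintf(hard, sizeof hard, "%s.hard", prefix);
    snprintf(soft, sizeof soft, "%s.soft", prefix);
    unlink(orig); unlink(hard); unlink(soft);   /* leftovers from a past run */

    int fd = open(orig, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    /* Hard link: a second name for the same file; link count becomes 2. */
    if (link(orig, hard) != 0 || stat(orig, &st) != 0 || st.st_nlink != 2)
        return -1;
    /* Soft link: a name-to-name mapping; the link count does not change. */
    if (symlink(orig, soft) != 0 || stat(orig, &st) != 0 || st.st_nlink != 2)
        return -1;

    /* Delete the original name: the file survives via the hard link... */
    if (unlink(orig) != 0 || stat(hard, &st) != 0 || st.st_nlink != 1)
        return -1;
    /* ...but the soft link now dangles: stat() follows it and fails,
       while lstat() still finds the link itself. */
    if (stat(soft, &st) == 0 || lstat(soft, &st) != 0 || !S_ISLNK(st.st_mode))
        return -1;

    unlink(hard);
    unlink(soft);
    return 0;
}
```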

In a Unix system, there is a single root to the directory tree. Applications and users only rarely have to know on which disk their data is stored. System managers can expand parts of the directory tree by mounting other file systems in any place in the tree.
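Although applications rarely need to know where a mount point is, a program can detect one: every file reports the device it lives on in stat()'s st_dev field, and that value changes when a path crosses into a mounted file system. A minimal sketch of this common heuristic; is_mount_point is a hypothetical name, and bind mounts of the same file system are not detected.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/stat.h>

/* Return 1 if dir is the root of a mounted file system, 0 if not,
   -1 on error. A mount point's st_dev differs from its parent's, except
   at "/", which is its own parent. */
int is_mount_point(const char *dir)
{
    char parent[4096];
    struct stat self, up;
    snprintf(parent, sizeof parent, "%s/..", dir);
    if (stat(dir, &self) != 0 || stat(parent, &up) != 0)
        return -1;
    if (self.st_dev != up.st_dev)
        return 1;                       /* crossed into another file system */
    return self.st_ino == up.st_ino;    /* true only for the root "/" */
}
```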

In many other operating systems, the devices are explicitly named. On Windows, they have names such as C:\RDV\KEIO\file1, where the colon separates the device and the directory. On VMS, it could be SYS$HOME:[rdv.keio]file1.

Note that a name for a file is non-volatile, but not permanent; files can be renamed by users and applications, or the system manager may change the mount point and hence the full path name to a file. This behavior creates problems for long-term tracking of data, and numerous research systems (including Plan 9) have attempted to address this need.

Some file systems support case-sensitive file names, others do not. You have probably also noticed that sometimes non-ASCII file names are not printed properly. Most file systems originally assumed ASCII file names, and non-ASCII names are a problem because the character sets are not self-describing. NTFS solves this problem by storing all names in Unicode.

Security

All operating systems operate with some sort of security model, and the most important feature of that model is its plan for file protections. Multi-user OSes, by definition, understand users; the user is usually the basis of decisions about permissions. Unix and VMS both make a distinction between the user, the user's group, and "other" users.

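Permissions, on Unix and its descendants, consist of read, write, and execute. Execute permission is used both for programs, where it allows the file to be run, and for directories, where it allows the directory to be searched (traversed) to reach the files inside it.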
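The nine permission bits (read, write, execute for user, group, and other) are stored in the st_mode field returned by stat(). A minimal sketch that renders them in the familiar ls -l style; mode_string is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/* Render the nine Unix permission bits as "rwxrwxrwx"-style text:
   owner, then group, then other. out must hold at least 10 chars. */
void mode_string(mode_t mode, char out[10])
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,
        S_IRGRP, S_IWGRP, S_IXGRP,
        S_IROTH, S_IWOTH, S_IXOTH
    };
    static const char letters[] = "rwxrwxrwx";
    for (int i = 0; i < 9; i++)
        out[i] = (mode & bits[i]) ? letters[i] : '-';
    out[9] = '\0';
}
```

For example, mode 0750 renders as rwxr-x---: the owner may do anything, group members may read and execute (or, for a directory, list and traverse it), and everyone else is refused.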

An important, independent concept is that of an Access Control List, or ACL. An ACL matches specific permissions to either specific users, or to users holding a particular identifier or access token, which may be encrypted, and may be transferable from one user or process to another.

Other Metadata

The file system uses several forms of metadata, or data about the data, to manage files. We have seen two of the most important types of metadata already: names, and security information. But there are other types, which may or may not be supported:

File layout information
Timestamps: created, modified, accessed, backed up
File type
File size (last byte)
File storage (which may be either more or less than the file size)
Block size
Realtime info
User-definable metadata
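Several of these attributes are visible through the POSIX stat() call. The sketch below prints a few of them and builds a sparse file, whose size is much larger than the storage actually allocated. The names show_metadata and make_sparse are hypothetical. The 512-byte unit for st_blocks holds on Linux and most Unix systems but is not fixed by POSIX, and st_ctime is the status-change time, not a creation time; standard stat() has no creation timestamp.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Print some of the metadata the kernel keeps for path. */
int show_metadata(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    printf("size (last byte + 1): %lld bytes\n", (long long)st.st_size);
    /* st_blocks is in 512-byte units on Linux and most Unix systems. */
    printf("storage allocated:    %lld bytes\n", (long long)st.st_blocks * 512);
    printf("preferred block size: %ld bytes\n", (long)st.st_blksize);
    printf("modified: %lld  accessed: %lld  status changed: %lld\n",
           (long long)st.st_mtime, (long long)st.st_atime,
           (long long)st.st_ctime);
    return 0;
}

/* Create a sparse file: seek 1 MiB past the start and write one byte.
   The size becomes 1 MiB + 1, but many file systems allocate one block. */
int make_sparse(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (lseek(fd, 1024 * 1024, SEEK_SET) == (off_t)-1 || write(fd, "x", 1) != 1) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Conversely, a small file usually occupies a whole block, so allocated storage can be larger than the file size as well.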

Some of these attributes are preserved across backup and restore, or transfer to another system via ftp. Others are not, and sometimes problems occur.

Way back at the beginning, I mentioned file forks but didn't discuss them. Forks were originally developed for the Macintosh, to hold resources such as file icons. NTFS has a similar feature called alternate data streams. These forks really blur the boundary between system metadata and user data, and typically are not preserved when files move between systems.

Other File Types

pipes (named or unnamed)
device special files
sockets
device raw files
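The file type is also recorded in st_mode, and POSIX supplies a test macro for each kind. A minimal sketch; file_type is a hypothetical name. It uses lstat() so that a symbolic link is reported as itself rather than followed. On classic Unix systems, raw disk devices appear as character special files, while the buffered disk interface is a block special file.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <sys/stat.h>

/* Name the type of the file at path without following a final symlink.
   Returns NULL if lstat() fails. */
const char *file_type(const char *path)
{
    struct stat st;
    if (lstat(path, &st) != 0)
        return NULL;
    if (S_ISREG(st.st_mode))  return "regular file";
    if (S_ISDIR(st.st_mode))  return "directory";
    if (S_ISLNK(st.st_mode))  return "symbolic link";
    if (S_ISFIFO(st.st_mode)) return "named pipe (FIFO)";
    if (S_ISCHR(st.st_mode))  return "character device";
    if (S_ISBLK(st.st_mode))  return "block device";
    if (S_ISSOCK(st.st_mode)) return "socket";
    return "unknown";
}
```

For example, /dev/null is a character device, and a named pipe created with mkfifo() reports as a FIFO.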

Homework, Etc.

Homework

This week's homework involves testing the limits of your file system. None of these actions should cause problems with your file system, but be careful! And be certain to check the return values from your system calls.

Report on your system configuration! What type of file system are you working on?
Determine the longest file name your system supports. First, create a temporary directory to work in. Try creating files with names of different lengths and see what happens.
Now see how many files you can practically put in this directory. Time the creation of new files. How long does it take to create files 1-10? 101-110? 1,001-1,010? 10,001-10,010? Keep going until performance is too bad to continue, then time the deletion of the files. Plot the performance.
Now do something similar for depth: what happens as you create deeper directory trees?
1. Do this problem once with relative path names. Each time you create a directory, chdir() into it before proceeding.
2. Repeat using absolute path names.
Weekly progress report on your term project.
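A possible starting point for the first two experiments, not a complete solution. pathconf(dir, _PC_NAME_MAX) reports the name-length limit the system claims, which you can compare with what your own tests find, and clock_gettime() with CLOCK_MONOTONIC times a batch of creations. The names max_name_length and time_creates, and the "f<number>" file naming, are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Longest file name the file system holding dir claims to accept,
   or -1 if the limit is indeterminate or pathconf() fails. */
long max_name_length(const char *dir)
{
    return pathconf(dir, _PC_NAME_MAX);
}

/* Create files named f<first> .. f<first+count-1> inside dir and return
   the elapsed wall-clock time in seconds, or -1.0 on error. */
double time_creates(const char *dir, long first, long count)
{
    char name[4096];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = first; i < first + count; i++) {
        snprintf(name, sizeof name, "%s/f%ld", dir, i);
        /* O_EXCL: fail rather than silently reuse an existing file. */
        int fd = open(name, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd < 0)
            return -1.0;           /* check every return value! */
        close(fd);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_creates(dir, 1, 10), then time_creates(dir, 101, 10) after filling in files 11 through 100, and so on, gives the per-batch timings the assignment asks you to plot.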

Next Lecture

Next lecture:

Lecture 10, June 12: File System Implementation

Readings for next week and followup for this week:

The Shapley-Curtis Debate at Wikipedia and at NASA
Ken Thompson's C compiler hack
VMS System Services
OpenVMS Record Management Services (RMS)
