慶應義塾大学
Keio University
2013年度 春学期
Spring Semester 2013
システム・ソフトウェア
System Software / Operating Systems
第9回 6月11日 ファイルシステム
Lecture 9, June 11: File Systems
Outline
- Passing This Class
- Final Project
- What's a File?
- Using Files
- What's a File System?
- Other File Types
Passing This Class
- So far, no one has submitted a project proposal via SFS, because
there was no homework slot in which to submit it! That has been
fixed. You have one week to submit your proposals. Without a completed
project, you will not pass this class!
- You are responsible for submitting five programming
homeworks and several reports on readings.
Quoting from the first lecture:
Your grade will be determined as follows:
- 授業の討論/Class participation: 10%
- 宿題/Homework: 40%
- 学期のプロジェクト/Term project: 50%
This year, heavier emphasis will be placed on the term project than
on homeworks. You should expect your project to take 40-60
hours. There will be five homework assignments over the
course of the semester, and readings almost every week. At least
one of the programming assignments will involve parallel
programming.
Summarizing the currently-assigned work:
Project:
* Proposal: (overdue, get them in this week!)
* final report & interview due: July 16-26
Programming:
HW1/L1. hello, world
HW2/L2. system call & assembly
HW3/L3. fork() + Berkeley parallel programming exercise (important!)
L4. (none)
L5. (none)
L6. (none)
L7. (none)
HW4/L8. study socket definition
L9. (none)
L10. TBD
L11. TBD
L12. TBD
L13. TBD
L14. TBD
Required Reading:
HW1/L1. Levin & Redell (reading & writing papers)
HW2/L2. Lampson, hints (system design)
L3. (none)
HW4+5/L4. Scherer, scalable sync queues (example of recent work
coupling OS principles to language design and broader systems design)
L5. (none)
HW?/L6. Raymond, Cathedral & Bazaar (software development philosophy)
L7. (none)
L8. (none)
L9. (none)
L10. TBD
L11. TBD
L12. TBD
L13. TBD
L14. TBD
Recommended Textbook Reading:
(various things, see the week-by-week readings)
Recommended Paper & Website Reading:
L1. (none)
L2. (none)
L3. (none)
L4. (none)
L5. Larus, spending Moore's dividend
L7. several things on page replacement
L8. several things on the Morris worm, Ken Thompson's hack, Yoshifuji on Linux IPv6
L9. Sweeney, XFS
L10. TBD
L11. TBD
L12. TBD
L13. TBD
L14. parallel debugging from SOSP 2009
Submit a proposal for your final report, including the topic and the list of papers you will read and review.
Final Project
Requirements, as discussed in class:
- minimum of 6 papers
- one from ACM Computing Surveys
- one from ACM Transactions on Computer Systems
- one from Communications of the ACM or IEEE Computer
- at least two from top systems conferences
- at least one more than ten years old, one less than three years old
Contents of the final paper:
- taxonomy (organization of ideas)
- key ideas
- timeline of development
- key people and organizations
- influence on production systems
Examples of top systems conferences:
- USENIX Annual Technical Conference (USENIX ATC)
- Symposium on Operating Systems Principles (SOSP)
- Operating Systems Design and Implementation (OSDI)
- EuroSys
- Architectural Support for Programming Languages and Operating
Systems (ASPLOS)
- Networked Systems Design and Implementation (NSDI)
- International Symposium on Computer Architecture (ISCA)
- SIGCOMM
- Principles of Distributed Computing (PODC)
What's a File?
- Regular file: non-volatile storage for (user)
data, managed by the OS
A regular file (the only kind we will deal with for the moment)
is a collection of data stored on some non-volatile storage
managed by the operating system.
The most common model of file today is the simple byte stream, but
originally file I/O involved a much more hardware-oriented view and/or
more sophisticated, database-oriented system services. There have
been numerous types of basic files:
- Block files
- Byte stream (assumes 8-bit bytes!)
- Indexed files (record-oriented)
- Files with forks
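On a byte-stream system, record structure is left to the application. The sketch below assumes fixed-length records addressed by record number (the simplest record-oriented organization, not a keyed indexed file) and shows how record access maps onto byte offsets with the POSIX pread() and pwrite() calls. The struct rec layout and the helper names are hypothetical.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-length record layout (32 bytes on common ABIs). */
struct rec {
    int  id;
    char name[28];
};

/* Record n occupies bytes [n * sizeof(struct rec), (n + 1) * sizeof(struct rec)).
   pwrite()/pread() take an explicit offset and leave the file pointer alone.
   Both helpers return 0 on success, -1 on error or a short transfer. */
int put_record(int fd, long n, const struct rec *r)
{
    off_t off = (off_t)n * (off_t)sizeof *r;
    return pwrite(fd, r, sizeof *r, off) == (ssize_t)sizeof *r ? 0 : -1;
}

int get_record(int fd, long n, struct rec *r)
{
    off_t off = (off_t)n * (off_t)sizeof *r;
    return pread(fd, r, sizeof *r, off) == (ssize_t)sizeof *r ? 0 : -1;
}
```

Writing record 3 into an empty file leaves a gap where records 1 and 2 would be; POSIX specifies that such a gap reads back as zero bytes until something is written there.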
Using Files
- File Access APIs: access can be
  - buffered or unbuffered
  - structured or unstructured
  - synchronous or asynchronous
- File Session APIs
- File Management APIs
File Access APIs
The file access APIs most commonly used today were developed on Unix,
but they are far from the only ones. Even on Unix, there are several
varieties, some system calls and some library routines, all oriented
toward C:
- read() and write(): the system calls
- fread() and fwrite(): stdio library routines for
buffered, structured I/O
- fscanf() and fprintf(): formatted ASCII
- mmap(): memory-mapped I/O
(rooted in TOPS-20 or earlier)
- aio_read() and aio_write(): asynchronous I/O
(VMS had an implementation, but the roots go back further)
- Additional routines for character I/O
- readv() and writev(): scatter-gather I/O
- sync()
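To make two entries on this list concrete, here is a small sketch, assuming a POSIX system such as Linux: writev() gathers two separate buffers into a single system call, and mmap() then maps the file so that its contents can be scanned as ordinary memory. The helper names are hypothetical.

```c
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <unistd.h>

/* Gather two separate strings into one writev() call.
   Returns the byte count written, or -1 on error.
   A short write is possible in general; this sketch does not retry. */
ssize_t write_two_parts(int fd, const char *hdr, const char *body)
{
    struct iovec iov[2];
    iov[0].iov_base = (void *)hdr;
    iov[0].iov_len  = strlen(hdr);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = strlen(body);
    return writev(fd, iov, 2);
}

/* Map the whole file read-only and count occurrences of byte c.
   Returns -1 on error. */
long count_byte_mmap(int fd, char c)
{
    struct stat st;
    if (fstat(fd, &st) < 0)
        return -1;
    if (st.st_size == 0)
        return 0;                 /* mmap() rejects a zero length */

    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    long n = 0;
    for (off_t i = 0; i < st.st_size; i++)   /* file bytes read as memory */
        if (p[i] == c)
            n++;
    munmap(p, (size_t)st.st_size);
    return n;
}
```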
On OpenVMS, there were several block- and record-oriented services:
- Block oriented: SYS$READ, SYS$WRITE
- Record oriented: SYS$GET, SYS$FIND
One extremely important feature of read() and write() is
that they allow arbitrary alignment of the application's
buffer. In the VMS block-oriented calls, applications had to
align write data on 512-byte boundaries.
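To see what byte-granular read() and write() save the application, consider what a block-oriented interface demands. The sketch below assumes 512-byte blocks and transfers that must start on block boundaries; it only does the arithmetic, and the struct and function names are hypothetical rather than part of any VMS API.

```c
#include <stddef.h>

#define BLOCK_SIZE 512UL   /* assumed block size, matching the 512-byte VMS blocks */

/* Which blocks must a block-oriented interface transfer to cover
   the byte range [offset, offset + len)? */
struct block_span {
    unsigned long first_block;  /* block containing the first requested byte */
    unsigned long nblocks;      /* number of whole blocks to transfer */
    size_t        skip;         /* requested data starts this far into the first block */
};

struct block_span byte_range_to_blocks(unsigned long offset, size_t len)
{
    struct block_span s;
    s.first_block = offset / BLOCK_SIZE;
    s.skip = (size_t)(offset % BLOCK_SIZE);
    if (len == 0) {
        s.nblocks = 0;
    } else {
        unsigned long last_block = (offset + len - 1) / BLOCK_SIZE;
        s.nblocks = last_block - s.first_block + 1;
    }
    return s;
}
```

For example, 100 bytes starting at offset 1000 span blocks 1 and 2, so a block-oriented program must move 1,024 bytes and skip the first 488 of them; with pread() it simply asks for 100 bytes at offset 1000.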
Regular files represent non-volatile data stored on disk. Using the
standard APIs, however, the data may not be committed to disk
when the write() call completes; the data may still be buffered
in the system somewhere (e.g., in the kernel's buffer cache).
Again our concept of relativity (相対性理論) comes into play:
the information we have has not yet flowed to its final resting place.
The data is guaranteed to be committed to disk only once an fsync()
on the file has completed successfully. On POSIX systems, a successful
close() by itself makes no such guarantee, and sync() is only
required to schedule the writes (Linux's sync() waits for them).
Note that data does not necessarily all land on disk in the order in
which you wrote it! If you care, you should fsync() every time you need
guaranteed ordering of the writes. Also note that the semantics allow
fsync() or close() to fail even after the write() has succeeded!
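A minimal sketch of a careful write on a POSIX system, assuming the goal is that the data is on stable storage when the function returns: it loops over short writes, calls fsync() on the descriptor, and checks the result of close() as well. The function name is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write all of buf to path and force it to stable storage.
   Returns 0 on success, -1 on any failure (errno is set).
   Every step is checked: write() may be short, and fsync()
   or close() can fail even after write() succeeded. */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;              /* interrupted by a signal: retry */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;                        /* handle short writes */
        len -= (size_t)n;
    }

    if (fsync(fd) < 0) {               /* push the file's data to the device */
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);                  /* close() errors are reported, too */
}
```

For a newly created file, the directory entry is separate metadata; the Linux fsync(2) documentation notes that making the new name durable also needs an fsync() on a descriptor for the containing directory.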
File Session APIs
...What's missing from the above list?
- open() and close()
- file locking
Most file systems use the concept of a file session, spanning
the time from when you first ask to access the file until you are done
with it. You ask to access a file by calling open(),
which requires a file name.
Most file APIs use the concept of a file pointer. The file
pointer represents the current position in the file where the
application is reading or writing. For regular files, it can be
adjusted to the beginning, end, or any arbitrary position in the
file. For byte stream files, the pointer is just an integer
representing the byte offset; for record-oriented files, it is a more
complex structure. The file pointer is part of the kernel's structure
that it uses to track which files a process is accessing. The pointer
assists the kernel in managing the reading and writing of file
data.
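A short sketch of the file pointer in action, assuming Linux or another POSIX system: reading advances the pointer, and lseek() reports or moves it (here, to the end). Because the pointer lives in the kernel's per-open record rather than in the file itself, a descriptor created by dup() shares it, while a second open() of the same file gets its own. The struct and function names are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Offsets observed by offsets_demo(). */
struct offsets {
    off_t via_dup;   /* pointer seen through a dup() of the reading descriptor */
    off_t via_open;  /* pointer seen through a separate open() of the same file */
    off_t end;       /* lseek(..., SEEK_END): the file size */
};

/* Read 4 bytes through one descriptor, then inspect the file pointer
   through a duplicate and through an independent open of the same file.
   Assumes path names a regular file at least 4 bytes long.
   Returns 0 on success, -1 on error. */
int offsets_demo(const char *path, struct offsets *o)
{
    char buf[4];
    int rc = -1;
    int a = open(path, O_RDONLY);
    int b = (a >= 0) ? dup(a) : -1;   /* shares a's open file description */
    int c = open(path, O_RDONLY);     /* a second session with its own pointer */

    if (a >= 0 && b >= 0 && c >= 0 &&
        read(a, buf, sizeof buf) == (ssize_t)sizeof buf) {
        o->via_dup  = lseek(b, 0, SEEK_CUR);  /* moved by the read on a */
        o->via_open = lseek(c, 0, SEEK_CUR);  /* still at the beginning */
        o->end      = lseek(c, 0, SEEK_END);  /* jump to the end */
        rc = 0;
    }
    if (a >= 0) close(a);
    if (b >= 0) close(b);
    if (c >= 0) close(c);
    return rc;
}
```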
File Management APIs
- creat()
- unlink()
- link() (hard or soft?)
- chmod()
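The management calls above can be exercised in a few lines. This sketch, assuming a POSIX system and a writable path, creates a file with creat(), changes its permission bits with chmod(), reads them back with stat(), and removes the name with unlink(). The function name is hypothetical.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create path, change its permission bits, and remove its name again.
   Returns the permission bits observed after chmod(), or -1 on error. */
int create_chmod_unlink(const char *path)
{
    /* Equivalent to open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
       the mode given here is filtered through the process umask. */
    int fd = creat(path, 0600);
    if (fd < 0)
        return -1;
    close(fd);

    if (chmod(path, 0640) < 0) {       /* rw-r----- ; chmod ignores the umask */
        unlink(path);
        return -1;
    }

    struct stat st;
    int mode = -1;
    if (stat(path, &st) == 0)
        mode = (int)(st.st_mode & 07777);   /* keep only the permission bits */

    unlink(path);                       /* remove the directory entry */
    return mode;
}
```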
What's a File System?
- Key services
- File Naming: Directories, Devices and Mount Points
- Security
- Other Metadata
Key Services
We have discussed individual files, but what's a file system?
A file system is the overall structure that holds the user files (and
sometimes other things, which we won't discuss today). A file system
provides a number of services:
- non-volatile storage of user files, including space management
- naming services for files
- enforcement of security
- on some systems, such as VMS, versioning of files
- support for backup
- sometimes, locking
We will talk about space management next week.
File Naming: Directories, Devices and Mount Points
Current operating systems all store individual files in
directories, or folders. A directory provides the
mapping from a name to the actual file. When we speak of the files
that are referenced by the directory, we say that a directory
contains those files, but it's not literally true; the
directory actually contains only pointers to the files.
Directories can contain other directories, known as
subdirectories, creating a hierarchical namespace, or a
directory tree. A complete path name may look like
/home/rdv/keio/file1.
In Unix terminology, the normal mapping from name to file is a hard
link. A regular file can have more than one hard link, or more
than one name. A file with more than one hard link is not really
deleted until the last link is deleted. Files can also have soft
links, which are just name-to-name mappings that are held in the
file system, but which do not participate in the actual management of
the file. If the file is deleted but the soft link is not, the soft
link is referred to as a dangling reference. One reason for
the existence of soft links is to allow linking to directories without
violating the requirement that each directory has a single parent.
Another is to allow linking across partition boundaries or
mount points.
In a Unix system, there is a single root to the directory
tree. Applications and users only rarely have to know on which disk
their data is stored. System managers can expand parts of the
directory tree by mounting other file systems in any place in
the tree.
In many other operating systems, the devices are explicitly named. On
Windows, they have names such as C:\RDV\KEIO\file1, where the colon
separates the device and the directory. On VMS, it could be
SYS$HOME:[rdv.keio]file1.
Note that a name for a file is non-volatile, but not permanent; files
can be renamed by users and applications, or the system manager
may change the mount point and hence the full path name to a file.
This behavior creates problems for long-term tracking of data, and
numerous research systems (including Plan 9) have attempted to address
this need.
Some file systems support case-sensitive file names, others do not.
You have probably also noticed that sometimes non-ASCII file names are
not printed properly. Most file systems originally assumed ASCII file
names, and non-ASCII names are a problem because the character sets
are not self-describing. NTFS solves this problem by storing all
names in Unicode (as UTF-16).
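On Unix, a file name is just a sequence of bytes, so a program can only guess which character set produced it. A sketch of one common guess, assuming the candidate encoding is UTF-8: check whether the bytes are well-formed UTF-8, rejecting overlong forms, surrogates and values above U+10FFFF. The function name is hypothetical, and passing the check does not prove the name was written as UTF-8.

```c
#include <stddef.h>

/* Returns 1 if s[0..n) is well-formed UTF-8, 0 otherwise. */
int looks_like_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        unsigned long cp;

        if (c < 0x80) { i++; continue; }                        /* ASCII */
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return 0;                      /* not a valid lead byte */

        if (i + len > n)
            return 0;                       /* truncated sequence */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;                   /* missing continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000))
            return 0;                       /* overlong encoding */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;                       /* out of range or surrogate */
        i += len;
    }
    return 1;
}
```

The name ファイル stored as UTF-8 (E3 83 95 ...) passes, while the same name in Shift_JIS (83 74 83 40 ...) is rejected, because 0x83 cannot start a UTF-8 sequence.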
Security
All operating systems operate with some sort of security model,
and the most important feature of that model is its plan for file
protections. Multi-user OSes, by definition, understand users;
the user is usually the basis of decisions about permissions.
Unix and VMS both make a distinction between the user, the user's
group, and "other" users.
Permissions, on Unix and its descendants, consist of read,
write, and execute. Execute is used both for programs
and for directories, where it grants permission to search (traverse) the directory.
An important, independent concept is that of an Access Control
List, or ACL. An ACL matches specific permissions to
either specific users, or to users holding a particular
identifier or access token, which may be encrypted, and
may be transferable from one user or process to another.
Other Metadata
The file system uses several forms of metadata, or data about
the data, to manage files. We have seen two of the most important
types of metadata already: names, and security information. But there
are other types, which may or may not be supported:
- File layout information
- Timestamps: created, modified, accessed, backed up
- File type
- File size (last byte)
- File storage (which may be either more or less than the file size)
- Block size
- Realtime info
- User-definable metadata
Some of these attributes are preserved across backup and restore, or
transfer to another system via ftp. Others are not, and
sometimes problems occur.
Way back at the beginning, I mentioned file forks but didn't
discuss them. Forks were originally developed for the Macintosh, to
hold resources such as file icons. NTFS has a similar feature called
alternate data streams. These forks really blur the boundary between system
metadata and user data, and typically are not preserved when files
move between systems.
Other File Types
- pipes (named or unnamed)
- device special files
- sockets
- device raw files
Homework, Etc.
Submit your project proposal. Older homeworks that had not been
added to SFS are now there.
Next Lecture
Next lecture:
第10回 6月18日 ファイル・システムの実装
Lecture 10, June 18: File System Implementations
Readings for next week and followup for this week:
その他 Additional Information