Keio University
2009 Spring Semester
System Software / Operating Systems
Lecture 10, June 23: Input/Output Systems
Outline
- What's a Disk Drive?
- The Importance of a Disk Drive
- The Access Time Gap
- The Insides of a Disk Drive
- Disk Drive Trends
- Talking to the Hardware
- Buses and I/O Ports
- Device Types
- Doing the Talking
- Device Drivers
- Character and Block Devices
- Top Half and Bottom Half
- Naming
- Performance
- Data Copies in the File System and Elsewhere
- Interrupt Rate
- Coalescing Interrupts
- Tools
What's a Disk Drive?
A disk drive stores data in sectors that are held on tracks;
all of the tracks at the same distance from the spindle are called
a cylinder.
It uses a read/write head attached to a slider,
mounted on an actuator arm, to read and write the data as it
spins past.
The Importance of a Disk Drive
In an architectural sense, what's important about disk drives?
- They are expensive
- They consume lots of power
- They are often the performance bottleneck (the access time gap)
- They break more easily than many other parts of the system
...and yet, the Information Revolution can fairly be
said to be built on disk drives. Without them, there would be no
PCs, no Google.
The Access Time Gap
The Insides of a Disk Drive
Disk Drive Trends
Talking to the Hardware
In order to understand I/O, we need to briefly review the hardware
architecture...
Buses and I/O Ports
Systems generally consist of multi-level attachments that provide
differing types of aggregation. Some physical devices sit on, or
close to, the main memory bus; others are kept more distant via some
sort of controller.
There are a number of common types of controller:
- Serial
- Parallel
- SCSI
- ATA
- USB
- FireWire
- ...
As it gets easier to put more hardware into the devices themselves,
they exhibit more complex behavior, including helping the OS identify
them. We will come back to that later. First, some examples of the
device types:
Device Types
Okay, so what kind of devices are we actually talking to? There are
many, many kinds of I/O devices. Here are the classes that
SCSI defines:
- 00h - direct-access device (e.g., magnetic disk)
- 01h - sequential-access device (e.g., magnetic tape)
- 02h - printer device
- 03h - processor device
- 04h - write-once device
- 05h - CDROM device
- 06h - scanner device
- 07h - optical memory device (e.g., some optical disks)
- 08h - medium changer device (e.g., jukeboxes)
- 09h - communications device
- 0Ah-0Bh - defined by ASC IT8 (Graphic arts pre-press devices)
- 0Ch - Storage array controller device (e.g., RAID)
- 0Dh - Enclosure services device
- 0Eh - Simplified direct-access device (e.g., magnetic disk)
- 0Fh - Optical card reader/writer device
- 10h - Reserved for bridging expanders
- 11h - Object-based Storage Device
- 12h - Automation/Drive Interface
- 13h-1Dh - reserved
- 1Eh - Well known logical unit
- 1Fh - unknown or no device type
Here are the USB device types:
- 00h - Use class information in the Interface Descriptors
- 01h - Audio
- 02h - Communications and CDC Control
- 03h - HID (Human Interface Device)
- 05h - Physical
- 06h - Image
- 07h - Printer
- 08h - Mass Storage
- 09h - Hub
- 0Ah - CDC-Data
- 0Bh - Smart Card
- 0Dh - Content Security
- 0Eh - Video
- DCh - Diagnostic Device
- E0h - Wireless Controller
- EFh - Miscellaneous
- FEh - Application Specific
- FFh - Vendor Specific
Neither of these lists includes graphics devices such as the
monitor or graphics display itself. Other devices requiring similar
I/O control include specialized processors and, of course, all
manner of scientific equipment.
Obviously, all of this requires a lot of software; in Linux, there are
almost 3,300 different device drivers! Don't worry, the complexity is
actually quite manageable; we'll come back to that when we discuss the
drivers themselves below.
Doing the Talking
...okay. Now, how do we talk to a device? There are two basic ways
for the CPU to talk to hardware devices:
- I/O Instructions
- Memory-Mapped I/O
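When using I/O instructions, the CPU executes an IN or
OUT instruction, which reads from or writes to a separate
address space (namespace) for I/O devices, usually attached to a
separate bus. With memory-mapped I/O, the device's registers instead
appear at addresses in the ordinary physical address space, and the
CPU reads and writes them with normal load and store instructions.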
Those methods describe how the CPU talks to, or controls, the
device. In both cases, there are two primary ways to move your actual
data in and out:
- Programmed I/O
- Direct Memory Access (DMA)
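With programmed I/O, the CPU itself moves every word of data between
the device and main memory. DMA instead involves setting up some other
piece of hardware besides the CPU to actually control the I/O and move
the data between the device and main memory.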
DMA may be done to virtual or physical addresses. The primary
advantage of virtual addresses is that they support scatter/gather I/O:
a buffer that is contiguous in virtual memory may be spread across many
non-contiguous physical pages. These days, most device controllers
support scatter/gather directly for physical addresses anyway, and with
the MMU incorporated into the CPU it is harder for a device to use the
address translation hardware, so DMA to virtual addresses is not
commonly done any more.
Device Drivers
So far, we've hardly said a word about the operating system. The
device driver is the primary piece of the OS that is
responsible for managing I/O.
As you might expect from the initial discussion of hardware, there are
several levels of device drivers, starting with software to control
the actual buses and going on down to the devices. The bus drivers
are used more or less as a library of functions for the actual device
drivers.
In most modern systems, the device driver that matches a particular
device can be loaded as a kernel module after the device is
identified by the OS.
A device driver must follow a particular form, which is very dependent
on the operating system. Over the last several years, there has been
a push for OS-independent device drivers, so that developers can share
the same code for a device regardless of whether it was originally
developed for Windows, Linux, or Mac.
Character and Block Devices
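Very early on in the development of Unix, the authors made a brilliant
decision: they divided hardware devices into two classes, the block
devices and the character devices. Block devices transfer data in
fixed-size blocks and allow random access (e.g., disks); character
devices transfer data as a stream of bytes (e.g., terminals and serial
ports). The primary difference between the two is that file systems
can be mounted on block devices, requiring additional functions from
the device driver.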
Top Half and Bottom Half
In Unix, the code for a device driver is divided into the top
half and the bottom half. (The bottom half is usually much
less than half of the total code, though.) The bottom half is
essentially the interrupt handler, and it must be prepared to run at
any time, with the system in any state. The top half generally
runs with the system set to the state (e.g., memory map) of the
process that is scheduling (or has scheduled) the I/O.
Naming
Devices used to be named strictly according to their bus address,
which was simple and never changed. In today's Plug-and-Play
(PNP) world, that's simply not so. Moreover, different
flash drives can be plugged into the same slot and use the same
address (over time), but the OS should treat them differently!
Ideally, devices would always identify themselves completely. Most
devices provide some identification, but those that store data could,
and should, make more effective use of the volume name, which
is generally embedded in the device.
Performance
The overall performance goal, as we discussed in the lecture on process
scheduling, is generally a balance between keeping the CPU busy
and keeping the devices busy. For the moment, we are only
concerned with how to achieve the highest I/O rate (measured as
throughput or as I/O operations per second).
The principal reasons that I/O slows down are:
- The CPU being late on an operation, resulting in the device
stalling.
- Data copies in the file system and elsewhere.
- Interrupt handling (which leads to CPU load and stalling the
device).
Device Stalls
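For a disk drive, a common form of performance problem is a
rotational miss: the next request reaches the drive just after the
target sector has passed under the head, so the drive must wait almost
a full revolution for it to come around again. Disks also must seek,
and poor choices in ordering seeks can ruin your performance, but we
don't have time to go into that right now.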
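As a rough illustration, assuming a 7200 RPM drive (a common desktop speed, not a figure from this lecture): a rotational miss costs close to one full revolution, while a random access waits on average half a revolution for its sector.

```latex
T_{\text{rev}} = \frac{60\ \text{s}}{7200} \approx 8.33\ \text{ms},
\qquad
\bar{T}_{\text{rot}} = \tfrac{1}{2}\,T_{\text{rev}} \approx 4.17\ \text{ms}
```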
For tape drives, underflowing or overflowing a buffer results in a
tape stall, which is extremely expensive.
Data Copies in the File System and Elsewhere
A couple of weeks ago we talked about file system APIs. At one point,
we talked about the alignment of application file read/write buffers.
In modern C/Unix APIs, the buffer can be any place in memory, but in
older systems, buffers always had to be aligned to the size of the
system memory page.
In Unix systems, it is also true that disk I/Os are done in multiples
of a page size, and the I/O is also done to page boundaries. So how
are the API and the I/O system reconciled? Through the file system
buffer cache. The buffer cache serves two important purposes:
the first is alignment, and the second is buffering, to allow speed
matching of I/O and allow the application to continue while I/O is
handled by the kernel on its behalf.
Packets arrive into the system in a variety of sizes. Worse, in
general, you don't know which process (if any!) wants the
packet until you get it into memory and examine the headers.
These effects cumulatively mean that data copies are common in
operating systems, and they have an enormous impact on system
performance:
- They tie up the processor itself
- They tie up the memory bus
- They pollute the cache
Interrupt Rate and Coalescing Interrupts
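How many interrupts per second do you get from a 100Mbps Ethernet card
with 1500B frames? What about a 10Gbps Ethernet card with
minimum-size frames? Coalescing interrupts reduces this load: the card
raises one interrupt for a batch of frames (or after a short timeout)
rather than one interrupt per frame.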
Tools
On Linux, and on some other Unix systems, the following tools are useful:
- vmstat
- lspci
- iostat
- dmesg
You may be interested in the following benchmarks:
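The following code should give you the number of clock cycles since
the processor was last reset, using the x86 time-stamp counter
(available on the Pentium and later x86 processors):
#include <stdio.h>

/* Read the x86 time-stamp counter with the RDTSC instruction.
   RDTSC puts the low 32 bits in EAX and the high 32 bits in EDX;
   naming both registers works on 32- and 64-bit x86 alike
   (the old "=A" constraint is wrong on x86-64). */
static __inline__ unsigned long long rdtsc(void)
{
    unsigned int lo, hi;
    __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long)hi << 32) | lo;
}

int main(void)
{
    int i;
    for (i = 0; i < 100; i++) {
        printf("%llu\n", rdtsc());   /* %llu matches unsigned long long */
    }
    return 0;
}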
Homework, Etc.
Homework
This is the last homework!!! After this week, your homework
responsibility is your term project.
- Imagine that the bitmap showing which disk blocks are free has
become unreliable, so you decide to rebuild it by walking through all
of the in-use inodes. Assume on-disk inodes are 128 bytes, and the
file system was initialized to hold a maximum of one million files.
Your disk has a transfer rate of 40 megabytes/second, and can execute
a random operation in 10 milliseconds. It holds 100GB.
  - If all of the inodes are stored in one contiguous chunk of disk,
  how long will it take to read them all?
  - If 100,000 of the files each use a single indirect block that is
  randomly placed on the disk, how long will it take to work through
  all of them?
- Using the same type of disk, how long would it take to read
every 4KB block on the disk in random order?
- Do you back up the data on your computer? If so, how?
- Execute a data backup. This backup may be of any type, using
any tool, and may be just your user files or may be the entire
system. You may back up to CD/DVD, external disk, tape, or over the
network to a server.
- Report how long it took to perform your backup, and how much
data was transferred.
- Report the status of your term project.
Next Lecture
Next lecture:
Followup for this week:
Additional Information