Keio University
Spring Semester, 2010

System Software / Operating Systems

Spring Semester 2010, Tuesdays, 2nd period
Course code: 60730
Location: SFC
Format: Lecture
Instructor: Rodney Van Meter
E-mail: rdv@sfc.keio.ac.jp

Lecture 12, July 6: Hypervisors and Virtual Systems

What do a matryoshka doll and a fukuruma doll have to do with operating systems?

(Images from www.cse.ucsd.edu/~saul/images/matryoshka.jpg and http://russian-crafts.com/nest/history/fucu.jpg.)

Outline

Basic Principle of Virtualization

We have already discussed virtual memory. What if we decide to virtualize more of the hardware? How about all of the hardware? If virtualizing some of it was a good idea, why not all of it?

How many of you have used VMware, Parallels or Xen? (VMware, by the way, claims to be the fastest-growing software company in history.) Then you have used a virtual machine (VM) and virtual machine monitor (VMM). Because an operating system usually runs in supervisor mode, and the VMM sits one level above the operating system, VMMs are also referred to as hypervisors.

The basic goal of a hypervisor is to allow multiple operating systems to run on the same hardware at the same time. This is not simply dual boot, but dynamic sharing of the CPU, memory and other resources, just as different processes share the system in a multitasking OS. Moreover, the different instances of the operating systems, known as guest OSes, can be heterogeneous.

The primary challenges to creating a good VMM are:

Uses of Virtualization

There are many good reasons to want a virtual machine environment:

All of these can be classified as one of three basic forms of use:

Below is an image of VMware's VMotion, which allows live migration of a server from one physical machine to another.

VMotion, from VMware

History of Virtualization

You may not realize that virtual machine technology actually goes back to the 1960s. The original goal of IBM's VM was to completely virtualize the underlying hardware. IBM had the distinct advantage that it could change both the hardware and the software. Their primary goal was sharing of the hardware for legacy software; the pre-existing APIs gave applications (typically large databases) very direct access to the device controllers, and they wanted to preserve the customers' investment in that software while making the hardware shareable, in order to bring down total cost of ownership (TCO).

Virtualization Architectures

IBM's VM abstracted the hardware so completely that it was possible to run another copy of VM inside of it. My OS professor, Kim Korner, told of running them thirty deep, something like a matryoshka doll or fukuruma doll. For most purposes, this depth of nesting is unnecessary: what we really want is to run two OSes, or to migrate a server or user environment from one place to another.

IBM's ability to control the hardware gave them a huge advantage: they didn't have to worry about thousands of different types of peripherals, odd trap semantics and page tables, etc. The biggest initial hurdle to a truly useful VMM is that plethora of peripherals. VMware solves this difficult problem by using a host OS in addition to the guest OS. Their chosen host OS is Linux.

The host OS actually performs the I/O on behalf of the VMM, using the host OS's device drivers. The VMM communicates with a process running on the host OS to execute the I/Os and return the results, regardless of which VM actually requested the data.

Implementing Virtualization

Most existing hardware doesn't support virtualization directly, but requires some help. There are two principal approaches:
There are also several rather pragmatic problems that must be solved:

VMware uses direct execution with binary translation, while Xen uses paravirtualization.

Hardware support for all of these things certainly makes life a lot simpler. Intel has added virtualization support to both IA-32 (x86) and IA-64 (Itanium) in the newest round of chips. Sun has support in UltraSPARC. With this hardware support, neither binary translation nor the source code modification required for paravirtualization is necessary.

The single biggest problem presented by the IA-32 architecture is that some operations that a user process might not be privileged to execute are silently ignored by the hardware. A VMM would much prefer that the hardware trap, so that the VMM can intercept the operation and emulate it. VMware solves this problem by dynamically modifying the binary code of the OS to trap to the VMM. This approach involves a lot of execution overhead when the system first starts, but should run well once the cache of modified code is warm.
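The role of the cache of modified code can be sketched with a toy instruction stream. This is only an illustration under stated assumptions, not VMware's implementation: the instruction list, `translate_block`, `run_block` and the cache layout are all hypothetical, and "executing" a block here just means walking its translated form.

```python
# Toy model of binary translation with a translation cache.
# All names here are illustrative assumptions, not VMware's design.

# Instructions that do not trap in user mode but that the VMM must intercept.
SENSITIVE = {"POPF", "SMSW"}

translation_cache = {}   # block address -> translated instruction list
stats = {"translations": 0, "cache_hits": 0}


def translate_block(block):
    """Rewrite each sensitive instruction into an explicit trap to the VMM."""
    return [("VMM_TRAP", op) if op in SENSITIVE else ("NATIVE", op)
            for op in block]


def run_block(address, guest_code, trap_log):
    """Run one guest basic block, translating it only on first use."""
    if address in translation_cache:
        stats["cache_hits"] += 1
    else:
        translation_cache[address] = translate_block(guest_code[address])
        stats["translations"] += 1
    for kind, op in translation_cache[address]:
        if kind == "VMM_TRAP":
            trap_log.append(op)   # the VMM would emulate the instruction here


guest_code = {
    0x1000: ["ADD", "POPF", "ADD", "SMSW"],
    0x2000: ["ADD", "ADD"],
}
traps = []
for addr in [0x1000, 0x2000, 0x1000, 0x1000]:
    run_block(addr, guest_code, traps)
```

Each block pays the translation cost once; the two later visits to block 0x1000 are cache hits, which is why the overhead is concentrated at startup.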

Xen's approach is to modify the OS source code so that the guest OS cooperates with the VMM. The Xen developers have successfully used this approach for Linux and Windows.
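The idea of a guest kernel that cooperates with the VMM can be sketched as follows. This is a minimal model, not Xen's actual interface: the classes `Hypervisor` and `ParavirtGuestKernel` and the method `hypercall_update_mapping` are hypothetical names chosen for this example. The point is that the modified kernel asks the hypervisor to update a page table instead of writing it directly, and the hypervisor validates the request.

```python
# Toy model of paravirtualization; names are assumptions, not Xen's API.

class Hypervisor:
    def __init__(self):
        self.frames_owned = {}   # guest id -> set of physical frame numbers
        self.page_tables = {}    # guest id -> {virtual page: physical frame}

    def create_guest(self, guest_id, frames):
        self.frames_owned[guest_id] = set(frames)
        self.page_tables[guest_id] = {}

    def hypercall_update_mapping(self, guest_id, vpage, frame):
        """Validate and apply a page-table update on the guest's behalf."""
        if frame not in self.frames_owned[guest_id]:
            return False          # refuse: the frame belongs to someone else
        self.page_tables[guest_id][vpage] = frame
        return True


class ParavirtGuestKernel:
    """Guest kernel whose source was modified to call the hypervisor."""

    def __init__(self, guest_id, hypervisor):
        self.guest_id = guest_id
        self.hv = hypervisor

    def map_page(self, vpage, frame):
        # An unmodified kernel would write the page table directly here.
        return self.hv.hypercall_update_mapping(self.guest_id, vpage, frame)


hv = Hypervisor()
hv.create_guest("A", frames=[0, 1, 2, 3])
hv.create_guest("B", frames=[4, 5, 6, 7])
kernel_a = ParavirtGuestKernel("A", hv)
ok = kernel_a.map_page(vpage=10, frame=2)        # frame owned by guest A
refused = kernel_a.map_page(vpage=11, frame=5)   # frame owned by guest B
```

Because every update goes through an explicit call, the hypervisor never has to discover privileged operations by trapping or rewriting binary code.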

A major problem on Intel architectures is ring compression. On IA-32, there are four protection rings, 0-3. Traditionally, the OS uses ring 0 ("supervisor mode"), and applications use ring 3, with separate page tables. Rings 1 and 2 were used by OS/2, but no important OS today uses them. They are similar to Multics' protection rings, another important idea from the 1960s, but out of fashion for much of the last thirty years.

The obvious solution would be to have the hypervisor run in ring 0, the OS in ring 1, and the apps in ring 3. However, there's a problem: the MMU doesn't distinguish among rings 0, 1, and 2. All three of them can change the page tables at will, and all three levels have access to all memory. This problem forces the guest OS to run in ring 3, the same as the applications themselves. This in turn causes problems in implementing the VMM.
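The way paging collapses four rings into two levels can be written down in a few lines. This is a simplified model of the page-level check only; `page_access_allowed` is a hypothetical helper, and segmentation and write protection are ignored.

```python
# Simplified model of the page-level protection check on IA-32.
# Paging sees only two levels: supervisor (rings 0-2) and user (ring 3).

def page_access_allowed(ring, page_is_user):
    """Return True if code running in `ring` may touch the page."""
    is_supervisor = ring < 3
    return is_supervisor or page_is_user


# A page the hypervisor wants to keep private is marked supervisor-only.
hypervisor_page_is_user = False
access_by_ring = {
    ring: page_access_allowed(ring, hypervisor_page_is_user)
    for ring in range(4)
}
```

In this model a guest OS placed in ring 1 can reach the hypervisor's supervisor-only page just as ring 0 can; only ring 3 is kept out.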

Some VMMs support fixed partitioning of the hardware resources, especially memory; others partition them dynamically. VMware uses a "balloon process" that it "inflates" inside a VM when the VMM wants to recover some memory. The inflation forces the guest OS to page out some of its memory to make room for the balloon process, which then hands those pages to the VMM to be given to another VM.
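The balloon mechanism can be illustrated with simple page accounting. This is a sketch under assumptions, not VMware's implementation: the `GuestVM` class, its fields and the page counts are hypothetical, and paging is modeled only as moving a count from "in use" to "paged out".

```python
# Toy page accounting for memory ballooning; all names are assumptions.

class GuestVM:
    def __init__(self, name, total_pages, pages_in_use, balloon_pages=0):
        self.name = name
        self.total_pages = total_pages      # pages the guest believes it has
        self.pages_in_use = pages_in_use    # pages holding guest data in RAM
        self.balloon_pages = balloon_pages  # pages held by the balloon
        self.paged_out = 0                  # pages the guest pushed to swap

    def free_pages(self):
        return self.total_pages - self.pages_in_use - self.balloon_pages

    def inflate_balloon(self, n):
        """The balloon asks for n pages; the guest pages out if it is short."""
        shortfall = min(max(0, n - self.free_pages()), self.pages_in_use)
        self.pages_in_use -= shortfall
        self.paged_out += shortfall
        granted = min(n, self.free_pages())
        self.balloon_pages += granted
        return granted                      # pages handed to the VMM

    def deflate_balloon(self, n):
        """The VMM returns up to n pages, shrinking the balloon."""
        released = min(n, self.balloon_pages)
        self.balloon_pages -= released
        return released


vm_a = GuestVM("A", total_pages=100, pages_in_use=90)
vm_b = GuestVM("B", total_pages=100, pages_in_use=60, balloon_pages=40)

reclaimed = vm_a.inflate_balloon(30)   # VMM recovers memory from A
vm_b.deflate_balloon(reclaimed)        # and makes it available to B
```

Guest A had only 10 free pages, so it pages out 20 to satisfy the balloon; the guest OS chooses which pages to evict, using its own replacement policy.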

One important problem for desktop sharing of OSes: how do you communicate between them? "Drag and drop" for files would be incredibly useful, but done improperly it's hard to build and maintain, and a possible security hole. The simplest approach, and the one that VMware initially took, is to allow the two VMs to communicate through the network, as if they were running on separate machines.

Multicore and Parallel VMM

What are the special challenges in parallel systems for virtual machines? In the early days of multicomputers, the system could usually be divided up among multiple users. User A gets processors 0 to 7, user B gets 8 to 15, user C gets 16 to 31. When the memory is distributed, this approach generally gets you good isolation of the separate programs. But multicore systems are shared-memory multiprocessors, and the I/O devices are shared, too, so this approach does not provide perfect isolation.

Is a VMM just an OS?

Yes.

But it's a very restricted OS, focusing only on the resource management. It provides no GUI library calls, no shell, no real file system or network stack; it allows the guest OSes to provide all of those.

The key is to avoid "feature creep" or "software bloat" so that the VMM remains lightweight.

Security

The VMM's ability to isolate processes from each other constitutes an important security feature. But what if it goes wrong? What if the VMM itself turns out to have security holes that allow information to be leaked from one machine to another? Some people have suggested carrying around a CD with your environment on it, and booting your own, secure version of Linux on any handy PC when you need to check your email. What happens if that booting happens on a virtual machine instead of real hardware?

All of these are serious concerns. But because the VMM is small and does not run user processes directly, it is easier to make secure. Moreover, the VMM is generally a net win in security, because it can sometimes recognize attempts to subvert the guest OSes running on top of it.

Related approaches

Homework

Your only homework this week is to report on the progress of your term project.

Next Lecture

Lecture 13, July 13: Operating Systems Research
(actual topic TBD.)

Followup for this week:

Additional Information