Keio University
Spring Semester 2016

Operating Systems

Location: SFC
Format: Lecture
Instructor: Rodney Van Meter
E-mail: rdv@sfc.keio.ac.jp

Lecture 13, May 23: Hypervisors and Virtual Systems

What do a matryoshka doll and a fukuruma doll have to do with operating systems?

(Images from www.cse.ucsd.edu/~saul/images/matryoshka.jpg and http://russian-crafts.com/nest/history/fucu.jpg.)

Outline

Basic principle of virtualization
Uses of virtualization
History of virtualization
Virtualization architectures
Implementing virtualization
Is a VMM just an OS?
Security
Multicore and parallel VMM
Related approaches

Basic Principle of Virtualization

VMware, Parallels and Xen are common virtual machine monitors.
Virtualization lets you run multiple virtual hosts, running the same or different OSes, on one machine, and lets you migrate virtual servers to new hardware.
Virtualization goes back to IBM in the 1960s, and was formalized in the 1970s by Popek and Goldberg.

We have already discussed virtual memory. What if we decide to virtualize more of the hardware? How about all of the hardware? If virtualizing some of it was a good idea, why not all of it?

How many of you have used VMware, Parallels or Xen? (VMware, by the way, claims to be the fastest-growing software company in history.) Then you have used a virtual machine (VM) and virtual machine monitor (VMM). Because an operating system usually runs in supervisor mode, VMMs are also referred to as hypervisors.

The basic goal of a hypervisor is to allow multiple operating systems to run on the same hardware at the same time. This is not simply dual boot, but dynamic sharing of the CPU, memory and other resources, in the same way that different processes share the system in a multitasking OS. Moreover, the different instances of the operating systems, known as guest OSes, can be heterogeneous.

In 1974, Popek and Goldberg defined it this way (adapted):

"A virtual machine is taken to be an efficient, isolated duplicate of the real machine...As a piece of software, a VMM [virtual machine monitor] has three essential characteristics.

First, the VMM provides an environment for programs which is essentially identical with the original machine;
second, programs run in this environment show at worst only minor decreases in speed; and
last, the VMM is in complete control of system resources."

Today, we would augment/relax those conditions:

The VMM provides an abstract machine environment for each VM, but it is not necessarily the same as the underlying hardware environment, and probably not the same for each VM.
Virtual machines must be isolated from one another. (This is implicit in the third item above, but should be explicit.)
Different guest operating systems must be supported. (Again, perhaps not really necessary.)
The VMM must provide efficient access to external networks, and efficient communication mechanisms between VMs on the same or different hardware (e.g., via a virtual, emulated network internal to the VMM).

Uses of Virtualization

There are many good reasons to want a virtual machine environment:

Run different OSes (Windows and Linux)
Version management for supporting software
Shipping complete software environments
Migrating/load balancing virtual servers
Security isolation

All of these can be classified as one of three basic forms of use:

workload isolation
workload consolidation
workload migration

VMware's VMotion is an example of workload migration: it allows live migration of a server from one physical machine to another.

History of Virtualization

You may not realize that virtual machine technology actually goes back to the 1960s. The original goal of IBM's VM was to completely virtualize the underlying hardware. IBM had the distinct advantage that it could change both the hardware and the software. Their primary goal was sharing of the hardware for legacy software; the pre-existing APIs gave applications (typically large databases) very direct access to the device controllers, and they wanted to preserve the customers' investment in that software while making the hardware shareable, in order to bring down total cost of ownership (TCO).

Virtualization Architectures

Truly recursive virtual machine v. single-layer
Host OS/Guest OS architecture v. true virtual machine monitor

IBM's VM abstracted the hardware so completely that it was possible to run another copy of VM inside of it. My OS professor, Kim Korner, told of running them thirty deep, something like a matryoshka doll or fukuruma doll. For most purposes, this depth of nesting is unnecessary: what we really want is to run two OSes, or to migrate a server or user environment from one place to another.

IBM's ability to control the hardware gave them a huge advantage: they didn't have to worry about thousands of different types of peripherals, odd trap semantics and page tables, etc. The biggest initial hurdle to a truly useful VMM is that plethora of peripherals. VMware solves this difficult problem by using a Host OS in addition to the Guest OS. VMware Workstation runs on top of a conventional Host OS such as Linux or Windows.

The Host OS actually performs the I/O on behalf of the VMM, using the Host OS's own device drivers. The VMM hands each I/O request to a process running on the Host OS, which executes the I/O through ordinary host system calls and returns the results, regardless of which VM actually requested the data.
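The hosted I/O path can be sketched roughly as follows. This is a toy model, not VMware's implementation: the class names `HostOS` and `HostedVMM`, the method names, and the idea that each VM's virtual disk is a host file named `vm<N>.img` are all assumptions made for illustration.

```python
class HostOS:
    """Stands in for the host OS and its device drivers (hypothetical)."""

    def __init__(self):
        self.files = {}  # file name -> {block number: bytes}

    def write_block(self, name, block, data):
        # A real host would call its disk driver here.
        self.files.setdefault(name, {})[block] = data

    def read_block(self, name, block):
        return self.files.get(name, {}).get(block, b"")


class HostedVMM:
    """Forwards guest disk I/O to the host OS instead of driving devices itself."""

    def __init__(self, host):
        self.host = host

    def _backing_file(self, vm_id):
        # Assumption: each VM's virtual disk is an ordinary file on the host.
        return f"vm{vm_id}.img"

    def guest_write(self, vm_id, block, data):
        self.host.write_block(self._backing_file(vm_id), block, data)

    def guest_read(self, vm_id, block):
        return self.host.read_block(self._backing_file(vm_id), block)


vmm = HostedVMM(HostOS())
vmm.guest_write(1, 0, b"boot")
print(vmm.guest_read(1, 0))  # VM 1 reads back its own block
print(vmm.guest_read(2, 0))  # VM 2 has a separate, still empty, virtual disk
```

The point of the design is that the VMM needs no device drivers of its own; the cost is an extra trip through the host OS on every I/O.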

Implementing Virtualization

Most new hardware supports virtualization directly (see the section below on Intel architectures), but older CPUs require some help. There are two principal approaches:

direct execution with binary translation modifies the instructions when a program starts
paravirtualization requires that the OS be ported to run on the hypervisor

And several rather pragmatic problems that must be solved:

Machine protection levels
Managing page tables
Masking interrupts
Managing resources ("balloon process")

VMware uses direct execution with binary translation, while Xen uses paravirtualization.

Hardware support for all of these things certainly makes life a lot simpler. Intel has support for virtualization on both IA-32 (x86) and IA-64 (Itanium) in the newest round of chips, known as VT-x and VT-i, respectively. Sun has support in UltraSPARC. With these modifications, binary translation and the source code modification for paravirtualization are both unnecessary.

The single biggest problem presented by the IA-32 architecture is that some operations that a user process is not privileged to execute are silently ignored by the hardware. (The classic example is POPF, which in user mode silently leaves the interrupt-enable flag unchanged.) For a VMM, you would much prefer that the hardware trap. VMware solves this problem by dynamically modifying the binary code of the OS to trap to the VMM. This approach involves a lot of execution overhead when the system first starts, but should run well once the cache of modified code is warm.
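A toy sketch of the idea, assuming a made-up instruction set: the instruction names `DISABLE_INTR` and `ENABLE_INTR`, the class `ToyVMM`, and the function `run_untranslated` are hypothetical, and a real binary translator works on machine code rather than on lists of strings.

```python
# Toy sensitive instructions: they change the interrupt-enable flag.
SENSITIVE = {"DISABLE_INTR", "ENABLE_INTR"}


def run_untranslated(block):
    """Deprivileged guest without translation: sensitive ops silently vanish."""
    return [op for op in block if op not in SENSITIVE]


class ToyVMM:
    def __init__(self):
        self.virtual_if = True   # the guest's *virtual* interrupt-enable flag
        self.cache = {}          # original block -> translated block
        self.translations = 0    # how many blocks had to be translated

    def translate(self, block):
        """Rewrite sensitive instructions into explicit traps to the VMM."""
        key = tuple(block)
        if key not in self.cache:
            self.translations += 1
            self.cache[key] = tuple(
                ("TRAP", op) if op in SENSITIVE else ("RUN", op) for op in block
            )
        return self.cache[key]

    def emulate(self, op):
        """Apply the effect of a sensitive instruction to the virtual state."""
        if op == "DISABLE_INTR":
            self.virtual_if = False
        elif op == "ENABLE_INTR":
            self.virtual_if = True

    def run(self, block):
        executed = []
        for kind, op in self.translate(block):
            if kind == "TRAP":
                self.emulate(op)      # handled by the VMM
            else:
                executed.append(op)   # would run directly on the hardware
        return executed


vmm = ToyVMM()
code = ["ADD", "DISABLE_INTR", "MOV"]
vmm.run(code)
vmm.run(code)  # second run hits the translation cache
print(vmm.virtual_if, vmm.translations)
```

The cache is what makes the warm-up cost a one-time cost: a block is rewritten the first time it runs and reused afterwards.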

Xen's approach is to modify the OS source code to support cooperating with the VMM. They have successfully used this approach for Linux and Windows.
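In a paravirtualized guest, the kernel source is changed so that privileged operations become explicit requests to the hypervisor. The sketch below shows the shape of that change for a page-table update. It is not Xen's actual interface: `Hypervisor`, `hypercall_map_page`, and `ParavirtualizedGuest` are hypothetical names, and the ownership check is a simplified assumption about what a hypervisor would validate.

```python
class Hypervisor:
    def __init__(self, frames_owned):
        self.frames_owned = frames_owned  # guest id -> set of machine frames
        self.page_tables = {}             # guest id -> {virtual page: frame}

    def hypercall_map_page(self, guest_id, vpage, frame):
        """Validate and apply a page-table update requested by a guest."""
        if frame not in self.frames_owned.get(guest_id, set()):
            raise PermissionError(f"guest {guest_id} does not own frame {frame}")
        self.page_tables.setdefault(guest_id, {})[vpage] = frame


class ParavirtualizedGuest:
    def __init__(self, guest_id, hypervisor):
        self.guest_id = guest_id
        self.hv = hypervisor

    def map_page(self, vpage, frame):
        # An unmodified kernel would write the page table entry directly.
        # The ported kernel asks the hypervisor to do it instead.
        self.hv.hypercall_map_page(self.guest_id, vpage, frame)


hv = Hypervisor({1: {10, 11}, 2: {20}})
guest = ParavirtualizedGuest(1, hv)
guest.map_page(vpage=0, frame=10)      # allowed: guest 1 owns frame 10
try:
    guest.map_page(vpage=1, frame=20)  # refused: frame 20 belongs to guest 2
except PermissionError as err:
    print("refused:", err)
```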

A major problem on Intel architectures is ring compression. On IA-32, there are four protection rings, 0-3. Traditionally, the OS uses ring 0 ("supervisor mode"), and applications use ring 3, with separate page tables. Rings 1 and 2 were used by OS/2, but no important OS today uses them. They are similar to Multics' protection rings, another important idea from the 1960s, but out of fashion for much of the last thirty years.

The obvious solution would be to have the hypervisor run in ring 0, the OS in ring 1, and the apps in ring 3. However, there's a problem: the MMU doesn't distinguish among rings 0, 1, and 2. All three of them can change the page tables at will, and all three levels have access to all memory. This problem forces the guest OS to run in ring 3, the same as the applications themselves. This in turn causes problems in implementing the VMM.
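The protection check that causes the trouble can be modeled in a few lines. This is a toy model of IA-32 page-level protection only (it ignores segmentation and read/write bits), and the function name `page_access_allowed` is made up for illustration.

```python
def page_access_allowed(ring: int, page_is_user: bool) -> bool:
    """Toy model: paging sees only 'supervisor' (rings 0-2) or 'user' (ring 3)."""
    if ring not in (0, 1, 2, 3):
        raise ValueError("IA-32 has rings 0 through 3")
    supervisor = ring < 3
    # Supervisor code may touch any page; user code only user pages.
    return supervisor or page_is_user


# A guest OS in ring 1 can still reach a supervisor-only page belonging to
# a hypervisor in ring 0, so ring 1 gives the hypervisor no protection.
print(page_access_allowed(ring=1, page_is_user=False))
# Only ring 3 is actually kept out of supervisor pages.
print(page_access_allowed(ring=3, page_is_user=False))
```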

Some VMMs support fixed partitioning of the hardware resources, especially memory; others partition them dynamically. VMware uses a "balloon process" that it "inflates" inside a VM when the VMM wants to recover some memory. As the balloon inflates, the guest OS pages out some of its own memory to satisfy the balloon's requests. The balloon process then hands those pages to the VMM, which can give them to another VM.
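The bookkeeping behind ballooning can be sketched as follows. This is a simplified model that counts pages rather than tracking them, and the names `Guest`, `VMM`, `inflate`, and `move_memory` are hypothetical; see the Waldspurger paper in the followup list for the real mechanism.

```python
class Guest:
    def __init__(self, name, size, in_use):
        self.name = name
        self.size = size      # pages the guest believes it has
        self.in_use = in_use  # pages holding guest data
        self.balloon = 0      # pages pinned by the balloon process
        self.paged_out = 0    # pages the guest OS pushed to its own swap

    def inflate(self, n):
        """Balloon allocates n pages; the guest OS pages out if it must."""
        free = self.size - self.in_use - self.balloon
        shortfall = max(0, n - free)
        if shortfall > self.in_use:
            raise MemoryError("guest cannot free that many pages")
        self.in_use -= shortfall
        self.paged_out += shortfall
        self.balloon += n


class VMM:
    def __init__(self, guests):
        self.guests = {g.name: g for g in guests}
        # Machine pages currently backing each guest.
        self.machine_pages = {g.name: g.size for g in guests}

    def move_memory(self, donor, recipient, n):
        """Reclaim n pages from donor via its balloon and give them away."""
        self.guests[donor].inflate(n)
        self.machine_pages[donor] -= n
        self.machine_pages[recipient] += n


a = Guest("A", size=100, in_use=90)
b = Guest("B", size=100, in_use=100)
vmm = VMM([a, b])
vmm.move_memory("A", "B", 30)
print(a.balloon, a.paged_out, vmm.machine_pages)
```

The guest itself chooses which pages to page out, which is the attraction of the approach: the guest OS knows better than the VMM which of its pages are least valuable.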

One important problem for desktop sharing of OSes: how do you communicate between them? "Drag and drop" for files would be incredibly useful, but done improperly it's hard to build and maintain, and a possible security hole. The simplest approach, and the one that VMware initially took, is to allow the two VMs to communicate through the network, as if they were running on separate machines.

Multicore and Parallel VMM

What are the special challenges in parallel systems for virtual machines? In the early days of multicomputers, the system could usually be divided up among multiple users. User A gets processors 0 to 7, user B gets 8 to 15, user C gets 16 to 31. When the memory is distributed, this approach generally gets you good isolation of the separate programs. But the multicore systems are shared-memory multiprocessors, and the I/O devices are shared, too, so this approach does not provide perfect isolation.

Is a VMM just an OS?

Yes.

But it's a very restricted OS, focusing only on the resource management. It provides no GUI library calls, no shell, no real file system or network stack; it allows the guest OSes to provide all of those.

The key is to avoid "feature creep" or "software bloat" so that the VMM remains lightweight.

Security

The VMM's ability to isolate processes from each other constitutes an important security feature. But what if it goes wrong? What if the VMM itself turns out to have security holes that allow information to be leaked from one machine to another? Some people have suggested carrying around a CD with your environment on it, and booting your own, secure version of Linux on any handy PC when you need to check your email. What happens if that booting happens on a virtual machine instead of real hardware?

All of these are serious concerns. But because the VMM is small and does not run user processes directly, it is easier to make secure. Moreover, the VMM is generally a net win in security, because it can sometimes recognize attempts to subvert the guest OSes running on top of it.

Implementing Virtualization on Intel Architectures

Intel identifies several classes of problems with the IA-32 architecture that interfere with virtualization, since virtualization was not anticipated when the architecture was defined [Neiger 2006]:

Ring aliasing
Address-space compression
Non-faulting access to privileged state
Adverse impact on guest system calls
Interrupt virtualization
Access to hidden state
Ring compression
Frequent access to privileged resources

Related approaches

Microkernels such as early NT supported different "personalities".
Language-specific virtual machines (Lisp, Java, .NET (Common Language Infrastructure))

Formal Approach

Popek and Goldberg:

The machine state is the tuple <E,M,P,R>: executable storage E, processor mode M, program counter P, and relocation-bounds register R. Today, we would replace R with the full active page table.
A sensitive instruction is one that modifies, or whose behavior depends on, either the processor mode M or the limits on memory that can be touched, represented by R.
A privileged instruction is any instruction that traps if M = u (user mode) and does not trap in supervisor mode.
Theorem 1: For any conventional third generation computer, a virtual machine monitor may be constructed if the set of sensitive instructions for that computer is a subset of the set of privileged instructions.
Theorem 2: A conventional third generation computer is recursively virtualizable if it is: (a) virtualizable, and (b) a VMM without any timing dependencies can be constructed for it.
Theorem 3: A hybrid virtual machine monitor may be constructed for any conventional third generation machine in which the set of user sensitive instructions are a subset of the privileged instructions.

(Note that this model explicitly ignores I/O, but we can extend it to include I/O.)
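The condition in Theorem 1 is a plain subset test, which can be sketched as below. The function names are made up, and the two instruction sets are a small illustrative subset of pre-VT IA-32, not a complete classification: LGDT, LIDT, and HLT trap in user mode, while POPF, SGDT, SIDT, and SMSW are sensitive but do not trap.

```python
def is_virtualizable(sensitive, privileged):
    """Theorem 1 condition: every sensitive instruction must be privileged."""
    return set(sensitive) <= set(privileged)


def offending_instructions(sensitive, privileged):
    """Sensitive instructions that do not trap in user mode."""
    return sorted(set(sensitive) - set(privileged))


# Illustrative, incomplete subsets for classic IA-32 (assumption for the example).
privileged = {"LGDT", "LIDT", "HLT"}
sensitive = privileged | {"POPF", "SGDT", "SIDT", "SMSW"}

print(is_virtualizable(sensitive, privileged))
print(offending_instructions(sensitive, privileged))
```

Binary translation, paravirtualization, and hardware extensions such as VT-x can all be read as different ways of dealing with the instructions in the second list.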

Homework

Your only homework this week is to prepare for your final evaluations on Thursday!

Next Lecture

Next lecture:

Final review!

Followup for this week:

Popek and Goldberg, Formal requirements for virtualizable third generation architectures, CACM, July 1974.
Barham et al., Xen and the Art of Virtualization, SOSP 2003.
Sugerman et al., Virtualizing I/O Devices on VMware Workstation's Hosted Virtual Machine Monitor, USENIX 2001.
Waldspurger, Memory Resource Management in VMware ESX Server, OSDI 2002.
Several articles in the May 2005 issue of IEEE Computer.
Intel Tech Journal issue on virtualization. This includes the Neiger 2006 article referenced above, pp. 167-177.
Wikipedia page on hypervisors.
Table at Wikipedia comparing virtual machines.
KVM, kernel-based virtual machine, also at Wikipedia.
Documents on KVM from the KVM website.
Keith Adams and Ole Agesen, "A Comparison of Software and Hardware Techniques for x86 Virtualization". Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, USA, 2006.
Xen papers and presentations.
The paravirtualization mechanism for I/O in Linux, used with KVM, is called virtio. Rusty Russell has written a paper about it.
