Keio University
Academic Year 2016, Spring Semester
Operating Systems
Lecture 13, May 23: Hypervisors and Virtual Systems
What do a matryoshka doll and a fukuruma doll have to do
with operating systems?
(Images from www.cse.ucsd.edu/~saul/images/matryoshka.jpg and
http://russian-crafts.com/nest/history/fucu.jpg.)
Outline
- Basic principle of virtualization
- Uses of virtualization
- History of virtualization
- Virtualization architectures
- Implementing virtualization
- Is a VMM just an OS?
- Security
- Multicore and parallel VMM
- Related approaches
Basic Principle of Virtualization
- VMware, Parallels and Xen are common virtual machine monitors.
- Virtualization lets you run multiple virtual hosts, running the
same or different OSes, on one machine, or migrate virtual servers to new
hardware.
- Virtualization goes back to IBM in the 1960s, and was
formalized in the 1970s by Popek and Goldberg.
We have already discussed virtual memory. What if we decide to
virtualize more of the hardware? How about all of the
hardware? If virtualizing some of it was a good idea, why not all of
it?
How many of you have used VMware, Parallels or
Xen? (VMware, by the way, claims to be the fastest-growing
software company in history.) Then you have used a virtual machine
(VM) and a virtual machine monitor (VMM). Because an
operating system usually runs in supervisor mode, and the VMM in turn
supervises the operating systems themselves, VMMs are also
referred to as hypervisors.
The basic goal of a hypervisor is to allow multiple operating systems
to run on the same hardware at the same time. This is
not simply dual boot, but dynamic sharing of the CPU,
memory and other resources, the same as different processes share the
system in a multitasking OS. Moreover, the different instances of the
operating systems, known as guest OSes, can be
heterogeneous.
In 1974, Popek and Goldberg defined it this way (adapted):
"A virtual machine is taken to be an efficient, isolated
duplicate of the real machine...As a piece of software, a VMM
[virtual machine monitor] has three essential characteristics.
- First, the VMM provides an environment for programs which is
essentially identical with the original machine;
- second, programs run in this environment show at worst only
minor decreases in speed; and
- last, the VMM is in complete control of system resources."
Today, we would augment/relax those conditions:
- The VMM provides an abstract machine environment, for
each VM, but it is not necessarily the same as the underlying
hardware environment, and probably not the same for each VM.
- Virtual machines must be isolated from one another. (This is implicit
in the third item above, but should be explicit.)
- Different guest operating systems must be supported. (Again,
perhaps not really necessary.)
- The VMM must provide efficient access to external networks, and
efficient communication mechanisms between VMs on the same or
different hardware (e.g., via a virtual, emulated network internal
to the VMM).
Uses of Virtualization
There are many good reasons to want a virtual machine environment:
- Run different OSes (Windows and Linux)
- Version management for supporting software
- Shipping complete software environments
- Migrating/load balancing virtual servers
- Security isolation
All of these can be classified as one of three basic forms of use:
- workload isolation
- workload consolidation
- workload migration
One example is VMware's VMotion, which allows live migration
of a server from one physical machine to another.
History of Virtualization
You may not realize that virtual machine technology actually goes back
to the 1960s. The original goal of IBM's VM was to completely
virtualize the underlying hardware. IBM had the distinct advantage
that it could change both the hardware and the software. Their
primary goal was sharing of the hardware for legacy software; the
pre-existing APIs gave applications (typically large databases) very
direct access to the device controllers, and they wanted to preserve
the customers' investment in that software while making the hardware
shareable, in order to bring down total cost of ownership (TCO).
Virtualization Architectures
- Truly recursive virtual machine v. single-layer
- Host OS/Guest OS architecture v. true virtual machine monitor
IBM's VM abstracted the hardware so completely that it was possible to
run another copy of VM inside of it. My OS professor, Kim
Korner, told of running them thirty deep, something like a
matryoshka doll or fukuruma doll. For most purposes,
this depth of nesting is unnecessary: what we really want is to run
two OSes, or to migrate a server or user environment from one place to
another.
IBM's ability to control the hardware gave them a huge advantage: they
didn't have to worry about thousands of different types of
peripherals, odd trap semantics and page tables, etc. The biggest
initial hurdle to a truly useful VMM is that plethora of peripherals.
VMware solves this difficult problem by using a host OS in
addition to the guest OS. Their chosen host OS is Linux.
The host OS actually performs the I/O on behalf of the VMM, using the
host OS's device drivers. The VMM communicates with a process running
on the host OS, which executes the I/Os and returns the results, regardless
of which VM actually requested the data.
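The forwarding path described above can be sketched as a toy model. This is only an illustration under stated assumptions, not VMware's actual design: the class names `HostIOProcess` and `ToyHostedVMM`, and the dictionary standing in for virtual disks, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HostIOProcess:
    """Hypothetical helper that performs I/O through the host OS's drivers."""
    disks: Dict[str, Dict[int, bytes]]  # vm_id -> {block number: data}

    def read_block(self, vm_id: str, block: int) -> bytes:
        # A real helper would issue an ordinary host system call here.
        return self.disks[vm_id].get(block, b"")


@dataclass
class ToyHostedVMM:
    """Hypothetical VMM that never touches a device driver itself."""
    helper: HostIOProcess
    forwarded: List[str] = field(default_factory=list)

    def guest_read(self, vm_id: str, block: int) -> bytes:
        # Record which guest asked, then forward the request to the helper.
        self.forwarded.append(vm_id)
        return self.helper.read_block(vm_id, block)


helper = HostIOProcess(disks={"vm1": {0: b"boot"}, "vm2": {0: b"data"}})
vmm = ToyHostedVMM(helper)
print(vmm.guest_read("vm1", 0))
print(vmm.guest_read("vm2", 0))
```

The point of the design is that the VMM needs no device drivers of its own: every request from every guest funnels through the same helper, which uses whatever drivers the host OS already has.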
Implementing Virtualization
Most new hardware supports virtualization directly (see the section
below on Intel architectures), but older CPUs require some help.
There are two principal approaches:
- direct execution with binary translation runs guest code directly on
the CPU, rewriting problematic instructions as the code is first executed
- paravirtualization requires that the OS
be ported to run on the hypervisor
There are also several rather pragmatic problems that must be solved:
- Machine protection levels
- Managing page tables
- Masking interrupts
- Managing resources ("balloon process")
VMware uses direct execution with binary translation, while Xen uses
paravirtualization.
Hardware support for all of these things certainly makes life a lot
simpler. Intel has support for virtualization on both IA-32 (x86) and
IA-64 (Itanium) in the newest round of chips, known as VT-x and VT-i,
respectively. Sun has support in UltraSPARC. With these
modifications, binary translation and the source code modification for
paravirtualization are both unnecessary.
The single biggest problem presented by the IA-32 architecture is that
some operations that a user process might not be privileged to execute
are silently ignored by the hardware. For a VMM, you would much
prefer that the hardware traps. VMware solves this problem by
dynamically modifying the binary code of the OS to trap to the
VMM. This approach involves a lot of execution overhead when the
system first starts, but should run well once the cache of modified
code is warm.
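The idea of rewriting code once and then reusing the rewritten version can be sketched as follows. This is a minimal toy, not VMware's translator: instructions are plain strings, the `SENSITIVE` set and the `call_vmm(...)` replacement are hypothetical, and the cache is keyed by block contents, whereas a real translator would more likely key on the guest code address.

```python
from typing import Dict, List, Tuple

# Hypothetical set of instructions that would not trap when run unprivileged.
SENSITIVE = {"popf", "sgdt"}


class ToyTranslator:
    def __init__(self) -> None:
        self.cache: Dict[Tuple[str, ...], List[str]] = {}
        self.translations = 0  # number of blocks actually rewritten

    def translate(self, block: List[str]) -> List[str]:
        key = tuple(block)
        if key not in self.cache:
            # Cold path: scan the block and replace each sensitive
            # instruction with an explicit call into the VMM.
            self.translations += 1
            self.cache[key] = [
                f"call_vmm({insn})" if insn.split()[0] in SENSITIVE else insn
                for insn in block
            ]
        # Warm path: reuse the previously translated block.
        return self.cache[key]


translator = ToyTranslator()
block = ["mov eax, 1", "popf", "ret"]
first = translator.translate(block)
second = translator.translate(block)  # served from the cache
```

The first call pays the translation cost; the second returns the cached result, which is why the overhead is concentrated at startup.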
Xen's approach is to modify the OS source code to support cooperating
with the VMM. They have successfully used this approach for Linux and
Windows.
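Paravirtualization can be sketched in the same toy style: the ported guest kernel asks the hypervisor to perform a privileged operation instead of executing it directly. The names `ToyHypervisor`, `hypercall_update_pte`, and `ParavirtGuest` are hypothetical and do not correspond to Xen's real interface; the sketch only assumes that the hypervisor checks each request before applying it.

```python
from typing import Dict, Set


class ToyHypervisor:
    def __init__(self) -> None:
        self.owned: Dict[str, Set[int]] = {}              # vm_id -> machine frames it owns
        self.page_tables: Dict[str, Dict[int, int]] = {}  # vm_id -> {virtual page: frame}

    def hypercall_update_pte(self, vm_id: str, vpage: int, frame: int) -> None:
        # Validate the request: a guest may only map frames it owns.
        if frame not in self.owned.get(vm_id, set()):
            raise PermissionError(f"{vm_id} does not own frame {frame}")
        self.page_tables.setdefault(vm_id, {})[vpage] = frame


class ParavirtGuest:
    """A guest kernel that has been ported to call the hypervisor."""

    def __init__(self, vm_id: str, hv: ToyHypervisor) -> None:
        self.vm_id = vm_id
        self.hv = hv

    def map_page(self, vpage: int, frame: int) -> None:
        # A native kernel would write the page table entry itself;
        # the ported kernel issues a hypercall instead.
        self.hv.hypercall_update_pte(self.vm_id, vpage, frame)


hv = ToyHypervisor()
hv.owned["vm1"] = {10, 11}
guest = ParavirtGuest("vm1", hv)
guest.map_page(0, 10)        # allowed: vm1 owns frame 10
try:
    guest.map_page(1, 99)    # rejected: frame 99 belongs to someone else
    rejected = False
except PermissionError:
    rejected = True
```

Because the guest cooperates, nothing has to be trapped or rewritten at run time; the cost is that the OS source code must be modified.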
A major problem on Intel architectures is ring compression. On
IA-32, there are four protection rings, 0-3. Traditionally,
the OS uses ring 0 ("supervisor mode"), and applications use ring 3,
with separate page tables. Rings 1 and 2 were used by OS/2, but no
important OS today uses them. They are similar to Multics' protection
rings, another important idea from the 1960s, but out of fashion for
much of the last thirty years.
The obvious solution would be to have the hypervisor run in ring 0,
the OS in ring 1, and the apps in ring 3. However, there's a problem:
the MMU doesn't distinguish among rings 0, 1, and 2. All three of
them can change the page tables at will, and all three levels have
access to all memory. This problem forces the guest OS to run in ring
3, the same as the applications themselves. This in turn causes
problems in implementing the VMM.
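The paging side of this problem can be written down in a few lines. The sketch below is a simplification that models only the user/supervisor bit of a page table entry; it ignores segmentation, write protection, and later hardware features, and the function name `page_access_allowed` is hypothetical.

```python
def page_access_allowed(ring: int, page_is_user: bool) -> bool:
    """Toy model of the IA-32 page-level check: one bit, two classes."""
    if not 0 <= ring <= 3:
        raise ValueError("IA-32 has rings 0 through 3")
    supervisor = ring < 3  # rings 0, 1, and 2 all count as supervisor
    return supervisor or page_is_user


# A hypervisor page would be marked supervisor-only (page_is_user=False).
for ring in range(4):
    print(ring, page_access_allowed(ring, page_is_user=False))
```

In this model a guest OS placed in ring 1 passes the same check as the hypervisor in ring 0, so paging alone cannot protect the hypervisor's memory from it.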
Some VMMs support fixed partitioning of the hardware resources,
especially memory; others do it dynamically. VMware uses a "balloon
process" that it "inflates" inside a VM when the VMM wants to recover
some memory. The inflation forces the guest OS to page out some memory
to satisfy the balloon process, which then hands those pages to the VMM
to be given to another VM.
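The bookkeeping behind ballooning can be sketched as follows. This is a toy model under stated assumptions: memory is counted in whole pages, the names `ToyVM`, `inflate`, and `move_memory` are hypothetical, and the guest's actual paging activity is reduced to a counter.

```python
class ToyVM:
    def __init__(self, name: str, pages: int) -> None:
        self.name = name
        self.usable = pages  # pages the guest OS can use for its own work
        self.balloon = 0     # pages pinned by the balloon process

    def inflate(self, n: int) -> int:
        """The balloon claims up to n pages; the guest must page out to supply them."""
        n = min(n, self.usable)
        self.usable -= n
        self.balloon += n
        return n  # pages now available to hand to the VMM


def move_memory(donor: ToyVM, receiver: ToyVM, n: int) -> int:
    """Hypothetical VMM policy step: reclaim pages from one VM, give them to another."""
    reclaimed = donor.inflate(n)
    receiver.usable += reclaimed
    return reclaimed


vm_a = ToyVM("a", pages=100)
vm_b = ToyVM("b", pages=50)
moved = move_memory(vm_a, vm_b, 30)
print(moved, vm_a.usable, vm_a.balloon, vm_b.usable)
```

The guest OS, not the VMM, decides which of its pages to give up, which is the attraction of the approach: the guest knows best which memory it can spare.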
One important problem for desktop sharing of OSes: how do you
communicate between them? "Drag and drop" for files would be
incredibly useful, but done improperly it's hard to build and
maintain, and a possible security hole. The simplest approach, and
the one that VMware initially took, is to allow the two VMs to
communicate through the network, as if they were running on separate
machines.
Multicore and Parallel VMM
What are the special challenges in parallel systems for virtual
machines? In the early days of multicomputers, the system
could usually be divided up among multiple users. User A gets
processors 0 to 7, user B gets 8 to 15, user C gets 16 to 31. When
the memory is distributed, this approach generally gets you good
isolation of the separate programs. But multicore systems are
shared-memory multiprocessors, and the I/O devices are shared, too, so
this approach does not provide perfect isolation.
Is a VMM just an OS?
Yes.
But it's a very restricted OS, focusing only on the resource
management. It provides no GUI library calls, no shell, no real file
system or network stack; it allows the guest OSes to provide all of
those.
The key is to avoid "feature creep" or "software bloat" so that the
VMM remains lightweight.
Security
The VMM's ability to isolate processes from each other constitutes an
important security feature. But what if it goes wrong? What if the
VMM itself turns out to have security holes that allow information to
be leaked from one machine to another? Some people have suggested
carrying around a CD with your environment on it, and booting your
own, secure version of Linux on any handy PC when you need to check
your email. What happens if that booting happens on a virtual machine
instead of real hardware?
All of these are serious concerns. But because the VMM is small and
does not run user processes directly, it is easier to make secure.
Moreover, the VMM is generally a net win in security, because it can
sometimes recognize attempts to subvert the guest OSes running on top
of it.
Implementing Virtualization
Intel identifies several classes of problems with the IA-32
architecture that interfere with virtualization, since it was not
anticipated when the architecture was defined [Neiger 2006]:
- Ring aliasing
- Address-space compression
- Non-faulting access to privileged state
- Adverse impact on guest system calls
- Interrupt virtualization
- Access to hidden state
- Ring compression
- Frequent access to privileged resources
Related approaches
- Microkernels such as early NT supported different
"personalities".
- Language-specific virtual machines (Lisp, Java, .NET (Common
Language Infrastructure))
Formal Approach
Popek and Goldberg:
- <E,M,P,R> is machine state: executable storage E, processor mode M,
program counter P, and relocation-bounds register R. Today, we would
replace R with the full active page table.
- A sensitive instruction is one that modifies either the
processor mode M or the limits on memory that can be touched,
represented by R.
- A privileged instruction is any instruction that
will trap if M = u (user mode).
- Theorem 1: For any conventional third generation computer, a
virtual machine monitor may be constructed if the set
of sensitive instructions for that computer is a subset of
the set of privileged instructions. (Emphasis added.)
- Theorem 2: A conventional third generation computer is
recursively virtualizable if it is: (a) virtualizable, and
(b) a VMM without any timing dependencies can be constructed for
it. (Emphasis added.)
- Theorem 3: A hybrid virtual machine monitor may be
constructed for any conventional third generation machine in which
the set of user sensitive instructions are a subset of the
privileged instructions. (Emphasis added.)
(Note that this model explicitly ignores I/O, but we can extend it
to include I/O.)
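Theorem 1 is a subset test, so it can be checked mechanically once the two instruction sets are written down. The sketch below uses a small illustrative subset of pre-VT-x IA-32 instructions, not a complete classification; which instructions count as sensitive here follows commonly cited analyses of the architecture rather than anything stated in these notes, and the function names are hypothetical.

```python
from typing import Set


def is_virtualizable(sensitive: Set[str], privileged: Set[str]) -> bool:
    """Theorem 1: every sensitive instruction must also be privileged."""
    return sensitive <= privileged


def problem_instructions(sensitive: Set[str], privileged: Set[str]) -> Set[str]:
    """Sensitive instructions that do not trap in user mode."""
    return sensitive - privileged


# Illustrative subsets only (an assumption for this example).
privileged = {"lgdt", "lidt", "mov_to_cr3", "hlt"}
sensitive = {"lgdt", "lidt", "mov_to_cr3", "popf", "sgdt", "sidt", "smsw"}

print(is_virtualizable(sensitive, privileged))
print(sorted(problem_instructions(sensitive, privileged)))
```

Any instruction in the difference set is one that a VMM cannot intercept by trapping alone, which is exactly the gap that binary translation, paravirtualization, or hardware support has to close.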
Homework
Your only homework this week is to prepare for your final
evaluations on Thursday!
Next Lecture
Final review!
Followup for this week:
- Popek and Goldberg, Formal requirements for virtualizable third generation architectures, CACM, July 1974.
- Barham et al., Xen and the Art of Virtualization, SOSP
2003.
- Sugerman et al., Virtualizing I/O Devices on VMware
Workstation's Hosted Virtual Machine Monitor, USENIX 2001.
- Waldspurger, Memory Resource Management in VMware ESX Server, OSDI 2002.
- Several articles in the May 2005 issue of IEEE Computer.
- Intel Technology Journal issue on virtualization. This includes the
Neiger 2006 article referenced above, pp. 167-177.
- Wikipedia page on hypervisors.
- Table at Wikipedia comparing virtual machines.
- KVM, kernel-based virtual machine, also at Wikipedia.
- Documents on KVM from the KVM website.
- Keith Adams and Ole Agesen, "A Comparison of Software and Hardware Techniques for x86 Virtualization". Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, USA, 2006.
- Xen papers and presentations.
- The paravirtualization mechanism for I/O in Linux, used with KVM,
is called virtio. Rusty Russell has written a paper about it.
Additional Information