Commit graph

34 commits

Author SHA1 Message Date
Avi Kivity
774c47f1d7 [PATCH] KVM: cpu hotplug support
On hotplug, we execute the hardware extension enable sequence.  On unplug, we
decache any vcpus that last ran on the exiting cpu, and execute the hardware
extension disable sequence.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:41 -08:00
Avi Kivity
8d0be2b3bf [PATCH] KVM: VMX: add vcpu_clear()
Like the inline code it replaces, this function decaches the vmcs from the cpu
it last executed on.  in addition:

 - vcpu_clear() works if the last cpu is also the cpu we're running on
 - it is faster on larger smps by virtue of using smp_call_function_single()

Includes fix from Ingo Molnar.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:41 -08:00
Avi Kivity
26bb83a755 [PATCH] kvm: VMX: Reload ds and es even in 64-bit mode
Or 32-bit userspace will get confused.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:40 -08:00
Avi Kivity
988ad74ff6 [PATCH] kvm: vmx: handle triple faults by returning EXIT_REASON_SHUTDOWN to userspace
Just like svm.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:40 -08:00
Ingo Molnar
96958231ce [PATCH] kvm: optimize inline assembly
Forms like "0(%rsp)" generate an instruction with an unnecessary one byte
displacement under certain circumstances.  replace with the equivalent
"(%rsp)".

Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:40 -08:00
Al Viro
8b6d44c7bd [PATCH] kvm: NULL noise removal
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-09 09:14:07 -08:00
Avi Kivity
432bd6cbf9 [PATCH] KVM: fix lockup on 32-bit intel hosts with nx disabled in the bios
Intel hosts, without long mode, and with nx support disabled in the bios
have an efer that is readable but not writable.  This causes a lockup on
switch to guest mode (even though it should exit with reason 34 according
to the documentation).

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-01 16:22:41 -08:00
Avi Kivity
cccf748b81 [PATCH] KVM: fix race between mmio reads and injected interrupts
The kvm mmio read path looks like:

 1. guest read faults
 2. kvm emulates read, calls emulator_read_emulated()
 3. fails as a read requires userspace help
 4. exit to userspace
 5. userspace emulates read, kvm sets vcpu->mmio_read_completed
 6. re-enter guest, fault again
 7. kvm emulates read, calls emulator_read_emulated()
 8. succeeds as vcpu->mmio_read_emulated is set
 9. instruction completes and guest is resumed

A problem surfaces if the userspace exit (step 5) also requests an interrupt
injection.  In that case, the guest does not re-execute the original
instruction, but the interrupt handler.  The next time an mmio read is
exectued (likely for a different address), step 3 will find
vcpu->mmio_read_completed set and return the value read for the original
instruction.

The problem manifested itself in a few annoying ways:
- little squares appear randomly on console when switching virtual terminals
- ne2000 fails under nfs read load
- rtl8139 complains about "pci errors" even though the device model is
  incapable of issuing them.

Fix by skipping interrupt injection if an mmio read is pending.

A better fix is to avoid re-entry into the guest, and re-emulating immediately
instead.  However that's a bit more complex.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-23 07:52:06 -08:00
Herbert Xu
e001548911 [PATCH] vmx: Fix register constraint in launch code
Both "=r" and "=g" breaks my build on i386:

  $ make
    CC [M]  drivers/kvm/vmx.o
  {standard input}: Assembler messages:
  {standard input}:3318: Error: bad register name `%sil'
  make[1]: *** [drivers/kvm/vmx.o] Error 1
  make: *** [_module_drivers/kvm] Error 2

The reason is that setbe requires an 8-bit register but "=r" does not
constrain the target register to be one that has an 8-bit version on
i386.

According to

	http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10153

the correct constraint is "=q".

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-22 19:27:02 -08:00
Ingo Molnar
07031e14c1 [PATCH] KVM: add VM-exit profiling
This adds the profile=kvm boot option, which enables KVM to profile VM
exits.

Use: "readprofile -m ./System.map | sort -n" to see the resulting
output:

   [...]
   18246 serial_out                               148.3415
   18945 native_flush_tlb                         378.9000
   23618 serial_in                                212.7748
   29279 __spin_unlock_irq                        622.9574
   43447 native_apic_write                        2068.9048
   52702 enable_8259A_irq                         742.2817
   54250 vgacon_scroll                             89.3740
   67394 ide_inb                                  6126.7273
   79514 copy_page_range                           98.1654
   84868 do_wp_page                                86.6000
  140266 pit_read                                 783.6089
  151436 ide_outb                                 25239.3333
  152668 native_io_delay                          21809.7143
  174783 mask_and_ack_8259A                       783.7803
  362404 native_set_pte_at                        36240.4000
 1688747 total                                      0.5009

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-11 18:18:21 -08:00
Dor Laor
022a93080c [PATCH] KVM: Simplify test for interrupt window
No need to test for rflags.if as both VT and SVM specs assure us that on exit
caused from interrupt window opening, 'if' is set.

Signed-off-by: Dor Laor <dor.laor@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:28 -08:00
Avi Kivity
4db9c47c05 [PATCH] KVM: Don't set guest cr3 from vmx_vcpu_setup()
It overwrites the right cr3 set from mmu setup.  Happens only with the test
harness.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:28 -08:00
Avi Kivity
e52de1b8cf [PATCH] KVM: Improve reporting of vmwrite errors
This will allow us to see the root cause when a vmwrite error happens.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:27 -08:00
Avi Kivity
e2dec939db [PATCH] KVM: MMU: Detect oom conditions and propagate error to userspace
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:27 -08:00
Avi Kivity
5f015a5b28 [PATCH] KVM: MMU: Remove invlpg interception
Since we write protect shadowed guest page tables, there is no need to trap
page invalidations (the guest will always change the mapping before issuing
the invlpg instruction).

Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:25 -08:00
Avi Kivity
ebeace8609 [PATCH] KVM: MMU: oom handling
When beginning to process a page fault, make sure we have enough shadow pages
available to service the fault.  If not, free some pages.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:25 -08:00
Avi Kivity
399badf315 [PATCH] KVM: Prevent stale bits in cr0 and cr4
Hardware virtualization implementations allow the guests to freely change some
of the bits in cr0 and cr4, but trap when changing the other bits.  This is
useful to avoid excessive exits due to changing, for example, the ts flag.

It also means the kvm's copy of cr0 and cr4 may be stale with respect to these
bits.  most of the time this doesn't matter as these bits are not very
interesting.  Other times, however (for example when returning cr0 to
userspace), they are, so get the fresh contents of these bits from the guest
by means of a new arch operation.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:23 -08:00
Dor Laor
c1150d8cf9 [PATCH] KVM: Improve interrupt response
The current interrupt injection mechanism might delay an interrupt under
the following circumstances:

 - if injection fails because the guest is not interruptible (rflags.IF clear,
   or after a 'mov ss' or 'sti' instruction).  Userspace can check rflags,
   but the other cases or not testable under the current API.
 - if injection fails because of a fault during delivery.  This probably
   never happens under normal guests.
 - if injection fails due to a physical interrupt causing a vmexit so that
   it can be handled by the host.

In all cases the guest proceeds without processing the interrupt, reducing
the interactive feel and interrupt throughput of the guest.

This patch fixes the situation by allowing userspace to request an exit
when the 'interrupt window' opens, so that it can re-inject the interrupt
at the right time.  Guest interactivity is very visibly improved.

Signed-off-by: Dor Laor <dor.laor@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:22 -08:00
Ingo Molnar
d3b2c33860 [PATCH] KVM: Use raw_smp_processor_id() instead of smp_processor_id() where applicable
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:22 -08:00
Ingo Molnar
965b58a550 [PATCH] KVM: Fix GFP_KERNEL alloc in atomic section bug
KVM does kmalloc() in an atomic section while having preemption disabled via
vcpu_load().  Fix this by moving the ->*_msr setup from the vcpu_setup method
to the vcpu_create method.

(This is also a small speedup for setting up a vcpu, which can in theory be
more frequent than the vcpu_create method).

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-05 23:55:21 -08:00
Nguyen Anh Quynh
c68876fd28 [PATCH] KVM: Rename some msrs
No need to append _MSR to msr names, a prefix should suffice.

Signed-off-by: Nguyen Anh Quynh <aquynh@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30 10:56:44 -08:00
Avi Kivity
3bab1f5dda [PATCH] KVM: Move common msr handling to arch independent code
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30 10:56:44 -08:00
Avi Kivity
671d656479 [PATCH] KVM: Implement a few system configuration msrs
Resolves sourceforge bug 1622229 (guest crashes running benchmark software).

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30 10:56:44 -08:00
Avi Kivity
a9058ecd3c [PATCH] KVM: Simplify is_long_mode()
Instead of doing tricky stuff with the arch dependent virtualization
registers, take a peek at the guest's efer.

This simlifies some code, and fixes some confusion in the mmu branch.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-30 10:56:44 -08:00
Michael Riepe
0f8e3d365a [PATCH] KVM: Handle p5 mce msrs
This allows plan9 to get a little further booting.

Signed-off-by: Michael Riepe <michael@mr511.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22 08:55:46 -08:00
Michael Riepe
abacf8dff9 [PATCH] KVM: Force real-mode cs limit to 64K
This allows opensolaris to boot on kvm/intel.

Signed-off-by: Michael Riepe <michael@mr511.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22 08:55:46 -08:00
Avi Kivity
bfdc0c280a [PATCH] KVM: Fix vmx hardware_enable() on macbooks
It seems macbooks set bit 2 but not bit 0, which is an "enabled but vmxon will
fault" setting.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Tested-by: Alex Larsson (sometimes testing helps)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:48 -08:00
Michael Riepe
3b99ab2421 [PATCH] KVM: Don't touch the virtual apic vt registers on 32-bit
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:48 -08:00
Avi Kivity
873a7c423b [PATCH] KVM: Disallow the kvm-amd module on intel hardware, and vice versa
They're not on speaking terms.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:48 -08:00
Avi Kivity
7725f0badd [PATCH] KVM: Move find_vmx_entry() to vmx.c
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:47 -08:00
Uri Lublin
f7fbf1fdf0 [PATCH] KVM: Make the GET_SREGS and SET_SREGS ioctls symmetric
This makes the SET_SREGS ioctl behave symmetrically to the GET_SREGS ioctl wrt
the segment access rights flag.

Signed-off-by: Uri Lublin <uril@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:47 -08:00
Avi Kivity
05b3e0c2c7 [PATCH] KVM: Replace __x86_64__ with CONFIG_X86_64
As per akpm's request.

Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:46 -08:00
Anthony Liguori
3b3be0d1cc [PATCH] KVM: Add missing include
load_TR_desc() lives in asm/desc.h, so #include that file.

Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13 09:05:46 -08:00
Avi Kivity
6aa8b732ca [PATCH] kvm: userspace interface
web site: http://kvm.sourceforge.net

mailing list: kvm-devel@lists.sourceforge.net
  (http://lists.sourceforge.net/lists/listinfo/kvm-devel)

The following patchset adds a driver for Intel's hardware virtualization
extensions to the x86 architecture.  The driver adds a character device
(/dev/kvm) that exposes the virtualization capabilities to userspace.  Using
this driver, a process can run a virtual machine (a "guest") in a fully
virtualized PC containing its own virtual hard disks, network adapters, and
display.

Using this driver, one can start multiple virtual machines on a host.

Each virtual machine is a process on the host; a virtual cpu is a thread in
that process.  kill(1), nice(1), top(1) work as expected.  In effect, the
driver adds a third execution mode to the existing two: we now have kernel
mode, user mode, and guest mode.  Guest mode has its own address space mapping
guest physical memory (which is accessible to user mode by mmap()ing
/dev/kvm).  Guest mode has no access to any I/O devices; any such access is
intercepted and directed to user mode for emulation.

The driver supports i386 and x86_64 hosts and guests.  All combinations are
allowed except x86_64 guest on i386 host.  For i386 guests and hosts, both pae
and non-pae paging modes are supported.

SMP hosts and UP guests are supported.  At the moment only Intel
hardware is supported, but AMD virtualization support is being worked on.

Performance currently is non-stellar due to the naive implementation of the
mmu virtualization, which throws away most of the shadow page table entries
every context switch.  We plan to address this in two ways:

- cache shadow page tables across tlb flushes
- wait until AMD and Intel release processors with nested page tables

Currently a virtual desktop is responsive but consumes a lot of CPU.  Under
Windows I tried playing pinball and watching a few flash movies; with a recent
CPU one can hardly feel the virtualization.  Linux/X is slower, probably due
to X being in a separate process.

In addition to the driver, you need a slightly modified qemu to provide I/O
device emulation and the BIOS.

Caveats (akpm: might no longer be true):

- The Windows install currently bluescreens due to a problem with the
  virtual APIC.  We are working on a fix.  A temporary workaround is to
  use an existing image or install through qemu
- Windows 64-bit does not work.  That's also true for qemu, so it's
  probably a problem with the device model.

[bero@arklinux.org: build fix]
[simon.kagstrom@bth.se: build fix, other fixes]
[uril@qumranet.com: KVM: Expose interrupt bitmap]
[akpm@osdl.org: i386 build fix]
[mingo@elte.hu: i386 fixes]
[rdreier@cisco.com: add log levels to all printks]
[randy.dunlap@oracle.com: Fix sparse NULL and C99 struct init warnings]
[anthony@codemonkey.ws: KVM: AMD SVM: 32-bit host support]
Signed-off-by: Yaniv Kamay <yaniv@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Cc: Simon Kagstrom <simon.kagstrom@bth.se>
Cc: Bernhard Rosenkraenzer <bero@arklinux.org>
Signed-off-by: Uri Lublin <uril@qumranet.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10 09:57:22 -08:00