set_msr_interception() is used by svm to set up which MSRs should be
intercepted. It can only fail if someone has changed the code to try
to intercept an MSR without updating the array of ranges.
The return value is ignored anyway: it should just BUG() if it doesn't
work. (A build-time failure would be better, but that's tricky).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
For some reason, mark_page_dirty open-codes __gfn_to_memslot().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
All the physical CPUs on the board should support the same VMX feature
set. Add check_processor_compatibility to kvm_arch_ops for the consistency
check.
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
kvm_vm_ioctl_get_dirty_log scans bitmap to see it it's all zero, but
doesn't use that information.
Avi says:
Looks like it was used to guard kvm_mmu_slot_remove_write_access();
optimizing the case where the guest just leaves the screen alone (which
it usually does, especially in benchmarks).
I'd rather reinstate that optimization. See
90cb0529dd where the damage was done.
It's pretty simple: if the bitmap is all zero, we don't need to do anything to
clean it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Now we use a kmem cache for allocating vcpus, we can get the 16-byte
alignment required by fxsave & fxrstor instructions, and avoid
manually aligning the buffer.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Avi wants the allocations of vcpus centralized again. The easiest way
is to add a "size" arg to kvm_init_arch, and expose the thus-prepared
cache to the modules.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
... in favor of the more general emulator_{read,write}_*.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
... instead of a x86_emulate_ctxt, so that other callers can use it easily.
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Changes some svm.c internal function names:
1) io_adress -> io_address (de-germanify the spelling)
2) kvm_reput_irq -> reput_irq (it's not a generic kvm function)
3) kvm_do_inject_irq -> (it's not a generic kvm function)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
container_of is wonderful, but not casting at all is better. This
patch changes svm.c's internal functions to pass "struct vcpu_svm"
instead of "struct kvm_vcpu" and using container_of.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
There are several places where hardcoded numbers are used in place of
the easily-available constant, which is poor form.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
container_of is wonderful, but not casting at all is better. This
patch changes vmx.c's internal functions to pass "struct vcpu_vmx"
instead of "struct kvm_vcpu" and using container_of.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Now that kvm generally runs with preemption enabled, we need to protect
the fpu intialization sequence.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This allows the kvm mmu to perform sleepy operations, such as memory
allocation.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Current kvm disables preemption while the new virtualization registers are
in use. This of course is not very good for latency sensitive workloads (one
use of virtualization is to offload user interface and other latency
insensitive stuff to a container, so that it is easier to analyze the
remaining workload). This patch re-enables preemption for kvm; preemption
is now only disabled when switching the registers in and out, and during
the switch to guest mode and back.
Contains fixes from Shaohua Li <shaohua.li@intel.com>.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Add the hypercall number to kvm_run and initialize it. This changes the ABI,
but as this particular ABI was unusable before this no users are affected.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Put cpu feature detecting part in hardware_setup, and stored the vmcs
condition in global variable for further check.
[glommer: fix for some i386-only machines not supporting CR8 load/store
exiting]
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch converts the vcpus array in "struct kvm" to a pointer
array, and changes the "vcpu_create" and "vcpu_setup" hooks into one
"vcpu_create" call which does the allocation and initialization of the
vcpu (calling back into the kvm_vcpu_init core helper).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
struct kvm_vcpu has vmx-specific members; remove them to a private structure.
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
load_pdptrs can be handed an invalid cr3, and it should not oops.
This can happen because we injected #gp in set_cr3() after we set
vcpu->cr3 to the invalid value, or from kvm_vcpu_ioctl_set_sregs(), or
memory configuration changes after the guest did set_cr3().
We should also copy the pdpte array once, before checking and
assigning, otherwise an SMP guest can potentially alter the values
between the check and the set.
Finally one nitpick: ret = 1 should be done as late as possible: this
allows GCC to check for unset "ret" should the function change in
future.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The writeback fixes (02c03a326a) let
some dead code in the cmpxchg instruction emulation. Remove it.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch mainly imports some constants and rename two exist constants
of vmcs according to IA32 SDM.
It also adds two constants to indicate Lock bit and Enable bit in
MSR_IA32_FEATURE_CONTROL, and replace the hardcode _5_ with these two
bits.
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
gfn_to_page might sleep with swap support. Move it out of the kmap calls.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
vmx_cpu_run doesn't handle error correctly and kvm_mmu_reload might
sleep with mutex changes, so I move it above.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Right now, the bug is harmless as we never emulate one-byte 0xb6 or 0xb7.
But things may change.
Noted by the mysterious Gabriel C.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Intel manual (and KVM definition) say the TPR is 4 bits wide. Also fix
CR8_RESEVED_BITS typo.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Creating one's own BITMAP macro seems suboptimal: if we use manual
arithmetic in the one place exposed to userspace, we can use standard
macros elsewhere.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
On this machine (Intel), writing to the CR4 bits 0x00000800 and
0x00001000 cause a GPF. The Intel manual is a little unclear, but
AFIACT they're reserved, too.
Also fix spelling of CR4_RESEVED_BITS.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The kernel now has asm/cpu-features.h: use those macros instead of inventing
our own.
Also spell out definition of CR3_RESEVED_BITS, fix spelling and
tighten it for the non-PAE case.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The kernel now has asm/cpu-features.h: use those macros instead of
inventing our own.
Also spell out definition of CR0_RESEVED_BITS (no code change) and fix typo.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
I have shied away from touching x86_emulate.c (it could definitely use
some love, but it is forked from the Xen code, and it would be more
productive to cross-merge fixes).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Add string pio write support to support some version of Windows.
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch adds a `vcpu_id' field in `struct vcpu', so we can
differentiate BSP and APs without pointer comparison or arithmetic.
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
*nopage() in kvm_main.c should only store the type of mmap() fault if
the pointers are not NULL. This patch fixes the problem.
Signed-off-by: Nguyen Anh Quynh <aquynh@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
KVM reuses the IOAPIC register definitions, and needs them even if the
host is not compiled with IOAPIC support. Move the #ifdef below so that only
the IOAPIC variables and functions are protected, and the register definitions
are available to all.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Fix DMI const-ification fallout that appeared when merging subsystem
trees.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
According to latest memory ordering specification documents from Intel
and AMD, both manufacturers are committed to in-order loads from
cacheable memory for the x86 architecture. Hence, smp_rmb() may be a
simple barrier.
Also according to those documents, and according to existing practice in
Linux (eg. spin_unlock doesn't enforce ordering), stores to cacheable
memory are visible in program order too. Special string stores are safe
-- their constituent stores may be out of order, but they must complete
in order WRT surrounding stores. Nontemporal stores to WB memory can go
out of order, and so they should be fenced explicitly to make them
appear in-order WRT other stores. Hence, smp_wmb() may be a simple
barrier.
http://developer.intel.com/products/processor/manuals/318147.pdfhttp://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf
In userspace microbenchmarks on a core2 system, fence instructions range
anywhere from around 15 cycles to 50, which may not be totally
insignificant in performance critical paths (code size will go down
too).
However the primary motivation for this is to have the canonical barrier
implementation for x86 architecture.
smp_rmb on buggy pentium pros remains a locked op, which is apparently
required.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
wmb() on x86 must always include a barrier, because stores can go out of
order in many cases when dealing with devices (eg. WC memory).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
movnt* instructions are not strongly ordered with respect to other stores,
so if we are to assume stores are strongly ordered in the rest of the 64
bit code, we must fence these off (see similar examples in 32 bit code).
[ The AMD memory ordering document seems to say that nontemporal stores can
also pass earlier regular stores, so maybe we need sfences _before_
movnt* everywhere too? ]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: (119 commits)
[libata] struct pci_dev related cleanups
libata: use ata_exec_internal() for PMP register access
libata: implement ATA_PFLAG_RESETTING
libata: add @timeout to ata_exec_internal[_sg]()
ahci: fix notification handling
ahci: clean up PORT_IRQ_BAD_PMP enabling
ahci: kill leftover from enabling NCQ over PMP
libata: wrap schedule_timeout_uninterruptible() in loop
libata: skip suppress reporting if ATA_EHI_QUIET
libata: clear ehi description after initial host report
pata_jmicron: match vendor and class code only
libata: add ST9160821AS / 3.ALD to NCQ blacklist
pata_acpi: ACPI driver support
libata-core: Expose gtm methods for driver use
libata: add HDT722516DLA380 to NCQ blacklist
libata: blacklist NCQ on Seagate Barracuda ST380817AS
[libata] Turn on ACPI by default
libata_scsi: Fix ATAPI transfer lengths
libata: correct handling of SRST reset sequences
libata: Integrate ACPI-based PATA/SATA hotplug - version 5
...