Commit graph

440951 commits

Author SHA1 Message Date
Alexander Graf
07fec1c2e7 KVM: PPC: E500: Ignore L1CSR1_ICFI,ICLFR
The L1 instruction cache control register contains bits that indicate
that we're still handling a request. Mask those out when we set the SPR
so that a read doesn't assume we're still doing something.

Signed-off-by: Alexander Graf <agraf@suse.de>
2014-05-30 14:26:17 +02:00
Nadav Amit
1f85411255 KVM: vmx: DR7 masking on task switch emulation is wrong
The DR7 masking which is done on task switch emulation should be in hex format
(clearing the local breakpoints enable bits 0,2,4 and 6).

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22 17:47:18 +02:00
Dave Hansen
65a7f03f6b x86: fix page fault tracing when KVM guest support enabled
I noticed on some of my systems that page fault tracing doesn't
work:

	cd /sys/kernel/debug/tracing
	echo 1 > events/exceptions/enable
	cat trace;
	# nothing shows up

I eventually traced it down to CONFIG_KVM_GUEST.  At least in a
KVM VM, enabling that option breaks page fault tracing, and
disabling fixes it.  I tried on some old kernels and this does
not appear to be a regression: it never worked.

There are two page-fault entry functions today.  One when tracing
is on and another when it is off.  The KVM code calls do_page_fault()
directly instead of calling the traced version:

> dotraplinkage void __kprobes
> do_async_page_fault(struct pt_regs *regs, unsigned long
> error_code)
> {
>         enum ctx_state prev_state;
>
>         switch (kvm_read_and_reset_pf_reason()) {
>         default:
>                 do_page_fault(regs, error_code);
>                 break;
>         case KVM_PV_REASON_PAGE_NOT_PRESENT:

I'm also having problems with the page fault tracing on bare
metal (same symptom of no trace output).  I'm unsure if it's
related.

Steven had an alternative to this which has zero overhead when
tracing is off where this includes the standard noops even when
tracing is disabled.  I'm unconvinced that the extra complexity
of his apporach:

	http://lkml.kernel.org/r/20140508194508.561ed220@gandalf.local.home

is worth it, expecially considering that the KVM code is already
making page fault entry slower here.  This solution is
dirt-simple.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22 17:47:17 +02:00
Paolo Bonzini
ae9fedc793 KVM: x86: get CPL from SS.DPL
CS.RPL is not equal to the CPL in the few instructions between
setting CR0.PE and reloading CS.  And CS.DPL is also not equal
to the CPL for conforming code segments.

However, SS.DPL *is* always equal to the CPL except for the weird
case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the
value in the STAR MSR, but force CPL=3 (Intel instead forces
SS.DPL=SS.RPL=CPL=3).

So this patch:

- modifies SVM to update the CPL from SS.DPL rather than CS.RPL;
the above case with SYSRET is not broken further, and the way
to fix it would be to pass the CPL to userspace and back

- modifies VMX to always return the CPL from SS.DPL (except
forcing it to 0 if we are emulating real mode via vm86 mode;
in vm86 mode all DPLs have to be 3, but real mode does allow
privileged instructions).  It also removes the CPL cache,
which becomes a duplicate of the SS access rights cache.

This fixes doing KVM_IOCTL_SET_SREGS exactly after setting
CR0.PE=1 but before CS has been reloaded.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22 17:47:17 +02:00
Paolo Bonzini
5045b46803 KVM: x86: check CS.DPL against RPL during task switch
Table 7-1 of the SDM mentions a check that the code segment's
DPL must match the selector's RPL.  This was not done by KVM,
fix it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22 17:47:17 +02:00
Paolo Bonzini
fb5e336b97 KVM: x86: drop set_rflags callback
Not needed anymore now that the CPL is computed directly
during task switch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22 17:47:16 +02:00
Paolo Bonzini
2356aaeb2f KVM: x86: use new CS.RPL as CPL during task switch
During task switch, all of CS.DPL, CS.RPL, SS.DPL must match (in addition
to all the other requirements) and will be the new CPL.  So far this
worked by carefully setting the CS selector and flag before doing the
task switch; setting CS.selector will already change the CPL.

However, this will not work once we get the CPL from SS.DPL, because
then you will have to set the full segment descriptor cache to change
the CPL.  ctxt->ops->cpl(ctxt) will then return the old CPL during the
task switch, and the check that SS.DPL == CPL will fail.

Temporarily assume that the CPL comes from CS.RPL during task switch
to a protected-mode task.  This is the same approach used in QEMU's
emulation code, which (until version 2.0) manually tracks the CPL.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-22 17:45:38 +02:00
Paolo Bonzini
afa538f0a1 1. Correct locking for lazy storage key handling
A test loop with multiple CPUs triggered a race in the lazy storage
    key handling as introduced by commit 934bc131ef
    (KVM: s390: Allow skeys to be enabled for the current process). This
    race should not happen with Linux guests, but let's fix it anyway.
    Patch touches !/kvm/ code, but is from the s390 maintainer.
 
 2. Better handling of broken guests
    If we detect a program check loop we stop the guest instead of
    wasting CPU cycles.
 
 3. Better handling on MVPG emulation
    The move page handling is improved to be architecturally correct.
 
 3. Trace point rework
    Let's rework the kvm trace points to have a common header file (for
    later perf usage) and provided a table based instruction decoder.
 
 4. Interpretive execution of SIGP external call
    Let the hardware handle most cases of SIGP external call (IPI) and
    wire up the fixup code for the corner cases.
 
 5. Initial preparations for the IBC facility
    Prepare the code to handle instruction blocking
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTdhD5AAoJEBF7vIC1phx80kcQAI4gpl1DR5GlIgbTmVCrX7qR
 gljTqGRJv8w4nG+ecrXSJyeX+a6LxEeCzn1bDNTyBm75kInq+/vPYw+BoBHvk4vO
 2NFi6jq4xwAmctGaysU1LgZetkiXoiEZ2TYW7450ZaGCFLSkNHPPBPutOa0t9154
 e+NlDZ8sCF2b3BM54pOleiADLN9J6xzGQQsJaSXQyeKhheOec1PCXJ0WlZt8g8ud
 EuhYVBC5QyiYI2kzvIMowx2Y62o8CKhGF9Q5t71GcVqGUsAi8aEg0eQ+5CBkLmXc
 MF4lfUQamxbLWZDOAJu1pbRg0rZD9OR2qZIrqyu56f41HNAVRtQqL3qMb1LmMfEL
 8vHonbdgCtfLIwtVq5z9uOyQ3q+DpEXeBvMlcKfJ6ms4Cs5aEFDQO+3SZwsTNjH0
 JyyLY+xC8imorHe8h6/H+rYP66GPdCrG7RK2jufUUpejt2Zg3wbEJRCjr73JN6T3
 iGMq1H7EEdSYOgQ8O1iJzqbWvlzZcNp2NyWlwcMBeXTCsqVtRDIBOnWqYpv/apFP
 DMtZd8rw9g0L41TWxVf5zWJcoQNIvSfuN8keOHWq6FRGMDp+0uAC+kSYJuzRZ8ef
 5K5Srp7+ykIElLKONzOpP4obo7WvfF9ZS7U+gX19ZRiWVoSm18oKdJEFOHOX1KDW
 ZLqrtMui01c26h42+gsQ
 =H1bu
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-20140516' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

1. Correct locking for lazy storage key handling
   A test loop with multiple CPUs triggered a race in the lazy storage
   key handling as introduced by commit 934bc131ef
   (KVM: s390: Allow skeys to be enabled for the current process). This
   race should not happen with Linux guests, but let's fix it anyway.
   Patch touches !/kvm/ code, but is from the s390 maintainer.

2. Better handling of broken guests
   If we detect a program check loop we stop the guest instead of
   wasting CPU cycles.

3. Better handling on MVPG emulation
   The move page handling is improved to be architecturally correct.

3. Trace point rework
   Let's rework the kvm trace points to have a common header file (for
   later perf usage) and provided a table based instruction decoder.

4. Interpretive execution of SIGP external call
   Let the hardware handle most cases of SIGP external call (IPI) and
   wire up the fixup code for the corner cases.

5. Initial preparations for the IBC facility
   Prepare the code to handle instruction blocking
2014-05-16 23:02:40 +02:00
Michael Mueller
fda902cb83 KVM: s390: split SIE state guest prefix field
This patch splits the SIE state guest prefix at offset 4
into a prefix bit field. Additionally it provides the
access functions:

 - kvm_s390_get_prefix()
 - kvm_s390_set_prefix()

to access the prefix per vcpu.

Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16 14:57:31 +02:00
Michael Mueller
570126d370 s390/sclp: add sclp_get_ibc function
The patch adds functionality to retrieve the IBC configuration
by means of function sclp_get_ibc().

Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16 14:57:30 +02:00
David Hildenbrand
4953919fee KVM: s390: interpretive execution of SIGP EXTERNAL CALL
If the sigp interpretation facility is installed, most SIGP EXTERNAL CALL
operations will be interpreted instead of intercepted. A partial execution
interception will occurr at the sending cpu only if the target cpu is in the
wait state ("W" bit in the cpuflags set). Instruction interception will only
happen in error cases (e.g. cpu addr invalid).

As a sending cpu might set the external call interrupt pending flags at the
target cpu at every point in time, we can't handle this kind of interrupt using
our kvm interrupt injection mechanism. The injection will be done automatically
by the SIE when preparing the start of the target cpu.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
CC: Thomas Huth <thuth@linux.vnet.ibm.com>
[Adopt external call injection to check for sigp interpretion]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16 14:57:28 +02:00
Alexander Yarygin
d26b8655f0 KVM: s390: Use intercept_insn decoder in trace event
The current trace definition doesn't work very well with the perf tool.
Perf shows a "insn_to_mnemonic not found" message. Let's handle the
decoding completely in a parseable format.

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16 14:57:27 +02:00
Alexander Yarygin
05db1f6e03 KVM: s390: decoder of SIE intercepted instructions
This patch adds a new decoder of SIE intercepted instructions.

The decoder implemented as a macro and potentially can be used in
both kernelspace and userspace.

Note that this simplified instruction decoder is only intended to be
used with the subset of instructions that may cause a SIE intercept.

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16 14:57:26 +02:00
Alexander Yarygin
6de1bf88df KVM: s390: Use trace tables from sie.h.
Use the symbolic translation tables from sie.h for decoding diag, sigp
and sie exit codes.

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16 14:57:24 +02:00
Alexander Yarygin
ceae283bb2 KVM: s390: add sie exit reasons tables
This patch defines tables of reasons for exiting from SIE mode
in a new sie.h header file. Tables contain SIE intercepted codes,
intercepted instructions and program interruptions codes.

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16 14:57:23 +02:00
Thomas Huth
f22166dcfd KVM: s390: Improved MVPG partial execution handler
Use the new helper function kvm_arch_fault_in_page() for faulting-in
the guest pages and only inject addressing errors when we've really
hit a bad address (and return other error codes to userspace instead).

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16 14:57:22 +02:00
Thomas Huth
fa576c583d KVM: s390: Introduce helper function for faulting-in a guest page
Rework the function kvm_arch_fault_in_sync() to become a proper helper
function for faulting-in a guest page. Now it takes the guest address as
a parameter and does not ignore the possible error code from gmap_fault()
anymore (which could cause undetected error conditions before).

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16 14:57:20 +02:00
Thomas Huth
684135e096 KVM: s390: Avoid endless loops of specification exceptions
If the new PSW for program interrupts is invalid, the VM ends up
in an endless loop of specification exceptions. Since there is not
much left we can do in this case, we should better drop to userspace
instead so that the crash can be reported to the user.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16 14:57:19 +02:00
Thomas Huth
a3fb577e48 KVM: s390: Improve is_valid_psw()
As a program status word is also invalid (and thus generates an
specification exception) if the instruction address is not even,
we should test this in is_valid_psw(), too. This patch also exports
the function so that it becomes available for other parts of the
S390 KVM code as well.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16 14:57:18 +02:00
Martin Schwidefsky
3a801517ad KVM: s390: correct locking for s390_enable_skey
Use the mm semaphore to serialize multiple invocations of s390_enable_skey.
The second CPU faulting on a storage key operation needs to wait for the
completion of the page table update. Taking the mm semaphore writable
has the positive side-effect that it prevents any host faults from
taking place which does have implications on keys vs PGSTE.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-16 14:57:17 +02:00
Jan Kiszka
d9f89b88f5 KVM: x86: Fix CR3 reserved bits check in long mode
Regression of 346874c9: PAE is set in long mode, but that does not mean
we have valid PDPTRs.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-12 20:04:01 +02:00
Gabriel L. Somlo
87c00572ba kvm: x86: emulate monitor and mwait instructions as nop
Treat monitor and mwait instructions as nop, which is architecturally
correct (but inefficient) behavior. We do this to prevent misbehaving
guests (e.g. OS X <= 10.7) from crashing after they fail to check for
monitor/mwait availability via cpuid.

Since mwait-based idle loops relying on these nop-emulated instructions
would keep the host CPU pegged at 100%, do NOT advertise their presence
via cpuid, to prevent compliant guests from using them inadvertently.

Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-08 15:40:49 +02:00
Michael S. Tsirkin
b63cf42fd1 kvm/x86: implement hv EOI assist
It seems that it's easy to implement the EOI assist
on top of the PV EOI feature: simply convert the
page address to the format expected by PV EOI.

Notes:
-"No EOI required" is set only if interrupt injected
 is edge triggered; this is true because level interrupts are going
 through IOAPIC which disables PV EOI.
 In any case, if guest triggers EOI the bit will get cleared on exit.
-For migration, set of HV_X64_MSR_APIC_ASSIST_PAGE sets
 KVM_PV_EOI_EN internally, so restoring HV_X64_MSR_APIC_ASSIST_PAGE
 seems sufficient
 In any case, bit is cleared on exit so worst case it's never re-enabled
-no handling of PV EOI data is performed at HV_X64_MSR_EOI write;
 HV_X64_MSR_EOI is a separate optimization - it's an X2APIC
 replacement that lets you do EOI with an MSR and not IO.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-07 18:00:49 +02:00
Nadav Amit
5f7dde7bbb KVM: x86: Mark bit 7 in long-mode PDPTE according to 1GB pages support
In long-mode, bit 7 in the PDPTE is not reserved only if 1GB pages are
supported by the CPU. Currently the bit is considered by KVM as always
reserved.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-07 17:25:22 +02:00
Nadav Amit
a4ab9d0cf1 KVM: vmx: handle_dr does not handle RSP correctly
The RSP register is not automatically cached, causing mov DR instruction with
RSP to fail.  Instead the regular register accessing interface should be used.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-07 17:24:59 +02:00
Bandan Das
4291b58885 KVM: nVMX: move vmclear and vmptrld pre-checks to nested_vmx_check_vmptr
Some checks are common to all, and moreover,
according to the spec, the check for whether any bits
beyond the physical address width are set are also
applicable to all of them

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06 19:00:43 +02:00
Bandan Das
96ec146330 KVM: nVMX: fail on invalid vmclear/vmptrld pointer
The spec mandates that if the vmptrld or vmclear
address is equal to the vmxon region pointer, the
instruction should fail with error "VMPTRLD with
VMXON pointer" or "VMCLEAR with VMXON pointer"

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06 19:00:37 +02:00
Bandan Das
3573e22cfe KVM: nVMX: additional checks on vmxon region
Currently, the vmxon region isn't used in the nested case.
However, according to the spec, the vmxon instruction performs
additional sanity checks on this region and the associated
pointer. Modify emulated vmxon to better adhere to the spec
requirements

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06 19:00:27 +02:00
Bandan Das
19677e32fe KVM: nVMX: rearrange get_vmx_mem_address
Our common function for vmptr checks (in 2/4) needs to fetch
the memory address

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-06 18:59:57 +02:00
Paolo Bonzini
2ce316f0b9 1. Fixes an error return code for the breakpoint setup
2. External interrupt fixes
 2.1. Some interrupt conditions like cpu timer or clock comparator
 stay pending even after the interrupt is injected. If the external
 new PSW is enabled for interrupts this will result in an endless
 loop. Usually this indicates a programming error in the guest OS.
 Lets detect such situations and go to userspace. We will provide
 a QEMU patch that sets the guest in panicked/crashed state to avoid
 wasting CPU cycles.
 2.2 Resend external interrupts back to the guest if the HW could
 not do it.
 -
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTaOGKAAoJEBF7vIC1phx8IqkP/0zQ3gWbYdGV20UEvIB+oHsO
 u7OysZdyfXS3wx6rysTWepQJ6rtWJ/yQSyzTt+RnCTYxUnyhMVPKMJOmoztyhkD5
 37I9ricqMS/Ob5A3pKGEW2p/TojPYL5o8svCRt+UWbyxz05AQiCEPteeD7MrcOK+
 ASULR2z2h95EYfrMhZSeFjFoXHrPfeMoR5OVESP8gef7uGTlqIZO1mZ6QkAFqL/b
 VtqCI74oTc+XpNj7jxnvxznilqnvjD31oaci2oK+AX+DQcwOnTIGuUlU1bS+XOwm
 WFbDKUbksNC/QQ2hPqcCvZTtK+U7XlPZz7pRyEdvHYRckaNDzLbiLzYHvRGgCHoq
 uy9u429L1pthoj1vQvUY2ZD4HyI4K/UusApie5x3hmYlePNSEcC7TNDt2SvdjrID
 yX6X9zWC9ffHSmKLBI11PWNs5R1EUrUlBcZ7CFDDmJDCeKRmwmY1+nuYSm7x80iB
 ctfpXTJG4Ajrbbki5LCdoLPU0piR/IkSEwxeEY0u/5XLcdEiY/Z3SEJzlWeuIPf6
 bNuWQK8YP6ane8p3Vc/UwmtMgaCEsnAwYrcRfmjOEQfVDxmRzHARIxbIFs0EsM54
 S+6SH6LN1HCeFsG3zvpwPrm9gK2GojvJ0tCwZ78UZZx5m4CrgtHVHHfbspygftv8
 6L/YJ/Q0PQja0s3lx/Eh
 =R95o
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-20140506' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

1. Fixes an error return code for the breakpoint setup

2. External interrupt fixes
2.1. Some interrupt conditions like cpu timer or clock comparator
stay pending even after the interrupt is injected. If the external
new PSW is enabled for interrupts this will result in an endless
loop. Usually this indicates a programming error in the guest OS.
Lets detect such situations and go to userspace. We will provide
a QEMU patch that sets the guest in panicked/crashed state to avoid
wasting CPU cycles.
2.2 Resend external interrupts back to the guest if the HW could
not do it.
-
2014-05-06 17:20:37 +02:00
Thomas Huth
f14d82e06a KVM: s390: Fix external interrupt interception
The external interrupt interception can only occur in rare cases, e.g.
when the PSW of the interrupt handler has a bad value. The old handler
for this interception simply ignored these events (except for increasing
the exit_external_interrupt counter), but for proper operation we either
have to inject the interrupts manually or we should drop to userspace in
case of errors.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-06 14:58:10 +02:00
Thomas Huth
e029ae5b78 KVM: s390: Add clock comparator and CPU timer IRQ injection
Add an interface to inject clock comparator and CPU timer interrupts
into the guest. This is needed for handling the external interrupt
interception.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-06 14:58:05 +02:00
Dan Carpenter
fcc9aec3de KVM: s390: return -EFAULT if copy_from_user() fails
When copy_from_user() fails, this code returns the number of bytes
remaining instead of a negative error code.  The positive number is
returned to the user but otherwise it is harmless.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-05-06 14:57:59 +02:00
Ulrich Obergfell
1171903d89 KVM: x86: improve the usability of the 'kvm_pio' tracepoint
This patch moves the 'kvm_pio' tracepoint to emulator_pio_in_emulated()
and emulator_pio_out_emulated(), and it adds an argument (a pointer to
the 'pio_data'). A single 8-bit or 16-bit or 32-bit data item is fetched
from 'pio_data' (depending on 'size'), and the value is included in the
trace record ('val'). If 'count' is greater than one, this is indicated
by the string "(...)" in the trace output.

Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-05 22:42:05 +02:00
Christian Borntraeger
719d93cd5f kvm/irqchip: Speed up KVM_SET_GSI_ROUTING
When starting lots of dataplane devices the bootup takes very long on
Christian's s390 with irqfd patches. With larger setups he is even
able to trigger some timeouts in some components.  Turns out that the
KVM_SET_GSI_ROUTING ioctl takes very long (strace claims up to 0.1 sec)
when having multiple CPUs.  This is caused by the  synchronize_rcu and
the HZ=100 of s390.  By changing the code to use a private srcu we can
speed things up.  This patch reduces the boot time till mounting root
from 8 to 2 seconds on my s390 guest with 100 disks.

Uses of hlist_for_each_entry_rcu, hlist_add_head_rcu, hlist_del_init_rcu
are fine because they do not have lockdep checks (hlist_for_each_entry_rcu
uses rcu_dereference_raw rather than rcu_dereference, and write-sides
do not do rcu lockdep at all).

Note that we're hardly relying on the "sleepable" part of srcu.  We just
want SRCU's faster detection of grace periods.

Testing was done by Andrew Theurer using netperf tests STREAM, MAERTS
and RR.  The difference between results "before" and "after" the patch
has mean -0.2% and standard deviation 0.6%.  Using a paired t-test on the
data points says that there is a 2.5% probability that the patch is the
cause of the performance difference (rather than a random fluctuation).

(Restricting the t-test to RR, which is the most likely to be affected,
changes the numbers to respectively -0.3% mean, 0.7% stdev, and 8%
probability that the numbers actually say something about the patch.
The probability increases mostly because there are fewer data points).

Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # s390
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-05-05 16:29:11 +02:00
Paolo Bonzini
57b5981cd3 1. Guest handling fixes
The handling of MVPG, PFMF and Test Block is fixed to better follow
 the architecture. None of these fixes is critical for any current
 Linux guests, but let's play safe.
 
 2. Optimization for single CPU guests
 We can enable the IBS facility if only one VCPU is running (!STOPPED
 state). We also enable this optimization for guest > 1 VCPU as soon
 as all but one VCPU is in stopped state. Thus will help guests that
 have tools like cpuplugd (from s390-utils) that do dynamic offline/
 online of CPUs.
 
 3. NOTES
 There is one non-s390 change in include/linux/kvm_host.h that
 introduces 2 defines for VCPU requests:
 define KVM_REQ_ENABLE_IBS        23
 define KVM_REQ_DISABLE_IBS       24
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTX6eTAAoJEBF7vIC1phx8S8YQAKRQ1Oe75qLS+F1yipNHi905
 7byi0K2nuGF/K4NSkQeWyv8mqjlWIEE+a8PR69mrgx6biae/Sn6l2ZZiV3Ml0flH
 9FEIlu0PU/RvyPfERT9MxUsbY2Dbec4r3Q3U+RLftlAht6oA/AaiLaY6cKSzwXqa
 vTqm7VORgTsM7JUkdoC/BdwNH6+94I7IM6CGaWMmWqELAhKq6SUCmc8g26bvEjBd
 94bkUEBYgHuceEPYAmDA1r7QSStpnU+cgj0haHRI12g4y1PhuBuFkmAGPv/E60wT
 iGOkhQ5XchR6dmZBLC/zRbjObi5NqqRojRYcI8RZfbdfvxtD8+xKg7G++KXw+VkK
 lR2CyJfrXms9r90mo7Oi3oCEuy0gC5ToOtRbOb/Xo8CYjCd6pS8DJraaQQfUhsoX
 koWiPVu87Gk6GMALZdJYCAdHgOhwC/dglKVGHThsFFVwLgsC7ZKsluWYA/Y0FRtA
 B2A7DBIC5O1NNXs3CYIv+v0M3jNF+4tt7wxV4omNPiSZTb4IAhG+0ucvU+hIZFbm
 i07a1sULAOSAX7qeizmHamomrC5NeEOUQeQ2ciQbSH9L+GEZQj10YaOXXKQh79C+
 7LXTtXWzESzx+EXBuCPcFsNt84GH294YoFTitHzN0hAburkVqwQzIAEAby7ylgX2
 WUC6UjhyheZr0rBqO2g9
 =Lxxg
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-20140429' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

1. Guest handling fixes
The handling of MVPG, PFMF and Test Block is fixed to better follow
the architecture. None of these fixes is critical for any current
Linux guests, but let's play safe.

2. Optimization for single CPU guests
We can enable the IBS facility if only one VCPU is running (!STOPPED
state). We also enable this optimization for guest > 1 VCPU as soon
as all but one VCPU is in stopped state. Thus will help guests that
have tools like cpuplugd (from s390-utils) that do dynamic offline/
online of CPUs.

3. NOTES
There is one non-s390 change in include/linux/kvm_host.h that
introduces 2 defines for VCPU requests:
define KVM_REQ_ENABLE_IBS        23
define KVM_REQ_DISABLE_IBS       24
2014-04-30 12:29:41 +02:00
Marcelo Tosatti
e4c9a5a175 KVM: x86: expose invariant tsc cpuid bit (v2)
Invariant TSC is a property of TSC, no additional
support code necessary.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-29 15:22:43 +02:00
David Hildenbrand
8ad3575517 KVM: s390: enable IBS for single running VCPUs
This patch enables the IBS facility when a single VCPU is running.
The facility is dynamically turned on/off as soon as other VCPUs
enter/leave the stopped state.

When this facility is operating, some instructions can be executed
faster for single-cpu guests.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-29 15:01:54 +02:00
David Hildenbrand
6852d7b69b KVM: s390: introduce kvm_s390_vcpu_{start,stop}
This patch introduces two new functions to set/clear the CPUSTAT_STOPPED bit and
makes use of it at all applicable places. These functions prepare the additional
execution of code when starting/stopping a vcpu.

The CPUSTAT_STOPPED bit should not be touched outside of these functions.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-29 15:01:54 +02:00
Thomas Huth
e45efa28e5 KVM: s390: Add low-address protection to TEST BLOCK
TEST BLOCK is also subject to the low-address protection, so we need
to check the destination address in our handler.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-29 15:01:53 +02:00
Thomas Huth
fb34c60365 KVM: s390: Fixes for PFMF
Add a check for low-address protection to the PFMF handler and
convert real-addresses to absolute if necessary, as it is defined
in the Principles of Operations specification.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-29 15:01:53 +02:00
Thomas Huth
f8232c8cf7 KVM: s390: Add a function for checking the low-address protection
The s390 architecture has a special protection mechanism that can
be used to prevent write access to the vital data in the low-core
memory area. This patch adds a new helper function that can be used
to check for such write accesses and in case of protection, it also
sets up the exception data accordingly.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-29 15:01:52 +02:00
Thomas Huth
9a558ee3cc KVM: s390: Handle MVPG partial execution interception
When the guest executes the MVPG instruction with DAT disabled,
and the source or destination page is not mapped in the host,
the so-called partial execution interception occurs. We need to
handle this event by setting up a mapping for the corresponding
user pages.

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-29 15:01:51 +02:00
Oleg Nesterov
e9545b9f8a KVM: async_pf: change async_pf_execute() to use get_user_pages(tsk => NULL)
async_pf_execute() passes tsk == current to gup(), this is doesn't
hurt but unnecessary and misleading. "tsk" is only used to account
the number of faults and current is the random workqueue thread.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-28 17:24:55 +02:00
Oleg Nesterov
d72d946d0b KVM: async_pf: kill the unnecessary use_mm/unuse_mm async_pf_execute()
async_pf_execute() has no reasons to adopt apf->mm, gup(current, mm)
should work just fine even if current has another or NULL ->mm.

Recently kvm_async_page_present_sync() was added insedie the "use_mm"
section, but it seems that it doesn't need current->mm too.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-04-28 17:24:25 +02:00
Xiao Guangrong
198c74f43f KVM: MMU: flush tlb out of mmu lock when write-protect the sptes
Now we can flush all the TLBs out of the mmu lock without TLB corruption when
write-proect the sptes, it is because:
- we have marked large sptes readonly instead of dropping them that means we
  just change the spte from writable to readonly so that we only need to care
  the case of changing spte from present to present (changing the spte from
  present to nonpresent will flush all the TLBs immediately), in other words,
  the only case we need to care is mmu_spte_update()

- in mmu_spte_update(), we haved checked
  SPTE_HOST_WRITEABLE | PTE_MMU_WRITEABLE instead of PT_WRITABLE_MASK, that
  means it does not depend on PT_WRITABLE_MASK anymore

Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23 17:49:52 -03:00
Xiao Guangrong
7f31c9595e KVM: MMU: flush tlb if the spte can be locklessly modified
Relax the tlb flush condition since we will write-protect the spte out of mmu
lock. Note lockless write-protection only marks the writable spte to readonly
and the spte can be writable only if both SPTE_HOST_WRITEABLE and
SPTE_MMU_WRITEABLE are set (that are tested by spte_is_locklessly_modifiable)

This patch is used to avoid this kind of race:

      VCPU 0                         VCPU 1
lockless wirte protection:
      set spte.w = 0
                                 lock mmu-lock

                                 write protection the spte to sync shadow page,
                                 see spte.w = 0, then without flush tlb

				 unlock mmu-lock

                                 !!! At this point, the shadow page can still be
                                     writable due to the corrupt tlb entry
     Flush all TLB

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23 17:49:51 -03:00
Xiao Guangrong
c126d94f2c KVM: MMU: lazily drop large spte
Currently, kvm zaps the large spte if write-protected is needed, the later
read can fault on that spte. Actually, we can make the large spte readonly
instead of making them un-present, the page fault caused by read access can
be avoided

The idea is from Avi:
| As I mentioned before, write-protecting a large spte is a good idea,
| since it moves some work from protect-time to fault-time, so it reduces
| jitter.  This removes the need for the return value.

This version has fixed the issue reported in 6b73a9606, the reason of that
issue is that fast_page_fault() directly sets the readonly large spte to
writable but only dirty the first page into the dirty-bitmap that means
other pages are missed. Fixed it by only the normal sptes (on the
PT_PAGE_TABLE_LEVEL level) can be fast fixed

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23 17:49:50 -03:00
Xiao Guangrong
92a476cbfc KVM: MMU: properly check last spte in fast_page_fault()
Using sp->role.level instead of @level since @level is not got from the
page table hierarchy

There is no issue in current code since the fast page fault currently only
fixes the fault caused by dirty-log that is always on the last level
(level = 1)

This patch makes the code more readable and avoids potential issue in the
further development

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23 17:49:49 -03:00
Xiao Guangrong
a086f6a1eb Revert "KVM: Simplify kvm->tlbs_dirty handling"
This reverts commit 5befdc385d.

Since we will allow flush tlb out of mmu-lock in the later
patch

Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2014-04-23 17:49:48 -03:00