Commit graph

383 commits

Author SHA1 Message Date
Heiko Carstens
6b70a92080 s390/memory hotplug: use pfmf instruction to initialize storage keys
Move and rename init_storage_keys() to pageattr.c, so it can also be
used from the sclp memory hotplug code in order to initialize
storage keys.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-11-23 11:14:30 +01:00
Heiko Carstens
a4f32bdbd9 s390/mm: keep fault_init() private to fault.c
Just convert fault_init() to an early initcall. That's still early
enough since it only needs be called before user space processes get
executed. No reason to externalize it.
Also add the function to the init section and move the store_indication
variable to the read_mostly section.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-11-23 11:14:29 +01:00
Heiko Carstens
f7817968d0 s390/mm,vmemmap: use 1MB frames for vmemmap
Use 1MB frames for vmemmap if EDAT1 is available in order to
reduce TLB pressure
Always use a 1MB frame even if its only partially needed for
struct pages. Otherwise we would end up with a mix of large
frame and page mappings, because vmemmap_populate gets called
for each section (256MB -> 3.5MB memmap) separately.
Worst case is that we would waste 512KB.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-11-23 11:14:25 +01:00
Heiko Carstens
18da236908 s390/mm,vmem: use 2GB frames for identity mapping
Use 2GB frames for indentity mapping if EDAT2 is
available to reduce TLB pressure.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-11-23 11:14:24 +01:00
Heiko Carstens
516bad44b9 s390/gup: fix access_ok() usage in __get_user_pages_fast()
access_ok() returns always "true" on s390. Therefore all access_ok()
invocations are rather pointless.
However when walking page tables we need to make sure that everything
is within bounds of the ASCE limit of the task's address space.
So remove the access_ok() call and add the same check we have in
get_user_pages_fast().

Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-11-13 11:02:28 +01:00
Heiko Carstens
d55c4c613f s390/gup: add missing TASK_SIZE check to get_user_pages_fast()
When walking page tables we need to make sure that everything
is within bounds of the ASCE limit of the task's address space.
Otherwise we might calculate e.g. a pud pointer which is not
within a pud and dereference it.
So check against TASK_SIZE (which is the ASCE limit) before
walking page tables.

Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-11-13 11:02:26 +01:00
Gerald Schaefer
156152f84e s390/mm: use pmd_large() instead of pmd_huge()
Without CONFIG_HUGETLB_PAGE, pmd_huge() will always return 0. So
pmd_large() should be used instead in places where both transparent
huge pages and hugetlbfs pages can occur.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-10-26 16:44:23 +02:00
Heiko Carstens
fc7e48aad3 s390/mm,vmem: fix vmem_add_mem()/vmem_remove_range()
vmem_add_mem() should only then insert a large page if pmd_none() is true
for the specific entry. We might have a leftover from a previous mapping.
In addition make vmem_remove_range()'s page table walk code more complete
and fix a couple of potential endless loops (which can never happen :).

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-10-09 14:17:01 +02:00
Heiko Carstens
c972cc60c2 s390/vmalloc: have separate modules area
Add a special module area on top of the vmalloc area, which may be only
used for modules and bpf jit generated code.
This makes sure that inter module branches will always happen without a
trampoline and in addition having all the code within a 2GB frame is
branch prediction unit friendly.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-10-09 14:17:01 +02:00
Heiko Carstens
8fe234d3c8 s390/mm: fix mapping of read-only kernel text section
Within the identity mapping the kernel text section is mapped read-only.
However when mapping the first and last page of the text section we must
round upwards and downwards respectively, if only parts of a page belong
to the section.
Otherwise potential rw data can be mapped read-only. So the rounding must
be done just the other way we have it right now.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-10-09 14:16:59 +02:00
Heiko Carstens
e76e82d772 s390/mm: add page table dumper
This is more or less the same as the x86 page table dumper which was
merged four years ago: 926e5392 "x86: add code to dump the (kernel)
page tables for visual inspection by kernel developers".

We add a file at /sys/kernel/debug/kernel_page_tables for debugging
purposes so it's quite easy to see the kernel page table layout and
possible odd mappings:

---[ Identity Mapping ]---
0x0000000000000000-0x0000000000100000        1M PTE RW
---[ Kernel Image Start ]---
0x0000000000100000-0x0000000000800000        7M PMD RO
0x0000000000800000-0x00000000008a9000      676K PTE RO
0x00000000008a9000-0x0000000000900000      348K PTE RW
0x0000000000900000-0x0000000001500000       12M PMD RW
---[ Kernel Image End ]---
0x0000000001500000-0x0000000280000000    10219M PMD RW
0x0000000280000000-0x000003d280000000     3904G PUD I
---[ vmemmap Area ]---
0x000003d280000000-0x000003d288c00000      140M PTE RW
0x000003d288c00000-0x000003d300000000     1908M PMD I
0x000003d300000000-0x000003e000000000       52G PUD I
---[ vmalloc Area ]---
0x000003e000000000-0x000003e000009000       36K PTE RW
0x000003e000009000-0x000003e0000ee000      916K PTE I
0x000003e0000ee000-0x000003e000146000      352K PTE RW
0x000003e000146000-0x000003e000200000      744K PTE I
0x000003e000200000-0x000003e080000000     2046M PMD I
0x000003e080000000-0x0000040000000000      126G PUD I

This usually makes only sense for kernel developers. The output
with CONFIG_DEBUG_PAGEALLOC is not very helpful, because of the
huge number of mapped out pages, however I decided for the time
being to not add a !DEBUG_PAGEALLOC dependency.
Maybe it's helpful for somebody even with that option.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-10-09 14:16:58 +02:00
Heiko Carstens
cc5b9a4518 s390/mm,pageattr: remove superfluous EXPORT_SYMBOLs
Remove some EXPORT_SYMBOLs. There is no module code anywhere
which calls these pageattr functions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-10-09 14:16:57 +02:00
Heiko Carstens
5b1ba9e30c s390/mm,pageattr: add more page table walk sanity checks
The current page table walk code in pageattr.c only checks for large pages
while walking the kernel page table, but happily assumes that everything
else is just fine.
Add more checks so we never access invalid memory regions.

Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-10-09 14:16:57 +02:00
Heiko Carstens
378b1e7a80 s390/mm: fix pmd_huge() usage for kernel mapping
pmd_huge() will always return 0 on !HUGETLBFS, however we use that helper
function when walking the kernel page tables to decide if we have a
1MB page frame or not.
Since we create 1MB frames for the kernel 1:1 mapping independently of
HUGETLBFS this can lead to incorrect storage accesses since the code
can assume that we have a pointer to a page table instead of a pointer
to a 1MB frame.

Fix this by adding a pmd_large() primitive like other architectures have
it already and remove all references to HUGETLBFS/HUGETLBPAGE from the
code that walks kernel page tables.

Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-10-09 14:16:56 +02:00
Shaohua Li
45cac65b0f readahead: fault retry breaks mmap file read random detection
.fault now can retry.  The retry can break state machine of .fault.  In
filemap_fault, if page is miss, ra->mmap_miss is increased.  In the second
try, since the page is in page cache now, ra->mmap_miss is decreased.  And
these are done in one fault, so we can't detect random mmap file access.

Add a new flag to indicate .fault is tried once.  In the second try, skip
ra->mmap_miss decreasing.  The filemap_fault state machine is ok with it.

I only tested x86, didn't test other archs, but looks the change for other
archs is obvious, but who knows :)

Signed-off-by: Shaohua Li <shaohua.li@fusionio.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:47 +09:00
Gerald Schaefer
1ae1c1d09f thp, s390: architecture backend for thp on s390
This implements the architecture backend for transparent hugepages
on s390.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:31 +09:00
Gerald Schaefer
274023da1e thp, s390: disable thp for kvm host on s390
This patch is part of the architecture backend for thp on s390.  It
disables thp for kvm hosts, because there is no kvm host hugepage support
so far.  Existing thp mappings are split by follow_page() with FOLL_SPLIT,
and future thp mappings are prevented by setting VM_NOHUGEPAGE in
mm->def_flags.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:30 +09:00
Gerald Schaefer
9501d09fa3 thp, s390: thp pagetable pre-allocation for s390
This patch is part of the architecture backend for thp on s390.  It
provides the pagetable pre-allocation functions
pgtable_trans_huge_deposit() and pgtable_trans_huge_withdraw().  Unlike
other archs, s390 has no struct page * as pgtable_t, but rather a pointer
to the page table.  So instead of saving the pagetable pre- allocation
list info inside the struct page, it is being saved within the pagetable
itself.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:30 +09:00
Gerald Schaefer
75077afbec thp, s390: thp splitting backend for s390
This patch is part of the architecture backend for thp on s390.  It
provides the functions related to thp splitting, including serialization
against gup.  Unlike other archs, pmdp_splitting_flush() cannot use a tlb
flushing operation to serialize against gup on s390, because that wouldn't
be stopped by the disabled IRQs.  So instead, smp_call_function() is
called with an empty function, which will have the expected effect.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:30 +09:00
Heiko Carstens
5e249d6e11 s390/mm: mark free_initrd_mem() as __init
Same as 0d26d1d8 "x86/mm: Mark free_initrd_mem() as __init".
In addition also add the __init annotation to setup_zero_pages().

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-09-26 15:45:27 +02:00
Heiko Carstens
41459d36cf s390: add uninitialized_var() to suppress false positive compiler warnings
Get rid of these:

arch/s390/kernel/smp.c:134:19: warning: ‘status’ may be used uninitialized in this function [-Wuninitialized]
arch/s390/mm/pgtable.c:641:10: warning: ‘table’ may be used uninitialized in this function [-Wuninitialized]
arch/s390/mm/pgtable.c:644:12: warning: ‘page’ may be used uninitialized in this function [-Wuninitialized]
drivers/s390/cio/cio.c:1037:14: warning: ‘schid’ may be used uninitialized in this function [-Wuninitialized]

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-09-26 15:45:23 +02:00
Heiko Carstens
eb608fb366 s390/exceptions: switch to relative exception table entries
This is the s390 port of 70627654 "x86, extable: Switch to relative
exception table entries".
Reduces the size of our exception tables by 50% on 64 bit builds.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-09-26 15:45:10 +02:00
Gerald Schaefer
34cda99260 s390/mm: implement __get_user_pages_fast()
This patch introduces the __get_user_pages_fast() function on s390,
which will be needed with CONFIG_TRANSPARENT_HUGEPAGE and futex.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-09-26 15:45:08 +02:00
Heiko Carstens
d1b0d842c4 s390/mm: rename addressing_mode to s390_user_mode
Renaming the globally visible variable "user_mode" to "addressing_mode" in
order to fix a name clash was not a good idea. (Commit 37fe1d73 "s390/mm:
rename user_mode variable to addressing_mode")
Looking at the code after a couple of weeks one thinks: addressing mode of
what?
So rename the variable again. This time to s390_user_mode. Which hopefully
makes more sense.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-09-26 15:45:05 +02:00
Heiko Carstens
1c725922dd s390/cpu hotplug: mask out CPU_TASKS_FROZEN in cu hotplug notifiers
Unify all our cpu hotplug notifiers to mask out the CPU_TASKS_FROZEN
bit, so we don't have to add all the *_FROZEN variant cases to the
notifiers.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-09-26 15:44:53 +02:00
Gerald Schaefer
648609e3f2 s390: enable large page support with CONFIG_DEBUG_PAGEALLOC
So far, large page support was completely disabled with
CONFIG_DEBUG_PAGEALLOC, although it would be sufficient if only the
large page kernel mapping was disabled. This patch enables large page
support with CONFIG_DEBUG_PAGEALLOC, while it prevents the large kernel
mapping in that case.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-09-26 15:44:50 +02:00
Heiko Carstens
7d25617597 s390: make use of user_mode() macro where possible
We use the user_mode() helper already at several places but also
have the open coded variant at other places.
Convert the code to always use the helper function.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-30 11:03:12 +02:00
Heiko Carstens
37fe1d73a4 s390/mm: rename user_mode variable to addressing_mode
Fix name clash with user_mode() define which is also used in common code.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-30 11:03:11 +02:00
Heiko Carstens
008c2e8f24 s390/mm: fix fault handling for page table walk case
Make sure the kernel does not incorrectly create a SIGBUS signal during
user space accesses:

For user space accesses in the switched addressing mode case the kernel
may walk page tables and access user address space via the kernel
mapping. If a page table entry is invalid the function __handle_fault()
gets called in order to emulate a page fault and trigger all the usual
actions like paging in a missing page etc. by calling handle_mm_fault().

If handle_mm_fault() returns with an error fixup handling is necessary.
For the switched addressing mode case all errors need to be mapped to
-EFAULT, so that the calling uaccess function can return -EFAULT to
user space.

Unfortunately the __handle_fault() incorrectly calls do_sigbus() if
VM_FAULT_SIGBUS is set. This however should only happen if a page fault
was triggered by a user space instruction. For kernel mode uaccesses
the correct action is to only return -EFAULT.
So user space may incorrectly see SIGBUS signals because of this bug.

For current machines this would only be possible for the switched
addressing mode case in conjunction with futex operations.

Cc: stable@vger.kernel.org
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-30 11:03:09 +02:00
Heiko Carstens
f2c76e3b6f s390/mm: make page faults killable
This is the s390 variant of 37b23e05 "x86,mm: make pagefault killable".

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-30 11:03:08 +02:00
Martin Schwidefsky
0f6f281b73 s390/mm: downgrade page table after fork of a 31 bit process
The downgrade of the 4 level page table created by init_new_context is
currently done only in start_thread31. If a 31 bit process forks the
new mm uses a 4 level page table, including the task size of 2<<42
that goes along with it. This is incorrect as now a 31 bit process
can map memory beyond 2GB. Define arch_dup_mmap to do the downgrade
after fork.

Cc: stable@vger.kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-26 16:24:14 +02:00
Heiko Carstens
a53c8fab3f s390/comments: unify copyright messages and remove file names
Remove the file name from the comment at top of many files. In most
cases the file name was wrong anyway, so it's rather pointless.

Also unify the IBM copyright statement. We did have a lot of sightly
different statements and wanted to change them one after another
whenever a file gets touched. However that never happened. Instead
people start to take the old/"wrong" statements to use as a template
for new files.
So unify all of them in one go.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2012-07-20 11:15:04 +02:00
Michael Holzheu
73bf463efa s390/kernel: Introduce memcpy_absolute() function
This patch introduces the new function memcpy_absolute() that allows to
copy memory using absolute addressing. This means that the prefix swap
does not apply when this function is used.

With this patch also all s390 kernel code that accesses absolute zero
now uses the new memcpy_absolute() function. The old and less generic
copy_to_absolute_zero() function is removed.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-30 09:04:49 +02:00
Heiko Carstens
f4815ac6c9 s390/headers: replace __s390x__ with CONFIG_64BIT where possible
Replace __s390x__ with CONFIG_64BIT in all places that are not exported
to userspace or guarded with #ifdef __KERNEL__.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-24 10:10:10 +02:00
Heiko Carstens
d49f47f83d s390/pfault: add sanity check
If the task that was found on an initial interrupt doesn't match the
current task execute a WARN_ON_ONCE() and don't put the task to sleep.

When this happened something went wrong between the interface of the
hypervisor and the kernel. In such a case keep the tasks alive to
avoid a hanging system.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-16 14:42:43 +02:00
Heiko Carstens
0a16ba7866 s390/pfault: use __set_task_state
Use __set_task_state() instead of set_task_state(). Saves a couple of
instructions, since the memory barrier is not needed here.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-16 14:42:42 +02:00
Heiko Carstens
54c2779122 s390/pfault: always search for task with reported pid
Make the code a bit more symmetric and always search for the task of the
reported pid. This simplifies the code a bit.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-16 14:42:42 +02:00
Heiko Carstens
d5e50a51cc s390/pfault: fix task state race
When setting the current task state to TASK_UNINTERRUPTIBLE this can
race with a different cpu. The other cpu could set the task state after
it inspected it (while it was still TASK_RUNNING) to TASK_RUNNING which
would change the state from TASK_UNINTERRUPTIBLE to TASK_RUNNING again.

This race was always present in the pfault interrupt code but didn't
cause anything harmful before commit f2db2e6c "[S390] pfault: cpu hotplug
vs missing completion interrupts" which relied on the fact that after
setting the task state to TASK_UNINTERRUPTIBLE the task would really
sleep.
Since this is not necessarily the case the result may be a list corruption
of the pfault_list or, as observed, a use-after-free bug while trying to
access the task_struct of a task which terminated itself already.

To fix this, we need to get a reference of the affected task when receiving
the initial pfault interrupt and add special handling if we receive yet
another initial pfault interrupt when the task is already enqueued in the
pfault list.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <stable@vger.kernel.org> # needed for v3.0 and newer
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-16 14:42:42 +02:00
Christian Borntraeger
2739b6d124 s390/kvm: bad rss-counter state
commit c3f0327f8e
    mm: add rss counters consistency check
detected the following problem with kvm on s390:

BUG: Bad rss-counter state mm:00000004f73ef000 idx:0 val:-10
BUG: Bad rss-counter state mm:00000004f73ef000 idx:1 val:-5

We have to make sure that we accumulate all rss values into
the mm before we replace the mm to avoid triggering this (harmless)
bug message.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-16 14:42:41 +02:00
Gerald Schaefer
a686425b31 s390/hugepages: clear page table for sw large page emulation
The software large page emulation on s390 did not clear the the
pre-allocated page table in arch_release_hugepage() before freeing
it. This could trigger the WARN_ON(!pte_none(*pte) in mm/vmalloc.c:106
and make vmap_pte_range() fail, because the page table could be reused
in page_table_alloc(). This is fixed now by calling clear_table()
before page_table_free().

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-16 14:42:39 +02:00
Martin Schwidefsky
5e8010cb50 s390: replace TIF_SIE with PF_VCPU
Replace the check for TIF_SIE in the fault handler by a check for PF_VCPU.
With the last user of TIF_SIE gone we can now remove the bit.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-16 14:42:39 +02:00
Michael Holzheu
b2a68c2356 s390: allow absolute memory access for /dev/mem
Currently dev/mem for s390 provides only real memory access. This means
that the CPU prefix pages are swapped. The prefix swap for real memory
works as follows:

Each CPU owns a prefix register that points to a page aligned memory
location "P". If this CPU accesses the address range [0,0x1fff], it is
translated by the hardware to [P,P+0x1fff]. Accordingly if this CPU
accesses the address range [P,P+0x1fff], it is translated by the hardware
to [0,0x1fff].  Therefore, if [P,P+0x1fff] or [0,0x1fff] is read from
the current /dev/mem device, the incorrectly swapped memory content is
returned.

With this patch the /dev/mem architecture code is modified to provide
absolute memory access. This is done via the arch specific functions
xlate_dev_mem_ptr() and unxlate_dev_mem_ptr(). For swapped pages on
s390 the function xlate_dev_mem_ptr() now returns a new buffer with a
copy of the requested absolute memory. In case the buffer was allocated,
the unxlate_dev_mem_ptr() function frees it after /dev/mem code has
called copy_to_user().

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-05-16 14:42:38 +02:00
Martin Schwidefsky
cd94154cc6 [S390] fix tlb flushing for page table pages
Git commit 36409f6353 "use generic RCU
page-table freeing code" introduced a tlb flushing bug. Partially revert
the above git commit and go back to s390 specific page table flush code.

For s390 the TLB can contain three types of entries, "normal" TLB
page-table entries, TLB combined region-and-segment-table (CRST) entries
and real-space entries. Linux does not use real-space entries which
leaves normal TLB entries and CRST entries. The CRST entries are
intermediate steps in the page-table translation called translation paths.
For example a 4K page access in a three-level page table setup will
create two CRST TLB entries and one page-table TLB entry. The advantage
of that approach is that a page access next to the previous one can reuse
the CRST entries and needs just a single read from memory to create the
page-table TLB entry. The disadvantage is that the TLB flushing rules are
more complicated, before any page-table may be freed the TLB needs to be
flushed.

In short: the generic RCU page-table freeing code is incorrect for the
CRST entries, in particular the check for mm_users < 2 is troublesome.

This is applicable to 3.0+ kernels.

Cc: <stable@vger.kernel.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-04-11 14:28:24 +02:00
Michael Holzheu
b785e0d06a [S390] kernel: Use local_irq_save() for memcpy_real()
Currently in the memcpy_real() function interrupts are disabled with
__arch_local_irq_stnsm(). In order to notify lockdep that interrupts
are disabled, with this patch local_irq_save() is used instead. The
function __arch_local_irq_stnsm() is still used for switching to
real mode.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-04-11 14:28:24 +02:00
Linus Torvalds
0195c00244 Disintegrate and delete asm/system.h
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAT3NKzROxKuMESys7AQKElw/+JyDxJSlj+g+nymkx8IVVuU8CsEwNLgRk
 8KEnRfLhGtkXFLSJYWO6jzGo16F8Uqli1PdMFte/wagSv0285/HZaKlkkBVHdJ/m
 u40oSjgT013bBh6MQ0Oaf8pFezFUiQB5zPOA9QGaLVGDLXCmgqUgd7exaD5wRIwB
 ZmyItjZeAVnDfk1R+ZiNYytHAi8A5wSB+eFDCIQYgyulA1Igd1UnRtx+dRKbvc/m
 rWQ6KWbZHIdvP1ksd8wHHkrlUD2pEeJ8glJLsZUhMm/5oMf/8RmOCvmo8rvE/qwl
 eDQ1h4cGYlfjobxXZMHqAN9m7Jg2bI946HZjdb7/7oCeO6VW3FwPZ/Ic75p+wp45
 HXJTItufERYk6QxShiOKvA+QexnYwY0IT5oRP4DrhdVB/X9cl2MoaZHC+RbYLQy+
 /5VNZKi38iK4F9AbFamS7kd0i5QszA/ZzEzKZ6VMuOp3W/fagpn4ZJT1LIA3m4A9
 Q0cj24mqeyCfjysu0TMbPtaN+Yjeu1o1OFRvM8XffbZsp5bNzuTDEvviJ2NXw4vK
 4qUHulhYSEWcu9YgAZXvEWDEM78FXCkg2v/CrZXH5tyc95kUkMPcgG+QZBB5wElR
 FaOKpiC/BuNIGEf02IZQ4nfDxE90QwnDeoYeV+FvNj9UEOopJ5z5bMPoTHxm4cCD
 NypQthI85pc=
 =G9mT
 -----END PGP SIGNATURE-----

Merge tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system

Pull "Disintegrate and delete asm/system.h" from David Howells:
 "Here are a bunch of patches to disintegrate asm/system.h into a set of
  separate bits to relieve the problem of circular inclusion
  dependencies.

  I've built all the working defconfigs from all the arches that I can
  and made sure that they don't break.

  The reason for these patches is that I recently encountered a circular
  dependency problem that came about when I produced some patches to
  optimise get_order() by rewriting it to use ilog2().

  This uses bitops - and on the SH arch asm/bitops.h drags in
  asm-generic/get_order.h by a circuituous route involving asm/system.h.

  The main difficulty seems to be asm/system.h.  It holds a number of
  low level bits with no/few dependencies that are commonly used (eg.
  memory barriers) and a number of bits with more dependencies that
  aren't used in many places (eg.  switch_to()).

  These patches break asm/system.h up into the following core pieces:

    (1) asm/barrier.h

        Move memory barriers here.  This already done for MIPS and Alpha.

    (2) asm/switch_to.h

        Move switch_to() and related stuff here.

    (3) asm/exec.h

        Move arch_align_stack() here.  Other process execution related bits
        could perhaps go here from asm/processor.h.

    (4) asm/cmpxchg.h

        Move xchg() and cmpxchg() here as they're full word atomic ops and
        frequently used by atomic_xchg() and atomic_cmpxchg().

    (5) asm/bug.h

        Move die() and related bits.

    (6) asm/auxvec.h

        Move AT_VECTOR_SIZE_ARCH here.

  Other arch headers are created as needed on a per-arch basis."

Fixed up some conflicts from other header file cleanups and moving code
around that has happened in the meantime, so David's testing is somewhat
weakened by that.  We'll find out anything that got broken and fix it..

* tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits)
  Delete all instances of asm/system.h
  Remove all #inclusions of asm/system.h
  Add #includes needed to permit the removal of asm/system.h
  Move all declarations of free_initmem() to linux/mm.h
  Disintegrate asm/system.h for OpenRISC
  Split arch_align_stack() out from asm-generic/system.h
  Split the switch_to() wrapper out of asm-generic/system.h
  Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h
  Create asm-generic/barrier.h
  Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h
  Disintegrate asm/system.h for Xtensa
  Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt]
  Disintegrate asm/system.h for Tile
  Disintegrate asm/system.h for Sparc
  Disintegrate asm/system.h for SH
  Disintegrate asm/system.h for Score
  Disintegrate asm/system.h for S390
  Disintegrate asm/system.h for PowerPC
  Disintegrate asm/system.h for PA-RISC
  Disintegrate asm/system.h for MN10300
  ...
2012-03-28 15:58:21 -07:00
David Howells
a0616cdebc Disintegrate asm/system.h for S390
Disintegrate asm/system.h for S390.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-s390@vger.kernel.org
2012-03-28 18:30:02 +01:00
Ben Hutchings
8ea7fddb2d [S390] Remove unncessary export of arch_pick_mmap_layout
This function is defined for use in exec, not in modules.
No other architecture exports its implementation.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-03-23 11:13:23 +01:00
Heiko Carstens
fde15c3a3a [S390] irq: external interrupt code passing
The external interrupt handlers have a parameter called ext_int_code.
Besides the name this paramter does not only contain the ext_int_code
but in addition also the "cpu address" (POP) which caused the external
interrupt.
To make the code a bit more obvious pass a struct instead so the called
function can easily distinguish between external interrupt code and
cpu address. The cpu address field however is named "subcode" since
some external interrupt sources do not pass a cpu address but a
different parameter (or none at all).

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-03-11 11:59:29 -04:00
Linus Torvalds
6bba07c613 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  [S390] memory hotplug: prevent memory zone interleave
  [S390] crash_dump: remove duplicate include
  [S390] KEYS: Enable the compat keyctl wrapper on s390x
2012-03-01 18:22:55 -08:00
Heiko Carstens
048cd4e51d compat: fix compile breakage on s390
The new is_compat_task() define for the !COMPAT case in
include/linux/compat.h conflicts with a similar define in
arch/s390/include/asm/compat.h.

This is the minimal patch which fixes the build issues.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-27 07:54:27 -08:00
Gerald Schaefer
892365ab4d [S390] memory hotplug: prevent memory zone interleave
This fixes a kernel oops with CONFIG_DEBUG_VM triggered by a
VM_BUG_ON(bad_range()): kernel BUG at mm/page_alloc.c:748.

With memory hotplug on System z, it is possible that the memory
online/offline state is preserved over a system restart, e.g. there
may be offline memory blocks in ZONE_DMA or ZONE_NORMAL. So far,
the offline memory range has always been added to ZONE_MOVABLE during
system start, so that it was possible to have ZONE_MOVABLE interleave
with ZONE_DMA or ZONE_NORMAL. This patch fixes that by checking for
zone overlap before adding memory.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-02-24 18:01:36 +01:00
Martin Schwidefsky
2320c57937 [S390] incorrect PageTables counter for kvm page tables
The page_table_free_pgste function is used for kvm processes to free page
tables that have the pgste extension. It calls pgtable_page_ctor instead of
pgtable_page_dtor which increases NR_PAGETABLE instead of decreasing it.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-02-17 10:29:33 +01:00
Martin Schwidefsky
aa33c8cbba [S390] cleanup trap handling
Move the program interruption code and the translation exception identifier
to the pt_regs structure as 'int_code' and 'int_parm_long' and make the
first level interrupt handler in entry[64].S store the two values. That
makes it possible to drop 'prot_addr' and 'trap_no' from the thread_struct
and to reduce the number of arguments to a lot of functions. Finally
un-inline do_trap. Overall this saves 5812 bytes in the .text section of
the 64 bit kernel.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:12 +01:00
Carsten Otte
f32269a0d0 [S390] disable MACHINE_IS_VM check for pfault
This patch disables the check for MACHINE_IS_VM when initializing the
pfault infrastructure. The code checks for successful completion of
diag 258 anyway, thus it's safe to try initialization on LPAR anyway.
This is needed to use pfault on kvm

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:10 +01:00
Martin Schwidefsky
14045ebf1e [S390] add support for physical memory > 4TB
The kernel address space of a 64 bit kernel currently uses a three level
page table and the vmemmap array has a fixed address and a fixed maximum
size. A three level page table is good enough for systems with less than
3.8TB of memory, for bigger systems four page table levels need to be
used. Each page table level costs a bit of performance, use 3 levels for
normal systems and 4 levels only for the really big systems.
To avoid bloating sparse.o too much set MAX_PHYSMEM_BITS to 46 for a
maximum of 64TB of memory.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:27:10 +01:00
Christian Borntraeger
c86cce2a20 [S390] kvm: fix sleeping function ... at mm/page_alloc.c:2260
commit cc772456ac
    [S390] fix list corruption in gmap reverse mapping

added a potential dead lock:

BUG: sleeping function called from invalid context at mm/page_alloc.c:2260
in_atomic(): 1, irqs_disabled(): 0, pid: 1108, name: qemu-system-s39
3 locks held by qemu-system-s39/1108:
 #0:  (&kvm->slots_lock){+.+.+.}, at: [<000003e004866542>] kvm_set_memory_region+0x3a/0x6c [kvm]
 #1:  (&mm->mmap_sem){++++++}, at: [<0000000000123790>] gmap_map_segment+0x9c/0x298
 #2:  (&(&mm->page_table_lock)->rlock){+.+.+.}, at: [<00000000001237a8>] gmap_map_segment+0xb4/0x298
CPU: 0 Not tainted 3.1.3 #45
Process qemu-system-s39 (pid: 1108, task: 00000004f8b3cb30, ksp: 00000004fd5978d0)
00000004fd5979a0 00000004fd597920 0000000000000002 0000000000000000
       00000004fd5979c0 00000004fd597938 00000004fd597938 0000000000617e96
       0000000000000000 00000004f8b3cf58 0000000000000000 0000000000000000
       000000000000000d 000000000000000c 00000004fd597988 0000000000000000
       0000000000000000 0000000000100a18 00000004fd597920 00000004fd597960
Call Trace:
([<0000000000100926>] show_trace+0xee/0x144)
 [<0000000000131f3a>] __might_sleep+0x12a/0x158
 [<0000000000217fb4>] __alloc_pages_nodemask+0x224/0xadc
 [<0000000000123086>] gmap_alloc_table+0x46/0x114
 [<000000000012395c>] gmap_map_segment+0x268/0x298
 [<000003e00486b014>] kvm_arch_commit_memory_region+0x44/0x6c [kvm]
 [<000003e004866414>] __kvm_set_memory_region+0x3b0/0x4a4 [kvm]
 [<000003e004866554>] kvm_set_memory_region+0x4c/0x6c [kvm]
 [<000003e004867c7a>] kvm_vm_ioctl+0x14a/0x314 [kvm]
 [<0000000000292100>] do_vfs_ioctl+0x94/0x588
 [<0000000000292688>] SyS_ioctl+0x94/0xac
 [<000000000061e124>] sysc_noemu+0x22/0x28
 [<000003fffcd5e7ca>] 0x3fffcd5e7ca
3 locks held by qemu-system-s39/1108:
 #0:  (&kvm->slots_lock){+.+.+.}, at: [<000003e004866542>] kvm_set_memory_region+0x3a/0x6c [kvm]
 #1:  (&mm->mmap_sem){++++++}, at: [<0000000000123790>] gmap_map_segment+0x9c/0x298
 #2:  (&(&mm->page_table_lock)->rlock){+.+.+.}, at: [<00000000001237a8>] gmap_map_segment+0xb4/0x298

Fix this by freeing the lock on the alloc path. This is ok, since the
gmap table is never freed until we call gmap_free, so the table we are
walking cannot go.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-12-27 11:25:48 +01:00
Heiko Carstens
fa2fb2f4a5 [S390] pfault: ignore leftover completion interrupts
Ignore completion interrupts if the initial interrupt hasn't been
received and the addressed task is not running. This case can only
happen if leftover (pending) completion interrupt gets delivered
which wasn't removed with the PFAULT CANCEL operation during cpu
hotplug.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-11-14 11:19:08 +01:00
Linus Torvalds
32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Andrea Arcangeli
b35a35b556 thp: share get_huge_page_tail()
This avoids duplicating the function in every arch gup_fast.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:58 -07:00
Andrea Arcangeli
0693bc9ce2 s390: gup_huge_pmd() return 0 if pte changes
s390 didn't return 0 in that case, if it's rolling back the *nr pointer it
should also return zero to avoid adding pages to the array at the wrong
offset.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:58 -07:00
Andrea Arcangeli
220a2eb228 s390: gup_huge_pmd() support THP tail recounting
Up to this point the code assumed old refcounting for hugepages (pre-thp).
This updates the code directly to the thp mapcount tail page refcounting.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:58 -07:00
Heiko Carstens
3a4c5d5964 s390: add missing module.h/export.h includes
Fix several compile errors on s390 caused by splitting module.h.

Some include additions [e.g. qdio_setup.c, zfcp_qdio.c] are in
anticipation of pending changes queued for s390 that increase
the modular use footprint.

[PG: added additional obvious changes since Heiko's original patch]

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:58 -04:00
Martin Schwidefsky
638ad34a88 [S390] sparse: fix sparse warnings about missing prototypes
Add prototypes and includes for functions used in different modules.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:46 +01:00
Christian Borntraeger
388186bc92 [S390] kvm: Handle diagnose 0x10 (release pages)
Linux on System z uses a ballooner based on diagnose 0x10. (aka as
collaborative memory management). This patch implements diagnose
0x10 on the guest address space.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:45 +01:00
Carsten Otte
499069e1a4 [S390] take mmap_sem when walking guest page table
gmap_fault needs to walk the guest page table. However, parts of
that may change if some other thread does munmap. In that case
gmap_unmap_notifier will also unmap the corresponding parts from
the guest page table. We need to take mmap_sem in order to serialize
these operations.
do_exception now calls __gmap_fault with mmap_sem held which does
not get exported to modules. The exported function, which is called
from KVM, now takes mmap_sem.

Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:45 +01:00
Carsten Otte
cc772456ac [S390] fix list corruption in gmap reverse mapping
This introduces locking via mm->page_table_lock to protect
the rmap list for guest mappings from being corrupted by concurrent
operations.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:44 +01:00
Carsten Otte
a9162f238a [S390] fix possible deadlock in gmap_map_segment
Fix possible deadlock reported by lockdep:
qemu-system-s39/2963 is trying to acquire lock:
(&mm->mmap_sem){++++++}, at: gmap_alloc_table+0x9c/0x120
but task is already holding lock:
(&mm->mmap_sem){++++++}, at: gmap_map_segment+0xa6/0x27c

Actually gmap_alloc_table is the only called in gmap_map_segment with
mmap_sem held, thus it's safe to simply remove the inner lock.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:44 +01:00
Martin Schwidefsky
b50511e41a [S390] cleanup psw related bits and pieces
Split out addressing mode bits from PSW_BASE_BITS, rename PSW_BASE_BITS
to PSW_MASK_BASE, get rid of psw_user32_bits, remove unused function
enabled_wait(), introduce PSW_MASK_USER, and drop PSW_MASK_MERGE macros.
Change psw_kernel_bits / psw_user_bits to contain only the bits that
are always set in the respective mode.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:43 +01:00
Martin Schwidefsky
ccf45cafb0 [S390] addressing mode limits and psw address wrapping
An instruction with an address right below the adress limit for the
current addressing mode will wrap. The instruction restart logic in
the protection fault handler and the signal code need to follow the
wrapping rules to find the correct instruction address.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:43 +01:00
Michael Holzheu
60a0c68df2 [S390] kdump backend code
This patch provides the architecture specific part of the s390 kdump
support.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:42 +01:00
Michael Holzheu
7f0bf656c6 [S390] Add real memory access functions
Add access function for real memory needed by s390 kdump backend.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:42 +01:00
Martin Schwidefsky
e73b7fffe4 [S390] memory leak with RCU_TABLE_FREE
The rcu page table free code uses a couple of bits in the page table
pointer passed to tlb_remove_table to discern the different page table
types. __tlb_remove_table extracts the type with an incorrect mask which
leads to memory leaks. The correct mask is ((FRAG_MASK << 4) | FRAG_MASK).

Cc: stable@kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-10-30 15:16:15 +01:00
Carsten Otte
05873df981 [S390] gmap: always up mmap_sem properly
If gmap_unmap_segment figures that the segment was not mapped in the
first place, it need to up mmap_sem on exit.

Cc: <stable@kernel.org>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-09-26 16:40:50 +02:00
Christian Borntraeger
480e5926ce [S390] kvm: fix address mode switching
598841ca99 ([S390] use gmap address
spaces for kvm guest images) changed kvm to use a separate address
space for kvm guests. This address space was switched in __vcpu_run
In some cases (preemption, page fault) there is the possibility that
this address space switch is lost.
The typical symptom was a huge amount of validity intercepts or
random guest addressing exceptions.
Fix this by doing the switch in sie_loop and sie_exit and saving the
address space in the gmap structure itself. Also use the preempt
notifier.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2011-09-20 17:07:34 +02:00
Michael Holzheu
7dd6b3343f [S390] Add PSW restart shutdown trigger
With this patch a new S390 shutdown trigger "restart" is added. If under
z/VM "systerm restart" is entered or under the HMC the "PSW restart" button
is pressed, the PSW located at 0 (31 bit) or 0x1a0 (64 bit) bit is loaded.
Now we execute do_restart() that processes the restart action that is
defined under /sys/firmware/shutdown_actions/on_restart. Currently the
following actions are possible: reipl (default), stop, vmcmd, dump, and
dump_reipl.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2011-08-03 16:44:19 +02:00
Jan Glauber
944291de33 [S390] missing return in page_table_alloc_pgste
Fix the following compile warning for !CONFIG_PGSTE:

  CC      arch/s390/mm/pgtable.o
arch/s390/mm/pgtable.c: In function ‘page_table_alloc_pgste’:
arch/s390/mm/pgtable.c:531:1: warning: no return statement in function returning non-void [-Wreturn-type]

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2011-08-03 16:44:19 +02:00
Martin Schwidefsky
e5992f2e6c [S390] kvm guest address space mapping
Add code that allows KVM to control the virtual memory layout that
is seen by a guest. The guest address space uses a second page table
that shares the last level pte-tables with the process page table.
If a page is unmapped from the process page table it is automatically
unmapped from the guest page table as well.

The guest address space mapping starts out empty, KVM can map any
individual 1MB segments from the process virtual memory to any 1MB
aligned location in the guest virtual memory. If a target segment in
the process virtual memory does not exist or is unmapped while a
guest mapping exists the desired target address is stored as an
invalid segment table entry in the guest page table.
The population of the guest page table is fault driven.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24 10:48:21 +02:00
Peter Zijlstra
a8b0ca17b8 perf: Remove the nmi parameter from the swevent and overflow interface
The nmi parameter indicated if we could do wakeups from the current
context, if not, we would set some state and self-IPI and let the
resulting interrupt do the wakeup.

For the various event classes:

  - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
    the PMI-tail (ARM etc.)
  - tracepoint: nmi=0; since tracepoint could be from NMI context.
  - software: nmi=[0,1]; some, like the schedule thing cannot
    perform wakeups, and hence need 0.

As one can see, there is very little nmi=1 usage, and the down-side of
not using it is that on some platforms some software events can have a
jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).

The up-side however is that we can remove the nmi parameter and save a
bunch of conditionals in fast paths.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01 11:06:35 +02:00
Martin Schwidefsky
36409f6353 [S390] use generic RCU page-table freeing code
Replace the s390 specific rcu page-table freeing code with the
generic variant. This requires to duplicate the definition for the
struct mmu_table_batch as s390 does not use the generic tlb flush
code.

While we are at it remove the restriction that page table fragments
can not be reused after a single fragment has been freed with rcu
and split out allocation and freeing of page tables with pgstes.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-06-06 14:14:56 +02:00
Heiko Carstens
3c5cffb66d [S390] mm: fix mmu_gather rework
Quite a few functions that get called from the tlb gather code require that
preemption must be disabled. So disable preemption inside of the called
functions instead.
The only drawback is that rcu_table_freelist_finish() doesn't get necessarily
called on the cpu(s) that filled the free lists. So we may see a delay, until
we finally see an rcu callback. However over time this shouldn't matter.

So we get rid of lots of "BUG: using smp_processor_id() in preemptible"
messages.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2011-05-29 12:40:51 +02:00
Linus Torvalds
c4a227d89f Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits)
  perf: Fix SIGIO handling
  perf top: Don't stop if no kernel symtab is found
  perf top: Handle kptr_restrict
  perf top: Remove unused macro
  perf events: initialize fd array to -1 instead of 0
  perf tools: Make sure kptr_restrict warnings fit 80 col terms
  perf tools: Fix build on older systems
  perf symbols: Handle /proc/sys/kernel/kptr_restrict
  perf: Remove duplicate headers
  ftrace: Add internal recursive checks
  tracing: Update btrfs's tracepoints to use u64 interface
  tracing: Add __print_symbolic_u64 to avoid warnings on 32bit machine
  ftrace: Set ops->flag to enabled even on static function tracing
  tracing: Have event with function tracer check error return
  ftrace: Have ftrace_startup() return failure code
  jump_label: Check entries limit in __jump_label_update
  ftrace/recordmcount: Avoid STT_FUNC symbols as base on ARM
  scripts/tags.sh: Add magic for trace-events for etags too
  scripts/tags.sh: Fix ctags for DEFINE_EVENT()
  x86/ftrace: Fix compiler warning in ftrace.c
  ...
2011-05-28 12:55:55 -07:00
Ingo Molnar
d6a72fe465 Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent 2011-05-27 14:28:09 +02:00
Heiko Carstens
69dbb2f79a [S390] mm: add ZONE_DMA to 31-bit config again
Add ZONE_DMA to 31-bit config again. The performance gain is minimal
and hardly anybody cares anymore about a 31-bit kernel.
So add ZONE_DMA again to help with SLAB_CACHE_DMA removal for
!CONFIG_ZONE_DMA configurations.

Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2011-05-26 09:48:25 +02:00
Heiko Carstens
33ce614029 [S390] mm: add page fault retry handling
s390 arch backend for d065bd81 "mm: retry page fault when blocking on
disk transfer".

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2011-05-26 09:48:25 +02:00
Heiko Carstens
99583181cb [S390] mm: handle kernel caused page fault oom situations
If e.g. copy_from_user() generates a page fault and the kernel runs
into an OOM situation the system might lock up.
If the OOM killer sends a SIG_KILL to the current process it can't
handle it since it is stuck in a copy_from_user() - page fault loop.

Fix this by adding the same fix as other architectures have.

E.g. the x86 variant f86268 "x86/mm: Handle mm_fault_error() in kernel
space"

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2011-05-26 09:48:25 +02:00
Heiko Carstens
d7b250e2a2 [S390] irq: merge irq.c and s390_ext.c
Merge irq.c and s390_ext.c into irq.c. That way all external interrupt
related functions are together.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-26 09:48:24 +02:00
Heiko Carstens
df7997ab1c [S390] irq: fix service signal external interrupt handling
Interrupt sources like pfault, sclp, dasd_diag and virtio all use the
service signal external interrupt subclass mask in control register 0
to enable and disable the corresponding interrupt.
Because no reference counting is implemented each subsystem thinks it
is the only user of subclass and sets and clears the bit like it wants.
This leads to case that unloading the dasd diag module under z/VM
causes both sclp and pfault interrupts to be masked. The result will
be locked up system sooner or later.
Fix this by introducing a new way to set (register) and clear
(unregister) the service signal subclass mask bit in cr0.
Also convert all drivers.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-26 09:48:24 +02:00
Heiko Carstens
902050bcde [S390] pfault: always enable service signal interrupt
Always enable the service signal subclass mask bit in cr0, if pfault
is available. That way we use the normal cpu hotplug way to propagate
the subclass mask bit in cr0 instead of open coding it.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-26 09:48:24 +02:00
Steven Rostedt
f29c50419c maccess,probe_kernel: Make write/read src const void *
The functions probe_kernel_write() and probe_kernel_read() do not modify
the src pointer. Allow const pointers to be passed in without the need
of a typecast.

Acked-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1305824936.1465.4.camel@gandalf.stny.rr.com
2011-05-25 19:56:23 -04:00
Peter Zijlstra
1c39517696 mm: now that all old mmu_gather code is gone, remove the storage
Fold all the mmu_gather rework patches into one for submission

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: Hugh Dickins <hughd@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:16 -07:00
Martin Schwidefsky
b2fa47e6bf [S390] refactor page table functions for better pgste support
Rework the architecture page table functions to access the bits in the
page table extension array (pgste). There are a number of changes:
1) Fix missing pgste update if the attach_count for the mm is <= 1.
2) For every operation that affects the invalid bit in the pte or the
   rcp byte in the pgste the pcl lock needs to be acquired. The function
   pgste_get_lock gets the pcl lock and returns the current pgste value
   for a pte pointer. The function pgste_set_unlock stores the pgste
   and releases the lock. Between these two calls the bits in the pgste
   can be shuffled.
3) Define two software bits in the pte _PAGE_SWR and _PAGE_SWC to avoid
   calling SetPageDirty and SetPageReferenced from pgtable.h. If the
   host reference backup bit or the host change backup bit has been
   set the dirty/referenced state is transfered to the pte. The common
   code will pick up the state from the pte.
4) Add ptep_modify_prot_start and ptep_modify_prot_commit for mprotect.
5) Remove pgd_populate_kernel, pud_populate_kernel, pmd_populate_kernel
   pgd_clear_kernel, pud_clear_kernel, pmd_clear_kernel and ptep_invalidate.
6) Rename kvm_s390_test_and_clear_page_dirty to
   ptep_test_and_clear_user_dirty and add ptep_test_and_clear_user_young.
7) Define mm_exclusive() and mm_has_pgste() helper to improve readability.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-23 10:24:31 +02:00
Heiko Carstens
7dd8fe1f91 [S390] pfault: cleanup code
Small code cleanup.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-23 10:24:30 +02:00
Heiko Carstens
f2db2e6cb3 [S390] pfault: cpu hotplug vs missing completion interrupts
On cpu hot remove a PFAULT CANCEL command is sent to the hypervisor
which in turn will cancel all outstanding pfault requests that have
been issued on that cpu (the same happens with a SIGP cpu reset).

The result is that we end up with uninterruptible processes where
the interrupt that would wake up these processes never arrives.

In order to solve this all processes which wait for a pfault
completion interrupt get woken up after a cpu hot remove. The worst
case that could happen is that they fault again and in turn need to
wait again.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-23 10:24:29 +02:00
Heiko Carstens
89db4df160 [S390] extmem: get rid of compile warning
Get rid of these:

arch/s390/mm/extmem.c: In function 'segment_modify_shared':
arch/s390/mm/extmem.c:622:3: warning: 'end_addr' may be used uninitialized in this function [-Wuninitialized]
arch/s390/mm/extmem.c:627:18: warning: 'start_addr' may be used uninitialized in this function [-Wuninitialized]
arch/s390/mm/extmem.c: In function 'segment_load':
arch/s390/mm/extmem.c:481:11: warning: 'end_addr' may be used uninitialized in this function [-Wuninitialized]
arch/s390/mm/extmem.c:480:18: warning: 'start_addr' may be used uninitialized in this function [-Wuninitialized]

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-23 10:24:29 +02:00
Heiko Carstens
7712f83aa9 [S390] get rid of unused variables
Remove trivially unused variables as detected with -Wunused-but-set-variable.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-23 10:24:28 +02:00
Martin Schwidefsky
043d07084b [S390] Remove data execution protection
The noexec support on s390 does not rely on a bit in the page table
entry but utilizes the secondary space mode to distinguish between
memory accesses for instructions vs. data. The noexec code relies
on the assumption that the cpu will always use the secondary space
page table for data accesses while it is running in the secondary
space mode. Up to the z9-109 class machines this has been the case.
Unfortunately this is not true anymore with z10 and later machines.
The load-relative-long instructions lrl, lgrl and lgfrl access the
memory operand using the same addressing-space mode that has been
used to fetch the instruction.
This breaks the noexec mode for all user space binaries compiled
with march=z10 or later. The only option is to remove the current
noexec support.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-23 10:24:28 +02:00
Jan Glauber
448694a1d5 module: undo module RONX protection correctly.
While debugging I stumbled over two problems in the code that protects module
pages.

First issue is that disabling the protection before freeing init or unload of
a module is not symmetric with the enablement. For instance, if pages are set
to RO the page range from module_core to module_core + core_ro_size is
protected. If a module is unloaded the page range from module_core to
module_core + core_size is set back to RW.
So pages that were not set to RO are also changed to RW.
This is not critical but IMHO it should be symmetric.

Second issue is that while set_memory_rw & set_memory_ro are used for
RO/RW changes only set_memory_nx is involved for NX/X. One would await that
the inverse function is called when the NX protection should be removed,
which is not the case here, unless I'm missing something.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2011-05-19 16:55:26 +09:30
Michael Holzheu
83ace2701b [S390] replace diag10() with diag10_range() function
Currently the diag10() function can only release one page. For exploiters
that have to call diag10 on a contiguous memory region this is suboptimal.
This patch replaces the diag10() function with diag10_range() that is
able to release multiple pages. In addition to that the new function now
allows to release memory with addresses higher than 2047 MiB. This was
due to a restriction of the diagnose implementation under z/VM prior to
release 5.2.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-10 17:13:43 +02:00
Heiko Carstens
a985183285 [S390] irqstats: fix counting of pfault, dasd diag and virtio irqs
pfault, dasd diag and virtio all use the same external interrupt number.
The respective interrupt handlers decide by the subcode if they are
meant to handle the interrupt.
Counting is currently done before looking at the subcode which means
each handler counts an interrupt even if it is not handling it.
Fix this by moving the kstat code after the code which looks at the
subcode.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-04-29 10:42:25 +02:00
Heiko Carstens
e35c76cd47 [S390] pfault: fix token handling
f6649a7e "[S390] cleanup lowcore access from external interrupts" changed
handling of external interrupts. Instead of letting the external interrupt
handlers accessing the per cpu lowcore the entry code of the kernel reads
already all fields that are necessary and passes them to the handlers.
The pfault interrupt handler was incorrectly converted. It tries to
dereference a value which used to be a pointer to a lowcore field. After
the conversion however it is not anymore the pointer to the field but its
content. So instead of a dereference only a cast is needed to get the
task pointer that caused the pfault.

Fixes a NULL pointer dereference and a subsequent kernel crash:

Unable to handle kernel pointer dereference at virtual kernel address (null)
Oops: 0004 [#1] SMP
Modules linked in: nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc
                   loop qeth_l3 qeth vmur ccwgroup ext3 jbd mbcache dm_mod
                   dasd_eckd_mod dasd_diag_mod dasd_mod
CPU: 0 Not tainted 2.6.38-2-s390x #1
Process cron (pid: 1106, task: 000000001f962f78, ksp: 000000001fa0f9d0)
Krnl PSW : 0404200180000000 000000000002c03e (pfault_interrupt+0xa2/0x138)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3
Krnl GPRS: 0000000000000000 0000000000000001 0000000000000000 0000000000000001
           000000001f962f78 0000000000518968 0000000090000002 000000001ff03280
           0000000000000000 000000000064f000 000000001f962f78 0000000000002603
           0000000006002603 0000000000000000 000000001ff7fe68 000000001ff7fe48
Krnl Code: 000000000002c036: 5820d010            l       %r2,16(%r13)
           000000000002c03a: 1832                lr      %r3,%r2
           000000000002c03c: 1a31                ar      %r3,%r1
          >000000000002c03e: ba23d010            cs      %r2,%r3,16(%r13)
           000000000002c042: a744fffc            brc     4,2c03a
           000000000002c046: a7290002            lghi    %r2,2
           000000000002c04a: e320d0000024        stg     %r2,0(%r13)
           000000000002c050: 07f0                bcr     15,%r0
Call Trace:
 ([<000000001f962f78>] 0x1f962f78)
  [<000000000001acda>] do_extint+0xf6/0x138
  [<000000000039b6ca>] ext_no_vtime+0x30/0x34
  [<000000007d706e04>] 0x7d706e04
Last Breaking-Event-Address:
  [<0000000000000000>] 0x0

For stable maintainers:
the first kernel which contains this bug is 2.6.37.

Reported-by: Stephen Powell <zlinuxman@wowway.com>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-04-20 10:15:44 +02:00
Jan Glauber
e4c031b4f2 [S390] fix page table walk for changing page attributes
The page table walk for changing page attributes used the wrong
address for pgd/pud/pmd lookups if the range was bigger than
a pmd entry. Fix the lookup by using the correct address.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-04-20 10:15:43 +02:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Jan Glauber
305b152325 [S390] Write protect module text and RO data
Implement write protection for kernel modules text and read-only
data sections by implementing set_memory_[ro|rw] on s390.
Since s390 has no execute bit in the pte's NX is not supported.

set_memory_[ro|rw] will only work on normal pages and not on
large pages, so in case a large page should be modified bail
out with a warning.

Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-03-15 17:08:23 +01:00
Martin Schwidefsky
f1be77bb21 [S390] pgtable_list corruption
After page_table_free_rcu removed a page from the pgtable_list
page_table_free better not add it again. Otherwise a page_table_alloc
can reuse a page table fragment that is still in the rcu process.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-01-31 11:30:20 +01:00
Heiko Carstens
df1ca53cba [S390] Randomize mmap start address
Randomize mmap start address with 8MB.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-01-12 09:55:25 +01:00
Heiko Carstens
1060f62ea4 [S390] Rearrange mmap.c
Shuffle code around so it looks more like x86 and powerpc.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-01-12 09:55:25 +01:00
Heiko Carstens
7e0d48574e [S390] Enable flexible mmap layout for 64 bit processes
Historically 64 bit processes use the legacy address layout. However
there is no reason why 64 bit processes shouldn't benefit from the
flexible mmap layout advantages.
Therefore just enable it.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-01-12 09:55:24 +01:00
Heiko Carstens
9e78a13bfb [S390] reduce miminum gap between stack and mmap_base
Reduce minimum gap between stack and mmap_base to 32MB. That way there
is a bit more space for heap and mmap for tight 31 bit address spaces.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-01-12 09:55:24 +01:00
Heiko Carstens
9046e401e7 [S390] mmap: consider stack address randomization
Consider stack address randomization when calulating mmap_base for
flexible mmap layout . Because of address randomization the stack
address can be up to 8MB lower than STACK_TOP.
When calculating mmap_base this isn't taken into account, which could
lead to the case that the gap between the real stack top and mmap_base
is lower than what ulimit specifies for the maximum stack size.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-01-12 09:55:24 +01:00
Martin Schwidefsky
5e9a26928f [S390] ptrace cleanup
Overhaul program event recording and the code dealing with the ptrace
user space interface.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-01-05 12:47:31 +01:00
Heiko Carstens
fb0a9d7e86 [S390] pfault: delay register of pfault interrupt
Use an early init call to initialize pfault. That way it is possible to
use the register_external_interrupt() instead of the early variant.
No need to enable pfault any earlier since it has only effect if user
space processes are running.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-01-05 12:47:26 +01:00
Heiko Carstens
052ff461c8 [S390] irq: have detailed statistics for interrupt types
Up to now /proc/interrupts only has statistics for external and i/o
interrupts but doesn't split up them any further.
This patch adds a line for every single interrupt source so that it
is possible to easier tell what the machine is/was doing.
Part of the output now looks like this;

           CPU0       CPU2       CPU4
EXT:       3898       4232       2305
I/O:        782        315        245
CLK:       1029       1964        727   [EXT] Clock Comparator
IPI:       2868       2267       1577   [EXT] Signal Processor
TMR:          0          0          0   [EXT] CPU Timer
TAL:          0          0          0   [EXT] Timing Alert
PFL:          0          0          0   [EXT] Pseudo Page Fault
[...]
NMI:          0          1          1   [NMI] Machine Checks

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-01-05 12:47:25 +01:00
Martin Schwidefsky
25591b0703 [S390] fix get_user_pages_fast
The check for the _PAGE_RO bit in get_user_pages_fast for write==1 is
the wrong way around. It must not be set for the fast path.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-11-10 10:05:53 +01:00
Martin Schwidefsky
14375bc4eb [S390] cleanup facility list handling
Store the facility list once at system startup with stfl/stfle and
reuse the result for all facility tests.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:21 +02:00
Christian Borntraeger
e05ef9bdb8 [S390] kvm: Fix badness at include/asm/mmu_context.h:83
commit 050eef364a
    [S390] fix tlb flushing vs. concurrent /proc accesses
broke KVM on s390x. On every schedule a
Badness at include/asm/mmu_context.h:83 appears. s390_enable_sie
replaces the mm on the __running__ task, therefore, we have to
increase the attach count of the new mm.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:20 +02:00
Martin Schwidefsky
f6649a7e5a [S390] cleanup lowcore access from external interrupts
Read external interrupts parameters from the lowcore in the first
level interrupt handler in entry[64].S.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:19 +02:00
Martin Schwidefsky
1e54622e04 [S390] cleanup lowcore access from program checks
Read all required fields for program checks from the lowcore in the
first level interrupt handler in entry[64].S. If the context that
caused the fault was enabled for interrupts we can now re-enable the
irqs in entry[64].S.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:19 +02:00
Martin Schwidefsky
36bf96801e [S390] fix SIGBUS handling
Raise SIGBUS with a siginfo structure. Deliver BUS_ADRERR as si_code and
the address of the fault in the si_addr field.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:19 +02:00
Heiko Carstens
a20852d2b7 [S390] cmm: fix crash on case conversion
When the cmm module is compiled into the kernel it will crash when
writing to the R/O data section.
Reason is the lower to upper case conversion of the "sender" module
parameter which ignored the fact that the pointer is preinitialized.

Introduced with 41b42876 "cmm, smsgiucv_app: convert sender to
uppercase"

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:17 +02:00
Martin Schwidefsky
92f842eac7 [S390] store indication fault optimization
Use the store indication bit in the translation exception code on
page faults to avoid the protection faults that immediatly follow
the page fault if the access has been a write.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:15 +02:00
Martin Schwidefsky
80217147a3 [S390] lockless get_user_pages_fast()
Implement get_user_pages_fast without locking in the fastpath on s390.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:15 +02:00
Martin Schwidefsky
238ec4efee [S390] zero page cache synonyms
If the zero page is mapped to virtual user space addresses that differ
only in bit 2^12 or 2^13 we get L1 cache synonyms which can affect
performance. Follow the mips model and use multiple zero pages to avoid
the synonyms.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25 16:10:14 +02:00
David Howells
df9ee29270 Fix IRQ flag handling naming
Fix the IRQ flag handling naming.  In linux/irqflags.h under one configuration,
it maps:

	local_irq_enable() -> raw_local_irq_enable()
	local_irq_disable() -> raw_local_irq_disable()
	local_irq_save() -> raw_local_irq_save()
	...

and under the other configuration, it maps:

	raw_local_irq_enable() -> local_irq_enable()
	raw_local_irq_disable() -> local_irq_disable()
	raw_local_irq_save() -> local_irq_save()
	...

This is quite confusing.  There should be one set of names expected of the
arch, and this should be wrapped to give another set of names that are expected
by users of this facility.

Change this to have the arch provide:

	flags = arch_local_save_flags()
	flags = arch_local_irq_save()
	arch_local_irq_restore(flags)
	arch_local_irq_disable()
	arch_local_irq_enable()
	arch_irqs_disabled_flags(flags)
	arch_irqs_disabled()
	arch_safe_halt()

Then linux/irqflags.h wraps these to provide:

	raw_local_save_flags(flags)
	raw_local_irq_save(flags)
	raw_local_irq_restore(flags)
	raw_local_irq_disable()
	raw_local_irq_enable()
	raw_irqs_disabled_flags(flags)
	raw_irqs_disabled()
	raw_safe_halt()

with type checking on the flags 'arguments', and then wraps those to provide:

	local_save_flags(flags)
	local_irq_save(flags)
	local_irq_restore(flags)
	local_irq_disable()
	local_irq_enable()
	irqs_disabled_flags(flags)
	irqs_disabled()
	safe_halt()

with tracing included if enabled.

The arch functions can now all be inline functions rather than some of them
having to be macros.

Signed-off-by: David Howells <dhowells@redhat.com> [X86, FRV, MN10300]
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> [Tile]
Signed-off-by: Michal Simek <monstr@monstr.eu> [Microblaze]
Tested-by: Catalin Marinas <catalin.marinas@arm.com> [ARM]
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> [AVR]
Acked-by: Tony Luck <tony.luck@intel.com> [IA-64]
Acked-by: Hirokazu Takata <takata@linux-m32r.org> [M32R]
Acked-by: Greg Ungerer <gerg@uclinux.org> [M68K/M68KNOMMU]
Acked-by: Ralf Baechle <ralf@linux-mips.org> [MIPS]
Acked-by: Kyle McMartin <kyle@mcmartin.ca> [PA-RISC]
Acked-by: Paul Mackerras <paulus@samba.org> [PowerPC]
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [S390]
Acked-by: Chen Liqin <liqin.chen@sunplusct.com> [Score]
Acked-by: Matt Fleming <matt@console-pimps.org> [SH]
Acked-by: David S. Miller <davem@davemloft.net> [Sparc]
Acked-by: Chris Zankel <chris@zankel.net> [Xtensa]
Reviewed-by: Richard Henderson <rth@twiddle.net> [Alpha]
Reviewed-by: Yoshinori Sato <ysato@users.sourceforge.jp> [H8300]
Cc: starvik@axis.com [CRIS]
Cc: jesper.nilsson@axis.com [CRIS]
Cc: linux-cris-kernel@axis.com
2010-10-07 14:08:55 +01:00
Martin Schwidefsky
050eef364a [S390] fix tlb flushing vs. concurrent /proc accesses
The tlb flushing code uses the mm_users field of the mm_struct to
decide if each page table entry needs to be flushed individually with
IPTE or if a global flush for the mm_struct is sufficient after all page
table updates have been done. The comment for mm_users says "How many
users with user space?" but the /proc code increases mm_users after it
found the process structure by pid without creating a new user process.
Which makes mm_users useless for the decision between the two tlb
flusing methods. The current code can be confused to not flush tlb
entries by a concurrent access to /proc files if e.g. a fork is in
progres. The solution for this problem is to make the tlb flushing
logic independent from the mm_users field.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-08-24 09:26:34 +02:00
Linus Torvalds
0d6ffdb8f1 Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
  [S390] dasd: tunable missing interrupt handler
  [S390] dasd: allocate fallback cqr for reserve/release
  [S390] topology: use default MC domain initializer
  [S390] initrd: change default load address
  [S390] cmm, smsgiucv_app: convert sender to uppercase
  [S390] cmm: add missing __init/__exit annotations
  [S390] cio: use all available paths for some internal I/O
  [S390] ccwreq: add ability to use all paths
  [S390] cio: ccw_device_online_store return -EINVAL in case of missing driver
  [S390] cio: Log the response from the unit check handler
  [S390] cio: CHSC SIOSL Support
2010-08-10 14:01:26 -07:00
Heiko Carstens
a1b200e27c mm: provide init_mm mm_context initializer
Provide an INIT_MM_CONTEXT intializer macro which can be used to
statically initialize mm_struct:mm_context of init_mm.  This way we can
get rid of code which will do the initialization at run time (on s390).

In addition the current code can be found at a place where it is not
expected.  So let's have a common initializer which architectures
can use if needed.

This is based on a patch from Suzuki Poulose.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Suzuki Poulose <suzuki@in.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09 20:44:54 -07:00
Hendrik Brueckner
41b4287677 [S390] cmm, smsgiucv_app: convert sender to uppercase
The sender kernel parameter contains a z/VM user ID where
alphabetic characters must be specified in uppercase.

Allow users to specify lowercase characters and convert the
sender string to uppercase at module initialization.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-08-09 18:12:54 +02:00
Heiko Carstens
2e85ba510e [S390] cmm: add missing __init/__exit annotations
Add missing __init and __exit annoations for module init and exit
functions. This will save us huge amounts of memory... sort of.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-08-09 18:12:54 +02:00
Heiko Carstens
c2f0e8c803 [S390] appldata/extmem/kvm: add missing GFP_KERNEL flag
Add missing GFP flag to memory allocations. The part in cio only
changes a comment.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-06-08 18:58:23 +02:00
Heiko Carstens
cf9daf4a73 [S390] cmm: get rid of CMM_PROC config option
All distros have this option switched on, so lets get rid of at least
one of the tons of config options that are available.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-05-26 23:27:10 +02:00
Heiko Carstens
db705e831a [S390] cmm: remove superfluous EXPORT_SYMBOLs plus cleanups
Remove superfluous EXPORT_SYMBOLS and do coding style cleanup while
being at it.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-05-26 23:27:10 +02:00
Heiko Carstens
1ef6acf597 [S390] cmm: fix crash on module unload
There might be a scheduled cmm_timer if the cmm module gets unloaded.
That timer was not deleted during module unload and thus could lead
to system crash later on.
Besides that reorder function calls in module init and exit code to
avoid a couple of other races which could lead to accesses to
uninitialized data.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-05-26 23:26:29 +02:00
Heiko Carstens
ab3c68ee5f [S390] debug: enable exception-trace debug facility
The exception-trace facility on x86 and other architectures prints
traces to dmesg whenever a user space application crashes.
s390 has such a feature since ages however it is called
userprocess_debug and is enabled differently.
This patch makes sure that whenever one of the two procfs files

/proc/sys/kernel/userprocess_debug
/proc/sys/debug/exception-trace

is modified the contents of the second one changes as well.
That way we keep backwards compatibilty but also support the same
interface like other architectures do.
Besides that the output of the traces is improved since it will now
also contain the corresponding filename of the vma (when available)
where the process caused a fault or trap.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-05-17 10:00:17 +02:00
Christian Borntraeger
6af7eea2ae [S390] s390: disable change bit override
commit 6a985c6194
([S390] s390: use change recording override for kernel mapping)
deactivated the change bit recording for the kernel mapping to
improve the performance. This works most of the time, but there
are cases (e.g. kernel runs in home space, futex atomic compare xcmg)
where we modify user memory with the kernel mapping instead of the
user mapping.
Instead of fixing these cases, this patch just deactivates change bit
override to avoid future problems with other kernel code that might
use the kernel mapping for user memory.

CC: stable@kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-04-09 13:43:02 +02:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Michael Holzheu
92fe31329c [S390] zcore: CPU registers are not saved under LPAR
To save the registers for all CPUs a sigp "store status" is done that
stores the registers to address absolute zero. To access storage at
absolute zero, normally the address of the prefix register of the
accessing CPU has to be used. This does not work when large pages are
active (currently only under LPAR). In order to fix that problem,
instead of memcpy memcpy_real is used, which switches to real mode
where prefixing works.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-03-24 11:49:53 +01:00
Hendrik Brueckner
09003ed90a [S390] smsgiucv: declare char pointers as "const"
Declare the smsgiucv prefix char pointer as "const" and use
use const char pointers in callback functions.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-03-08 12:26:28 +01:00
Heiko Carstens
22e0a04672 [S390] use kprobes_built_in() in mm/fault code
Use kprobes_built_in() to avoid ifdefs like most other architectures do.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-02-26 22:37:32 +01:00
Heiko Carstens
cbb870c822 [S390] Cleanup struct _lowcore usage and defines.
Use asm offsets to make sure the offset defines to struct _lowcore and
its layout don't get out of sync.
Also add a BUILD_BUG_ON() which checks that the size of the structure
is sane.
And while being at it change those sites which use odd casts to access
the current lowcore. These should use S390_lowcore instead.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-02-26 22:37:31 +01:00
Heiko Carstens
d96221ab1e [S390] free_initmem: reduce code duplication
free_initmem() and free_initrd_mem() are nearly identical. So make them
call a common function.
Also fixes a bug: if the initrd wouldn't start on a page boundary also
memory after the initrd would be initialized with the poison value.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-02-26 22:37:31 +01:00
Heiko Carstens
b8e660b83d [S390] Replace ENOTSUPP usage with EOPNOTSUPP
ENOTSUPP is not supposed to leak to userspace so lets just use
EOPNOTSUPP everywhere.
Doesn't fix a bug, but makes future reviews easier.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-02-26 22:37:31 +01:00
Jiri Slaby
a58c26bba9 [S390] use helpers for rlimits
Make sure compiler won't do weird things with limits. E.g. fetching
them twice may return 2 different values after writable limits are
implemented.

I.e. either use rlimit helpers added in
3e10e716ab
or ACCESS_ONCE if not applicable.

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: linux-s390@vger.kernel.org
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-01-13 20:44:45 +01:00
Linus Torvalds
67dd2f5a66 Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (72 commits)
  [S390] 3215/3270 console: remove wrong comment
  [S390] dasd: remove BKL from extended error reporting code
  [S390] vmlogrdr: remove BKL
  [S390] vmur: remove BKL
  [S390] zcrypt: remove BKL
  [S390] 3270: remove BKL
  [S390] vmwatchdog: remove lock_kernel() from open() function
  [S390] monwriter: remove lock_kernel() from open() function
  [S390] monreader: remove lock_kernel() from open() function
  [S390] s390: remove unused nfsd #includes
  [S390] ftrace: build ftrace.o when CONFIG_FTRACE_SYSCALLS is set for s390
  [S390] etr/stp: put correct per cpu variable
  [S390] tty3270: move keyboard compat ioctls
  [S390] sclp: improve servicability setting
  [S390] s390: use change recording override for kernel mapping
  [S390] MAINTAINERS: Add s390 drivers block
  [S390] use generic sockios.h header file
  [S390] use generic termbits.h header file
  [S390] smp: remove unused typedef and defines
  [S390] cmm: free pages on hibernate.
  ...
2009-12-09 19:01:47 -08:00
Christian Borntraeger
6a985c6194 [S390] s390: use change recording override for kernel mapping
We dont need the dirty bit if a write access is done via the kernel
mapping. In that case SetPageDirty and friends are used anyway, no
need to do that a second time. We can use the change-recording
overide function for the kernel mapping, if available.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-12-07 12:51:37 +01:00
Martin Schwidefsky
52b169c864 [S390] cmm: free pages on hibernate.
The pages allocated by the cmm memory balloon should be freed before
the hibernation image is created. Otherwise the memory reserved by the
balloon gets written to the swap device but there is no content in
these pages that need to be preserved.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-12-07 12:51:37 +01:00
Gerald Schaefer
6c1e3e7943 [S390] Use do_exception() in pagetable walk usercopy functions.
The pagetable walk usercopy functions have used a modified copy of the
do_exception() function for fault handling. This lead to inconsistencies
with recent changes to do_exception(), e.g. performance counters. This
patch changes the pagetable walk usercopy code to call do_exception()
directly, eliminating the redundancy. A new parameter is added to
do_exception() to specify the fault address.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-12-07 12:51:34 +01:00
Martin Schwidefsky
1ab947de29 [S390] fault handler access flags check.
Simplify the check of the vma->flags in do_exception for the
different fault types.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-12-07 12:51:34 +01:00
Martin Schwidefsky
50d7280d43 [S390] fault handler performance optimization.
Slim down the do_exception function to handle only the fast path of a
fault and move the exceptional cases into a new function. That slightly
increases the performance of the fault handling.

Build fix for !CONFIG_COMPAT by
Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-12-07 12:51:33 +01:00
Martin Schwidefsky
7ecb344ae8 [S390] Improve notify_page_fault implementation.
notify_page_fault does a preempt_disable/preempt_enable for each
fault generated by a kernel access to user space. If kprobes
is not active that is unnecessary since the interrupts are not
reenabled yet. To play safe repeat the kprobe_running check after
preempt_disable().

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-12-07 12:51:33 +01:00
Martin Schwidefsky
b11b533427 [S390] Improve address space mode selection.
Introduce user_mode to replace the two variables switch_amode and
s390_noexec. There are three valid combinations of the old values:
  1) switch_amode == 0 && s390_noexec == 0
  2) switch_amode == 1 && s390_noexec == 0
  3) switch_amode == 1 && s390_noexec == 1
They get replaced by
  1) user_mode == HOME_SPACE_MODE
  2) user_mode == PRIMARY_SPACE_MODE
  3) user_mode == SECONDARY_SPACE_MODE
The new kernel parameter user_mode=[primary,secondary,home] lets
you choose the address space mode the user space processes should
use. In addition the CONFIG_S390_SWITCH_AMODE config option
is removed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-12-07 12:51:33 +01:00