Commit graph

2918 commits

Author SHA1 Message Date
Chen Gang
72b7fb5fda s390/atomic: use 'unsigned int' instead of 'unsigned long' for atomic_*_mask()
The type of 'v->counter' is always 'int', and related inline assembly
code also process 'int', so use 'unsigned int' instead of 'unsigned
long' for the 'mask'.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:17:02 +02:00
Heiko Carstens
eb0bf929d5 s390/gup: handle zero nr_pages case correctly
If [__]get_user_pages_fast() gets called with nr_pages == 0, the current
code would walk the page tables and pin as many pages until the first
invalid pte (or the kernel crashed while writing struct page pointers to
the pages array).
So let's handle at least the nr_pages == 0 case correctly and exit early.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:17:01 +02:00
Heiko Carstens
01997bbc92 s390/gup: reduce code duplication between [__]get_user_pages_fast functions
Just call __get_user_pages_fast() from get_user_pages_fast() like powerpc.
This saves a lot of duplicated code.

Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:17:01 +02:00
Martin Schwidefsky
127c1fefff s390/mm: do not initialize storage keys
With dirty and referenced bits implemented in software it is unnecessary
to initialize the storage key for every page. With this patch not a single
storage key operation is done for a system that does not use KVM.
For KVM set_pte_at/pgste_set_key will do the initialization for the guest
view of the storage key when the mapping for the page is established in
the host.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:17:00 +02:00
Martin Schwidefsky
6cef30034c s390/bpf,jit: fix prolog oddity
The prolog of functions generated by the bpf jit compiler uses an
instruction sequence with an "ahi" instruction to create stack space
instead of using an "aghi" instruction. Using the 32-bit "ahi" is not
wrong as the stack we are operating on is an order-4 allocation which
is always aligned to 16KB. But it is more consistent to use an "aghi"
as the stack pointer is a 64-bit value.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:59 +02:00
Heiko Carstens
12325f0978 s390: cleanup and add sanity checks to control register macros
- turn some macros into functions
- merge two almost identical versions for 32/64 bit
- add BUILD_BUG_ON() check to make sure the passed in array is large enough

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:59 +02:00
Sebastian Ott
69db3b5e85 s390/pci: implement hibernation hooks
Implement architecture-specific functionality when a PCI device is
doing a hibernate transition.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:57 +02:00
Martin Schwidefsky
e258d719ff s390/uaccess: always run the kernel in home space
Simplify the uaccess code by removing the user_mode=home option.
The kernel will now always run in the home space mode.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:57 +02:00
Heiko Carstens
7d7c7b24e4 s390/bitops: rename find_first_bit_left() to find_first_bit_inv()
find_first_bit_left() and friends have nothing to do with the normal
LSB0 bit numbering for big endian machines used in Linux (least
significant bit has bit number 0).
Instead they use MSB0 bit numbering, where the most signficant bit has
bit number 0. So rename find_first_bit_left() and friends to
find_first_bit_inv(), to avoid any confusion.
Also provide inv versions of set_bit, clear_bit and test_bit.

This also removes the confusing use of e.g. set_bit() in airq.c which
uses a "be_to_le" bit number conversion, which could imply that instead
set_bit_le() could be used. But that is entirely wrong since the _le
bitops variant uses yet another bit numbering scheme.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:56 +02:00
Heiko Carstens
b1cb7e2b6c s390/bitops: use flogr instruction to implement __ffs, ffs, __fls, fls and fls64
Since z9 109 we have the flogr instruction which can be used to implement
optimized versions of __ffs, ffs, __fls, fls and fls64.
So implement and use them, instead of the generic variants.
This reduces the size of the kernel image (defconfig, -march=z9-109)
by 19,648 bytes.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:55 +02:00
Heiko Carstens
746479cdcb s390/bitops: use generic find bit functions / reimplement _left variant
Just like all other architectures we should use out-of-line find bit
operations, since the inline variant bloat the size of the kernel image.
And also like all other architecures we should only supply optimized
variants of the __ffs, ffs, etc. primitives.

Therefore this patch removes the inlined s390 find bit functions and uses
the generic out-of-line variants instead.

The optimization of the primitives follows with the next patch.

With this patch also the functions find_first_bit_left() and
find_next_bit_left() have been reimplemented, since logically, they are
nothing else but a find_first_bit()/find_next_bit() implementation that
use an inverted __fls() instead of __ffs().
Also the restriction that these functions only work on machines which
support the "flogr" instruction is gone now.

This reduces the size of the kernel image (defconfig, -march=z9-109)
by 144,482 bytes.
Alone the size of the function build_sched_domains() gets reduced from
7 KB to 3,5 KB.

We also git rid of unused functions like find_first_bit_le()...

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:55 +02:00
Hendrik Brueckner
f1d86b61fb s390/s390dbf: add debug_level_enabled() function
Add the debug_level_enabled() function to check if debug events for
a particular level would be logged.  This might help to save cycles
for debug events that require additional information collection.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:53 +02:00
Heiko Carstens
4ae803253e s390/bitops: optimize set_bit() for constant values
Since zEC12 we have the interlocked-access facility 2 which allows to
use the instructions ni/oi/xi to update a single byte in storage with
compare-and-swap semantics.
So change set_bit(), clear_bit() and change_bit() to generate such code
instead of a compare-and-swap loop (or using the load-and-* instruction
family), if possible.
This reduces the text segment by yet another 8KB (defconfig).

Alternatively the long displacement variants niy/oiy/xiy could have
been used, but the extended displacement field is usually not needed
and therefore would only increase the size of the text segment again.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:53 +02:00
Heiko Carstens
370b0b5f77 s390/bitops: remove CONFIG_SMP / simplify non-atomic bitops
Remove CONFIG_SMP from bitops code. This reduces the C code significantly
but also generates better code for the SMP case.

This means that for !CONFIG_SMP set_bit() and friends now also have
compare and swap semantics (read: more code). However nobody really cares
for !CONFIG_SMP and this is the trade-off to simplify the SMP code which we
do care about.

The non-atomic bitops like __set_bit() now generate also better code
because the old code did not have a __builtin_contant_p() check for the
CONFIG_SMP case and therefore always generated the inline assembly variant.
However the inline assemblies for the non-atomic case now got completely
removed since gcc can produce better code, which accesses less memory
operands.

test_bit() got also a bit simplified since it did have a
__builtin_constant_p() check, however two identical code pathes for each
case (written differently).

In result this mainly reduces the to be maintained code but is not very
relevant for code generation, since there are not many non-atomic bitops
usages that we care about.
(code reduction defconfig kernel image before/after: 560 bytes).

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:52 +02:00
Heiko Carstens
9a70a42835 s390/atomic: various small cleanups
- add a typecheck to the defines to make sure they operate on an atomic_t
- simplify inline assembly constraints
- keep variable names common between functions

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:51 +02:00
Heiko Carstens
5692e4d11c s390/atomic: optimize atomic_add() for constant values
If the interlocked-access facility 1 is available we can use the asi
and agsi instructions for interlocked updates if the to be added
value is a contanst and small (in the range of -128..127).
asi and agsi do not not return the old or new value, therefore these
instructions can only be used for atomic_(add|sub|inc|dec)[64].

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:51 +02:00
Heiko Carstens
1ffa11abfe s390/kprobes: allow kprobes only on known instructions
Since we have an in-kernel disassembler we can make sure that
there won't be any kprobes set on random data.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:50 +02:00
Heiko Carstens
a882b3b067 s390/kprobes: use insn_length helper function
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:49 +02:00
Heiko Carstens
0f20822a69 s390/dis: move disassembler function prototypes to proper header file
Now that the in-kernel disassembler has an own header file move the
disassembler related function prototypes to that header file.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:48 +02:00
Suzuki K. Poulose
648ae35c54 s390/dis: move common definitions to a header file
The patch moves some of the definitions to a
header file. No functional changes involved.

I have retained the Copyright Statement from the
original file.

Signed-off-by: Suzuki K Poulose <suzuki@in.ibm.com>
[Heiko Carstens: rename s390-dis.h to dis.h]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:48 +02:00
Suzuki K. Poulose
f616d67607 s390/dis: rename structures for unique types
Rename 'insn' and 'operand' structures to more canonical names
to avoid conflicts.

struct insn represents information about an instruction, including
the mnemonics, format and opcode.

struct operand represents the 'properties' and information on howto
interpret the operand value and doesn't contain the value.

We rename these structures for avoiding a global conflict.

i.e,

1,$s/struct insn/struct s390_insn/g
1,$s/struct operand/struct s390_operand/g

Signed-off-by: Suzuki K Poulose <suzuki@in.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:47 +02:00
Heiko Carstens
75287430b4 s390/atomic: make use of interlocked-access facility 1 instructions
Same as for bitops: make use of the interlocked-access facility 1
instructions which allow to atomically update storage locations
without a compare-and-swap loop.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:46 +02:00
Heiko Carstens
86d51bc31f s390/atomic: implement atomic_sub_return() with atomic_add_return()
Get rid of the own atomic_sub_return() implementation. Otherwise we can't
make use of the interlocked-access facility 1 instructions for
atomic_sub_return(), since there is no "load and subtract" instruction
available.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:46 +02:00
Heiko Carstens
fcd05b50fc s390/kprobes: have more correct if statement in s390_get_insn_slot()
When checking the insn address wether it is a kernel image or module
address it should be an if-else-if statement not two independent if
statements.
This doesn't really fix a bug, but matches s390_free_insn_slot().

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:45 +02:00
Heiko Carstens
faf499dfcc s390: always set -march compiler option
Currently we only set the -march compiler option if the kbuild
system figured out that the compiler actually supports the selected
architecture (cc-option test).

In result this means that no -march compiler option is set when an
unsupported cpu architecture of the current compiler is selected.
The kernel compile will afterwards succeed but with the default
architecture instead of the (unsupported) selected one.

Change this behaviour, so compiles will fail if the compiler does
not support the selected cpu architecture.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:44 +02:00
Heiko Carstens
e344e52c7c s390/bitops: make use of interlocked-access facility 1 instructions
Make use of the interlocked-access facility 1 that got added with the
z196 architecure.
This facilility added new instructions which can atomically update a
storage location without a compare-and-swap loop. E.g. setting a bit
within a "long" can be done with a single instruction.

The size of the kernel image gets ~30kb smaller. Considering that there
are appr. 1900 bitops call sites this means that each one saves about
15-16 bytes per call site which is expected.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:44 +02:00
Linus Torvalds
320437af95 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "Several last minute bug fixes.

  Two of them are on the larger side for rc7, the dasd format patch for
  older storage devices and the store-clock-fast patch where we have
  been to optimistic with an optimization"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/time: correct use of store clock fast
  s390/vmlogrdr: fix array access in vmlogrdr_open()
  s390/compat,signal: fix return value of copy_siginfo_(to|from)_user32()
  s390/dasd: check for availability of prefix command during format
  s390/mm,kvm: fix software dirty bits vs. kvm for old machines
2013-10-23 08:10:25 +01:00
Linus Torvalds
db10accfd2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Sorry I let so much accumulate, I was in Buffalo and wanted a few
  things to cook in my tree for a while before sending to you.  Anyways,
  it's a lot of little things as usual at this stage in the game"

 1) Make bonding MAINTAINERS entry reflect reality, from Andy
    Gospodarek.

 2) Fix accidental sock_put() on timewait mini sockets, from Eric
    Dumazet.

 3) Fix crashes in l2tp due to mis-handling of ipv4 mapped ipv6
    addresses, from François CACHEREUL.

 4) Fix heap overflow in __audit_sockaddr(), from the eagle eyed Dan
    Carpenter.

 5) tcp_shifted_skb() doesn't take handle FINs properly, from Eric
    Dumazet.

 6) SFC driver bug fixes from Ben Hutchings.

 7) Fix TX packet scheduling wedge after channel change in ath9k driver,
    from Felix Fietkau.

 8) Fix user after free in BPF JIT code, from Alexei Starovoitov.

 9) Source address selection test is reversed in
    __ip_route_output_key(), fix from Jiri Benc.

10) VLAN and CAN layer mis-size netlink attributes, from Marc
    Kleine-Budde.

11) Fix permission checks in sysctls to use current_euid() instead of
    current_uid().  From Eric W Biederman.

12) IPSEC policies can go away while a timer is still pending for them,
    add appropriate ref-counting to fix, from Steffen Klassert.

13) Fix mis-programming of FDR and RMCR registers on R8A7740 sh_eth
    chips, from Nguyen Hong Ky and Simon Horman.

14) MLX4 forgets to DMA unmap pages on RX, fix from Amir Vadai.

15) IPV6 GRE tunnel MTU upper limit is miscalculated, from Oussama
    Ghorbel.

16) Fix typo in fq_change(), we were assigning "initial quantum" to
    "quantum".  From Eric Dumazet.

17) Set a more appropriate sk_pacing_rate for non-TCP sockets, otherwise
    FQ packet scheduler does not pace those flows properly.  Also from
    Eric Dumazet.

18) rtlwifi miscalculates packet pointers, from Mark Cave-Ayland.

19) l2tp_xmit_skb() can be called from process context, not just softirq
    context, so we must always make sure to BH disable around it.  From
    Eric Dumazet.

20) On qdisc reset, we forget to purge the RB tree of SKBs in netem
    packet scheduler.  From Stephen Hemminger.

21) Fix info leak in farsync WAN driver ioctl() handler, from Dan
    Carpenter and Salva Peiró.

22) Fix PHY reset and other issues in dm9000 driver, from Nikita
    Kiryanov and Michael Abbott.

23) When hardware can do SCTP crc32 checksums, we accidently don't
    disable the csum offload when IPSEC transformations have been
    applied.  From Fan Du and Vlad Yasevich.

24) Tail loss probing in TCP leaves the socket in the wrong congestion
    avoidance state.  From Yuchung Cheng.

25) In CPSW driver, enable NAPI before interrupts are turned on, from
    Markus Pargmann.

26) Integer underflow and dual-assignment in YAM hamradio driver, from
    Dan Carpenter.

27) If we are going to mangle a packet in tcp_set_skb_tso_segs() we must
    unclone it.  This fixes various hard to track down crashes in
    drivers where the SKBs ->gso_segs was changing right from underneath
    the driver during TX queueing.  From Eric Dumazet.

28) Fix the handling of VLAN IDs, and in particular the special IDs 0
    and 4095, in the bridging layer.  From Toshiaki Makita.

29) Another info leak, this time in wanxl WAN driver, from Salva Peiró.

30) Fix race in socket credential passing, from Daniel Borkmann.

31) WHen NETLABEL is disabled, we don't validate CIPSO packets properly,
    from Seif Mazareeb.

32) Fix identification of fragmented frames in ipv4/ipv6 UDP
    Fragmentation Offload output paths, from Jiri Pirko.

33) Virtual Function fixes in bnx2x driver from Yuval Mintz and Ariel
    Elior.

34) When we removed the explicit neighbour pointer from ipv6 routes a
    slight regression was introduced for users such as IPVS, xt_TEE, and
    raw sockets.  We mix up the users requested destination address with
    the routes assigned nexthop/gateway.  From Julian Anastasov and
    Simon Horman.

35) Fix stack overruns in rt6_probe(), the issue is that can end up
    doing two full packet xmit paths at the same time when emitting
    neighbour discovery messages.  From Hannes Frederic Sowa.

36) davinci_emac driver doesn't handle IFF_ALLMULTI correctly, from
    Mariusz Ceier.

37) Make sure to set TCP sk_pacing_rate after the first legitimate RTT
    sample, from Neal Cardwell.

38) Wrong netlink attribute passed to xfrm_replay_verify_len(), from
    Steffen Klassert.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (152 commits)
  ax88179_178a: Add VID:DID for Samsung USB Ethernet Adapter
  ax88179_178a: Correct the RX error definition in RX header
  Revert "bridge: only expire the mdb entry when query is received"
  tcp: initialize passive-side sk_pacing_rate after 3WHS
  davinci_emac.c: Fix IFF_ALLMULTI setup
  mac802154: correct a typo in ieee802154_alloc_device() prototype
  ipv6: probe routes asynchronous in rt6_probe
  netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper
  ipv6: fill rt6i_gateway with nexthop address
  ipv6: always prefer rt6i_gateway if present
  bnx2x: Set NETIF_F_HIGHDMA unconditionally
  bnx2x: Don't pretend during register dump
  bnx2x: Lock DMAE when used by statistic flow
  bnx2x: Prevent null pointer dereference on error flow
  bnx2x: Fix config when SR-IOV and iSCSI are enabled
  bnx2x: Fix Coalescing configuration
  bnx2x: Unlock VF-PF channel on MAC/VLAN config error
  bnx2x: Prevent an illegal pointer dereference during panic
  bnx2x: Fix Maximum CoS estimation for VFs
  drivers: net: cpsw: fix kernel warn during iperf test with interrupt pacing
  ...
2013-10-23 07:47:42 +01:00
Martin Schwidefsky
8c071b0f19 s390/time: correct use of store clock fast
The result of the store-clock-fast (STCKF) instruction is a bit fuzzy.
It can happen that the value stored on one CPU is smaller than the value
stored on another CPU, although the order of the stores is the other
way around. This can cause deltas of get_tod_clock() values to become
negative when they should not be.

We need to be more careful with store-clock-fast, this patch partially
reverts git commit e4b7b4238e666682555461fa52eecd74652f36bb "time:
always use stckf instead of stck if available". The get_tod_clock()
function now uses the store-clock-extended (STCKE) instruction.
get_tod_clock_fast() can be used if the fuzziness of store-clock-fast
is acceptable e.g. for wait loops local to a CPU.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-22 09:16:40 +02:00
Heiko Carstens
0ebfd313fd s390/compat,signal: fix return value of copy_siginfo_(to|from)_user32()
The return value of copy_siginfo_(to|from)_user32() gets passed to
user space, however we do not convert a positive return value from
copy_(to|from)_user to -EFAULT.
Therefore these functions (and the calling system calls) my incorrectly
return a positive number (bytes not copied) instead of -EFAULT.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-15 13:47:59 +02:00
Martin Schwidefsky
af0ebc40a8 s390/mm,kvm: fix software dirty bits vs. kvm for old machines
For machines without enhanced supression on protection the software
dirty bit code forces the pte dirty bit and clears the page protection
bit in pgste_set_pte. This is done for all pte types, the check for
present ptes is missing. As a result swap ptes and other not-present
ptes can get corrupted.
Add a check for the _PAGE_PRESENT bit to pgste_set_pte before modifying
the pte value.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-15 13:47:57 +02:00
Ingo Molnar
3f0116c323 compiler/gcc4: Add quirk for 'asm goto' miscompilation bug
Fengguang Wu, Oleg Nesterov and Peter Zijlstra tracked down
a kernel crash to a GCC bug: GCC miscompiles certain 'asm goto'
constructs, as outlined here:

  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670

Implement a workaround suggested by Jakub Jelinek.

Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Suggested-by: Jakub Jelinek <jakub@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-11 07:39:14 +02:00
Alexei Starovoitov
d45ed4a4e3 net: fix unsafe set_memory_rw from softirq
on x86 system with net.core.bpf_jit_enable = 1

sudo tcpdump -i eth1 'tcp port 22'

causes the warning:
[   56.766097]  Possible unsafe locking scenario:
[   56.766097]
[   56.780146]        CPU0
[   56.786807]        ----
[   56.793188]   lock(&(&vb->lock)->rlock);
[   56.799593]   <Interrupt>
[   56.805889]     lock(&(&vb->lock)->rlock);
[   56.812266]
[   56.812266]  *** DEADLOCK ***
[   56.812266]
[   56.830670] 1 lock held by ksoftirqd/1/13:
[   56.836838]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380
[   56.849757]
[   56.849757] stack backtrace:
[   56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
[   56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   56.882004]  ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007
[   56.895630]  ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001
[   56.909313]  ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001
[   56.923006] Call Trace:
[   56.929532]  [<ffffffff8175a145>] dump_stack+0x55/0x76
[   56.936067]  [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208
[   56.942445]  [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50
[   56.948932]  [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150
[   56.955470]  [<ffffffff810ccb52>] mark_lock+0x282/0x2c0
[   56.961945]  [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50
[   56.968474]  [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50
[   56.975140]  [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90
[   56.981942]  [<ffffffff810cef72>] lock_acquire+0x92/0x1d0
[   56.988745]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
[   56.995619]  [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50
[   57.002493]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
[   57.009447]  [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380
[   57.016477]  [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380
[   57.023607]  [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460
[   57.030818]  [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10
[   57.037896]  [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0
[   57.044789]  [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0
[   57.051720]  [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40
[   57.058727]  [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40
[   57.065577]  [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30
[   57.072338]  [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0
[   57.078962]  [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0
[   57.085373]  [<ffffffff81058245>] run_ksoftirqd+0x35/0x70

cannot reuse jited filter memory, since it's readonly,
so use original bpf insns memory to hold work_struct

defer kfree of sk_filter until jit completed freeing

tested on x86_64 and i386

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07 15:16:45 -04:00
Martin Schwidefsky
dbbfe487e5 s390: fix system call restart after inferior call
Git commit 616498813b "s390: system call path micro optimization"
introduced a regression in regard to system call restarting and inferior
function calls via the ptrace interface. The pointer to the system call
table needs to be loaded in sysc_sigpending if do_signal returns with
TIF_SYSCALl set after it restored a system call context.

Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-09-30 13:04:40 +02:00
Michael Holzheu
4d3b0664a0 s390: Allow vmalloc target buffers for copy_from_oldmem()
Currently copy_from_oldmem() is not able to copy to virtual memory.
When using kexec pre-allocated ELF header, copy_from_oldmem()
is used to copy the ELF notes information to vmalloc buffers.

So fix this and use the new function copy_from_realmem() that allows
copying also to vmalloc memory.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-09-30 13:04:40 +02:00
Heiko Carstens
7423435511 s390/kprobes: add exrl to list of prohibited opcodes
"execute relative long" may have all sorts of side effects dependend on
the instructions it executes.
Therefore prohibit setting a kprobe on exrl just like we do for the
regular execute instruction.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-09-30 13:04:38 +02:00
Heiko Carstens
efc1d23b3d s390: enable ARCH_USE_CMPXCHG_LOCKREF
Enable ARCH_USE_CMPXCHG_LOCKREF since it shows performance improvements
with Linus' simple stat() test case of up to 50% on a 30 cpu system.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-09-28 12:46:29 +02:00
Heiko Carstens
083986e824 mutex: replace CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX with simple ifdef
Linus suggested to replace

 #ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX
 #define arch_mutex_cpu_relax() cpu_relax()
 #endif

with just a simple

  #ifndef arch_mutex_cpu_relax
  # define arch_mutex_cpu_relax() cpu_relax()
  #endif

to get rid of CONFIG_HAVE_CPU_RELAX_SIMPLE. So architectures can
simply define arch_mutex_cpu_relax if they want an architecture
specific function instead of having to add a select statement in
their Kconfig in addition.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-09-28 12:46:21 +02:00
Martin Schwidefsky
0244ad004a Remove GENERIC_HARDIRQ config option
After the last architecture switched to generic hard irqs the config
options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
for !CONFIG_GENERIC_HARDIRQS can be removed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-09-13 15:09:52 +02:00
Johannes Weiner
759496ba64 arch: mm: pass userspace fault flag to generic fault handler
Unlike global OOM handling, memory cgroup code will invoke the OOM killer
in any OOM situation because it has no way of telling faults occuring in
kernel context - which could be handled more gracefully - from
user-triggered faults.

Pass a flag that identifies faults originating in user space from the
architecture-specific fault handlers to generic code so that memcg OOM
handling can be improved.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12 15:38:01 -07:00
Michael Holzheu
6f79d33228 s390/vmcore: use vmcore for zfcpdump
Modify the s390 copy_oldmem_page() and remap_oldmem_pfn_range() function
for zfcpdump to read from the HSA memory if memory below HSA_SIZE bytes is
requested.  Otherwise real memory is used.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Jan Willeke <willeke@de.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:15 -07:00
Jan Willeke
23df79da8e s390/vmcore: implement remap_oldmem_pfn_range for s390
Introduce the s390 specific way to map pages from oldmem.  The memory area
below OLDMEM_SIZE is mapped with offset OLDMEM_BASE.  The other old memory
is mapped directly.

Signed-off-by: Jan Willeke <willeke@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:12 -07:00
Michael Holzheu
97b0f6f9cd s390/vmcore: use ELF header in new memory feature
Exchange the old relocate mechanism with the new arch function call
override mechanism that allows to create the ELF core header in the 2nd
kernel.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Jan Willeke <willeke@de.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:10 -07:00
Heiko Carstens
63c40436a1 s390/kprobes: add support for pc-relative long displacement instructions
With the general-instruction extension facility (z10) a couple of
instructions with a pc-relative long displacement were introduced.  The
kprobes support for these instructions however was never implemented.

In result, if anybody ever put a probe on any of these instructions the
result would have been random behaviour after the instruction got executed
within the insn slot.

So lets add the missing handling for these instructions.  Since all of the
new instructions have 32 bit signed displacement the easiest solution is
to allocate an insn slot that is within the same 2GB area like the
original instruction and patch the displacement field.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:58:52 -07:00
Naoya Horiguchi
83467efbdb mm: migrate: check movability of hugepage in unmap_and_move_huge_page()
Currently hugepage migration works well only for pmd-based hugepages
(mainly due to lack of testing,) so we had better not enable migration of
other levels of hugepages until we are ready for it.

Some users of hugepage migration (mbind, move_pages, and migrate_pages) do
page table walk and check pud/pmd_huge() there, so they are safe.  But the
other users (softoffline and memory hotremove) don't do this, so without
this patch they can try to migrate unexpected types of hugepages.

To prevent this, we introduce hugepage_migration_support() as an
architecture dependent check of whether hugepage are implemented on a pmd
basis or not.  And on some architecture multiple sizes of hugepages are
available, so hugepage_migration_support() also checks hugepage size.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:49 -07:00
Linus Torvalds
e831cbfc1a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:
 "This includes one bpf/jit bug fix where the jit compiler could
  sometimes write generated code out of bounds of the allocated memory
  area.

  The rest of the patches are only cleanups and minor improvements"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/irq: reduce size of external interrupt handler hash array
  s390/compat,uid16: use current_cred()
  s390/ap_bus: use and-mask instead of a cast
  s390/ftrace: avoid pointer arithmetics with function pointers
  s390: make various functions static, add declarations to header files
  s390/compat signal: add couple of __force annotations
  s390/mm: add __releases()/__acquires() annotations to gmap_alloc_table()
  s390: keep Kconfig sorted
  s390/irq: rework irq subclass handling
  s390/irq: use hlists for external interrupt handler array
  s390/dumpstack: convert print_symbol to %pSR
  s390/perf: Remove print_hex_dump_bytes() debug output
  s390: update defconfig
  s390/bpf,jit: fix address randomization
2013-09-11 08:36:03 -07:00
Heiko Carstens
9e75c6274a s390/irq: reduce size of external interrupt handler hash array
Change the hash algorithm a bit so it produces only values in the
range of 0..31.
This allows to reduce the size of the external interrupt handler hash
array even further while making sure that each of the known interrupt
sources keeps its unique hash with the slightly modified algorithm:

0x1004 --> 12
0x1201 --> 10
0x1202 --> 11
0x1406 --> 16
0x1407 --> 17
0x2401 --> 19
0x2603 --> 22
0x4000 --> 0

This also means that the entire array now fits into exactly one cache
line; so add a proper align statement as well.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-09-09 08:57:32 +02:00
Heiko Carstens
2ec7f4aec4 s390/compat,uid16: use current_cred()
86a264ab "CRED: Wrap current->cred and a few other accessors" converted
all uses of current->cred into current_cred() but left s390 alone.

So let's convert s390 finally as well, only five years later.

This way we also get rid of a sparse warning which complains about a
possible invalid rcu dereference which however is a false positive.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-09-07 11:58:21 +02:00
Heiko Carstens
5eb8ae503e s390/ftrace: avoid pointer arithmetics with function pointers
Pointer arithmetics with function pointers is not really defined, but
seems to do the right thing. Let's cast to a void pointer to have a
defined behaviour, at least when using gcc.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-09-07 11:58:07 +02:00
Heiko Carstens
63df41d663 s390: make various functions static, add declarations to header files
Make various functions static, add declarations to header files to
fix a couple of sparse findings.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2013-09-07 11:58:03 +02:00