Commit graph

364477 commits

Author SHA1 Message Date
Patrick McHardy
cf2c014ade net: vlan: fix memory leak in vlan_info_rcu_free()
The following leak is reported by kmemleak:

[   86.812073] kmemleak: Found object by alias at 0xffff88006ecc76f0
[   86.816019] Pid: 739, comm: kworker/u:1 Not tainted 3.9.0-rc5+ #842
[   86.816019] Call Trace:
[   86.816019]  <IRQ>  [<ffffffff81151c58>] find_and_get_object+0x8c/0xdf
[   86.816019]  [<ffffffff8190e90d>] ? vlan_info_rcu_free+0x33/0x49
[   86.816019]  [<ffffffff81151cbe>] delete_object_full+0x13/0x2f
[   86.816019]  [<ffffffff8194bbb6>] kmemleak_free+0x26/0x45
[   86.816019]  [<ffffffff8113e8c7>] slab_free_hook+0x1e/0x7b
[   86.816019]  [<ffffffff81141c05>] kfree+0xce/0x14b
[   86.816019]  [<ffffffff8190e90d>] vlan_info_rcu_free+0x33/0x49
[   86.816019]  [<ffffffff810d0b0b>] rcu_do_batch+0x261/0x4e7

The reason is that in vlan_info_rcu_free() we don't take the VLAN protocol
into account when iterating over the vlan_devices_array.

Reported-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-21 15:55:42 -04:00
Linus Torvalds
3125929454 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Fix offcore_rsp valid mask for SNB/IVB
  perf: Treat attr.config as u64 in perf_swevent_init()
2013-04-21 10:25:42 -07:00
Linus Torvalds
12c71c4b60 Merge branch 'vm_ioremap_memory-examples'
I'm going to do an -rc8, so I'm just going to do this rather than delay
it any further. They are arguably stable material anyway.

* vm_ioremap_memory-examples:
  mtdchar: remove no-longer-used vma helpers
  vm: convert snd_pcm_lib_mmap_iomem() to vm_iomap_memory() helper
  vm: convert fb_mmap to vm_iomap_memory() helper
  vm: convert mtdchar mmap to vm_iomap_memory() helper
  vm: convert HPET mmap to vm_iomap_memory() helper
2013-04-21 10:16:56 -07:00
Linus Torvalds
830ac8524f Merge branch 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull kdump fixes from Peter Anvin:
 "The kexec/kdump people have found several problems with the support
  for loading over 4 GiB that was introduced in this merge cycle.  This
  is partly due to a number of design problems inherent in the way the
  various pieces of kdump fit together (it is pretty horrifically manual
  in many places.)

  After a *lot* of iterations this is the patchset that was agreed upon,
  but of course it is now very late in the cycle.  However, because it
  changes both the syntax and semantics of the crashkernel option, it
  would be desirable to avoid a stable release with the broken
  interfaces."

I'm not happy with the timing, since originally the plan was to release
the final 3.9 tomorrow.  But apparently I'm doing an -rc8 instead...

* 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kexec: use Crash kernel for Crash kernel low
  x86, kdump: Change crashkernel_high/low= to crashkernel=,high/low
  x86, kdump: Retore crashkernel= to allocate under 896M
  x86, kdump: Set crashkernel_low automatically
2013-04-20 18:40:36 -07:00
Linus Torvalds
db93f8b420 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "Three groups of fixes:

   1. Make sure we don't execute the early microcode patching if family
      < 6, since it would touch MSRs which don't exist on those
      families, causing crashes.

   2. The Xen partial emulation of HyperV can be dealt with more
      gracefully than just disabling the driver.

   3. More EFI variable space magic.  In particular, variables hidden
      from runtime code need to be taken into account too."

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode: Verify the family before dispatching microcode patching
  x86, hyperv: Handle Xen emulation of Hyper-V more gracefully
  x86,efi: Implement efi_no_storage_paranoia parameter
  efi: Export efi_query_variable_store() for efivars.ko
  x86/Kconfig: Make EFI select UCS2_STRING
  efi: Distinguish between "remaining space" and actually used space
  efi: Pass boot services variable info to runtime code
  Move utf16 functions to kernel core and rename
  x86,efi: Check max_size only if it is non-zero.
  x86, efivars: firmware bug workarounds should be in platform code
2013-04-20 18:38:48 -07:00
Linus Torvalds
8c3a13c84b Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM fixes from Russell King:
 "A set of fixes from various people - Will Deacon gets a prize for
  removing code this time around.  The biggest fix in this lot is
  sorting out the ARM740T mess.  The rest are relatively small fixes."

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7699/1: sched_clock: Add more notrace to prevent recursion
  ARM: 7698/1: perf: fix group validation when using enable_on_exec
  ARM: 7697/1: hw_breakpoint: do not use __cpuinitdata for dbg_cpu_pm_nb
  ARM: 7696/1: Fix kexec by setting outer_cache.inv_all for Feroceon
  ARM: 7694/1: ARM, TCM: initialize TCM in paging_init(), instead of setup_arch()
  ARM: 7692/1: iop3xx: move IOP3XX_PERIPHERAL_VIRT_BASE
  ARM: modules: don't export cpu_set_pte_ext when !MMU
  ARM: mm: remove broken condition check for v4 flushing
  ARM: mm: fix numerous hideous errors in proc-arm740.S
  ARM: cache: remove ARMv3 support code
  ARM: tlbflush: remove ARMv3 support
2013-04-20 18:38:06 -07:00
Linus Torvalds
851b3f3238 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:

 1) Fix race in sparc64 TLB shootdowns, we have to synchronize with the
    sibling cpus completing if we are passing them a reference via
    pointer to a data structure.

 2) Fix cleaning of bitmaps in sparc32, from Akinobu Mita.

 3) Fix various sparc header mistakes, some of which resulted in
    userland build breakage.  From Sam Ravnborg.

 4) Kill ghost declarations and defines missed when several bits of code
    got deleted recently.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix race in TLB batch processing.
  sparc: use asm-generic version of types.h
  bbc_i2c: fix section mismatch warning
  sparc: use generic headers
  sparc:cleanup unused code in smp_32.h
  sparc/iommu: fix typo s/265KB/256KB/
  sparc/srmmu: clear trailing edge of bitmap properly
  sparc:remove unused declaration smp_boot_cpus()
2013-04-20 18:23:08 -07:00
Linus Torvalds
c437d888c7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates from David Miller:

 1) ax88796 does 64-bit divides which causes link errors on ARM, fix
    from Arnd Bergmann.

 2) Once an improper offload setting is detected on an SKB we don't rate
    limit the log message so we can very easily live lock.  From Ben
    Greear.

 3) Openvswitch cannot report vport configuration changes reliably
    because it didn't preallocate the netlink notification message
    before changing state.  From Jesse Gross.

 4) The effective UID/GID SCM credentials fix, from Linus.

 5) When a user explicitly asks for wireless authentication, cfg80211
    isn't told about the AP detachment leaving inconsistent state.  Fix
    from Johannes Berg.

 6) Fix self-MAC checks in batman-adv on multi-mesh nodes, from Antonio
    Quartulli.

 7) Revert build_skb() change sin IGB driver, can result in memory
    corruption.  From Alexander Duyck.

 8) Fix setting VLANs on virtual functions in IXGBE, from Greg Rose.

 9) Fix TSO races in qlcnic driver, from Sritej Velaga.

10) In bnx2x the kernel driver and UNDI firmware can try to program the
    chip at the same time, resulting in corruption.  Add proper
    synchronization.  From Dmitry Kravkov.

11) Fix corruption of status block in firmware ram in bxn2x, from Ariel
    Elior.

12) Fix load balancing hash regression of bonding driver in forwarding
    configurations, from Eric Dumazet.

13) Fix TS ECR regression in TCP by calling tcp_replace_ts_recent() in
    all the right spots, from Eric Dumazet.

14) Fix several bonding bugs having to do with address manintainence,
    including not removing address when configuration operations
    encounter errors, missed locking on the address lists, missing
    refcounting on VLAN objects, etc.  All from Nikolay Aleksandrov.

15) Add workarounds for firmware bugs in LTE qmi_wwan devices, wherein
    the devices fail to add a proper ethernet header while on LTE
    networks but otherwise properly do so on 2G and 3G ones.  From Bjørn
    Mork.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
  net: fix incorrect credentials passing
  net: rate-limit warn-bad-offload splats.
  net: ax88796: avoid 64 bit arithmetic
  qlge: Update version to 1.00.00.32.
  qlge: Fix ethtool autoneg advertising.
  qlge: Fix receive path to drop error frames
  net: qmi_wwan: prevent duplicate mac address on link (firmware bug workaround)
  net: qmi_wwan: fixup destination address (firmware bug workaround)
  net: qmi_wwan: fixup missing ethernet header (firmware bug workaround)
  bonding: in bond_mc_swap() bond's mc addr list is walked without lock
  bonding: disable netpoll on enslave failure
  bonding: primary_slave & curr_active_slave are not cleaned on enslave failure
  bonding: vlans don't get deleted on enslave failure
  bonding: mc addresses don't get deleted on enslave failure
  pkt_sched: fix error return code in fw_change_attrs()
  irda: small read past the end of array in debug code
  tcp: call tcp_replace_ts_recent() from tcp_ack()
  netfilter: xt_rpfilter: skip locally generated broadcast/multicast, too
  netfilter: ipset: bitmap:ip,mac: fix listing with timeout
  bonding: fix l23 and l34 load balancing in forwarding path
  ...
2013-04-20 18:21:05 -07:00
Linus Torvalds
83f1b4ba91 net: fix incorrect credentials passing
Commit 257b5358b3 ("scm: Capture the full credentials of the scm
sender") changed the credentials passing code to pass in the effective
uid/gid instead of the real uid/gid.

Obviously this doesn't matter most of the time (since normally they are
the same), but it results in differences for suid binaries when the wrong
uid/gid ends up being used.

This just undoes that (presumably unintentional) part of the commit.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-20 16:56:42 -04:00
H. Peter Anvin
c0a9f451e4 Merge remote-tracking branch 'efi/urgent' into x86/urgent
Matt Fleming (1):
      x86, efivars: firmware bug workarounds should be in platform
      code

Matthew Garrett (3):
      Move utf16 functions to kernel core and rename
      efi: Pass boot services variable info to runtime code
      efi: Distinguish between "remaining space" and actually used
      space

Richard Weinberger (2):
      x86,efi: Check max_size only if it is non-zero.
      x86,efi: Implement efi_no_storage_paranoia parameter

Sergey Vlasov (2):
      x86/Kconfig: Make EFI select UCS2_STRING
      efi: Export efi_query_variable_store() for efivars.ko

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-04-19 17:09:03 -07:00
H. Peter Anvin
74c3e3fcf3 x86, microcode: Verify the family before dispatching microcode patching
For each CPU vendor that implements CPU microcode patching, there will
be a minimum family for which this is implemented.  Verify this
minimum level of support.

This can be done in the dispatch function or early in the application
functions.  Doing the latter turned out to be somewhat awkward because
of the ineviable split between the BSP and the AP paths, and rather
than pushing deep into the application functions, do this in
the dispatch function.

Reported-by: "Bryan O'Donoghue" <bryan.odonoghue.lkml@nexus-software.ie>
Suggested-by: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1366392183-4149-1-git-send-email-bryan.odonoghue.lkml@nexus-software.ie
2013-04-19 16:36:03 -07:00
Ben Greear
c846ad9b88 net: rate-limit warn-bad-offload splats.
If one does do something unfortunate and allow a
bad offload bug into the kernel, this the
skb_warn_bad_offload can effectively live-lock the
system, filling the logs with the same error over
and over.

Add rate limitation to this so that box remains otherwise
functional in this case.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:57:49 -04:00
Arnd Bergmann
b261c20fe0 net: ax88796: avoid 64 bit arithmetic
When building ax88796 on an ARM platform with 64-bit resource_size_t,
we currently get

drivers/net/ethernet/8390/ax88796.c:875: undefined reference to `__aeabi_uldivmod'

because we do a division on the length of the MMIO resource.
Since we know that this resource is very short, using an
"unsigned long" instead of "resource_size_t" is entirely
sufficient, and avoids this link-time error.

Cc: Ben Dooks <ben-linux@fluff.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:57:48 -04:00
David S. Miller
95a06161e6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
The following patchset contains a small batch of Netfilter
updates for your net-next tree, they are:

* Three patches that provide more accurate error reporting to
  user-space, instead of -EPERM, in IPv4/IPv6 netfilter re-routing
  code and NAT, from Patrick McHardy.

* Update copyright statements in Netfilter filters of
  Patrick McHardy, from himself.

* Add Kconfig dependency on the raw/mangle tables to the
  rpfilter, from Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:55:29 -04:00
Jitendra Kalsaria
e393ce5780 qlge: Update version to 1.00.00.32.
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:53:57 -04:00
Jitendra Kalsaria
c5e991af93 qlge: Fix ethtool autoneg advertising.
Autoneg is supported on specific port types only. Fix the driver to advertise
autoneg based on the port type.

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:53:56 -04:00
Sritej Velaga
ae721f3ab0 qlge: Fix receive path to drop error frames
o Fix the driver to drop error frames in the receive path
o Update error counter which was not getting incremented

Signed-off-by: Sritej Velaga <sritej.velaga@qlogic.com>
Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:53:56 -04:00
David S. Miller
b79d4a8dfd Merge branch 'qmi_wwan'
Bjørn Mork says:

====================
This series adds workarounds for 3 different firmware bugs, each
preventing the affected devices from working at all. I therefore
humbly request that these fixes go to stable-3.8 (if still
maintained) and 3.9 (either via net if still possible, or via
stable if not).

All 3 workarounds are applied to all devices supported by the driver.
Adding quirks for specific devices was considered as an alternative,
but was rejected because we have too little information about the
exact distribution of the buggy firmwares. All we know is that the
same bug shows up in devices from at least 3 different, and presumably
independent, vendors.

The workarounds have instead been designed to automatically apply
when necessary, and to have as little impact as possible on unaffected
devices.  The series has been tested on a number of devices both with
and without these bugs.

The series should apply cleanly to net/master, net-next/master and
stable/linux-3.8.y
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:51:26 -04:00
Bjørn Mork
cc6ba5fdaa net: qmi_wwan: prevent duplicate mac address on link (firmware bug workaround)
We normally trust and use the CDC functional descriptors provided by a
number of devices.  But some of these will erroneously list the address
reserved for the device end of the link.  Attempting to use this on
both the device and host side will naturally not work.

Work around this bug by ignoring the functional descriptor and assign a
random address instead in this case.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:51:17 -04:00
Bjørn Mork
6483bdc9d7 net: qmi_wwan: fixup destination address (firmware bug workaround)
Received packets are sometimes addressed to 00:a0:c6:00:00:00
instead of the address the device firmware should have learned
from the host:

321.224126 77.16.85.204 -> 148.122.171.134 ICMP 98 Echo (ping) request  id=0x4025, seq=64/16384, ttl=64

0000  82 c0 82 c9 f1 67 82 c0 82 c9 f1 67 08 00 45 00   .....g.....g..E.
0010  00 54 00 00 40 00 40 01 57 cc 4d 10 55 cc 94 7a   .T..@.@.W.M.U..z
0020  ab 86 08 00 62 fc 40 25 00 40 b2 bc 6e 51 00 00   ....b.@%.@..nQ..
0030  00 00 6b bd 09 00 00 00 00 00 10 11 12 13 14 15   ..k.............
0040  16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25   .......... !"#$%
0050  26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35   &'()*+,-./012345
0060  36 37                                             67

321.240607 148.122.171.134 -> 77.16.85.204 ICMP 98 Echo (ping) reply    id=0x4025, seq=64/16384, ttl=55

0000  00 a0 c6 00 00 00 02 50 f3 00 00 00 08 00 45 00   .......P......E.
0010  00 54 00 56 00 00 37 01 a0 76 94 7a ab 86 4d 10   .T.V..7..v.z..M.
0020  55 cc 00 00 6a fc 40 25 00 40 b2 bc 6e 51 00 00   U...j.@%.@..nQ..
0030  00 00 6b bd 09 00 00 00 00 00 10 11 12 13 14 15   ..k.............
0040  16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25   .......... !"#$%
0050  26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35   &'()*+,-./012345
0060  36 37                                             67

The bogus address is always the same, and matches the address
suggested by many devices as a default address.  It is likely a
hardcoded firmware default.

The circumstances where this bug has been observed indicates that
the trigger is related to timing or some other factor the host
cannot control. Repeating the exact same configuration sequence
that caused it to trigger once, will not necessarily cause it to
trigger the next time. Reproducing the bug is therefore difficult.
This opens up a possibility that the bug is more common than we can
confirm, because affected devices often will work properly again
after a reset.  A procedure most users are likely to try out before
reporting a bug.

Unconditionally rewriting the destination address if the first digit
of the received packet is 0, is considered an acceptable compromise
since we already have to inspect this digit.  The simplification will
cause unnecessary rewrites if the real address starts with 0, but this
is still better than adding additional tests for this particular case.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:51:17 -04:00
Bjørn Mork
6ff509af38 net: qmi_wwan: fixup missing ethernet header (firmware bug workaround)
A number of LTE devices from different vendors all suffer from the
same firmware bug: Most of the packets received from the device while
it is attached to a LTE network will not have an ethernet header. The
devices work as expected when attached to 2G or 3G networks, sending
an ethernet header with all packets.

This driver is not aware of which network the modem attached to, and
even if it were there are still some packet types which are always
received with the header intact.

All devices supported by this driver have severely limited
networking capabilities:
 - can only transmit IPv4, IPv6 and possibly ARP
 - can only support a single host hardware address at any time
 - will only do point-to-point communcation with the host

Because of this, we are able to reliably identify any bogus raw IP
packets by simply looking at the 4 IP version bits.  All we need to
do is to avoid 4 or 6 in the first digit of the mac address.  This
workaround ensures this, and fix up the received packets as necessary.

Given the distribution of the bug, it is believed that the source is
the chipset vendor.  The devices which are verified to be affected are:
 Huawei E392u-12 (Qualcomm MDM9200)
 Pantech UML290  (Qualcomm MDM9600)
 Novatel USB551L (Qualcomm MDM9600)
 Novatel E362    (Qualcomm MDM9600)

It is believed that the bug depend on firmware revision, which means
that possibly all devices based on the above mentioned chipset may be
affected if we consider all available firmware revisions.

The information about affected devices and versions is likely
incomplete.  As the additional overhead for packets not needing this
fixup is very small, it is considered acceptable to apply the
workaround to all devices handled by this driver.

Reported-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:51:16 -04:00
David S. Miller
0cb670eef5 Merge branch 'bonding'
Nikolay Aleksandrov says:

====================
This patch-set fixes mainly bugs on enslave failure and one occasion
of a needed locking. The patches are:

	1. On enslave failure mc addresses are not flushed from the slave
	2. On enslave failure vlans are not cleaned up from the slave
	3. On enslave failure the bond's primary and curr_active_slave
	   are not cleaned up (which might result in use of freed memory)
	4. On enslave failure netpoll is not disabled which might result in
	   a memory leak
	5. In bond_mc_swap() the bond's mc addr list is walked without
	   netif_addr_lock, since it can be called without rtnl, add it

v2: patch 01 - fix log message and remove unnecessary code move
====================

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:49:11 -04:00
nikolay@redhat.com
d632ce989c bonding: in bond_mc_swap() bond's mc addr list is walked without lock
Use netif_addr_lock_bh() to acquire the appropriate lock before walking.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:48:19 -04:00
nikolay@redhat.com
fc7a72ac86 bonding: disable netpoll on enslave failure
slave_disable_netpoll() is not called upon enslave failure which would
lead to a memory leak. Call slave_disable_netpoll() after err_detach as
that's the first error path after enabling netpoll on that slave.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:48:19 -04:00
nikolay@redhat.com
3c5913b53f bonding: primary_slave & curr_active_slave are not cleaned on enslave failure
On enslave failure primary_slave can point to new_slave which is to be
freed, and the same applies to curr_active_slave. So check if this is
the case and clean up properly after err_detach because that's the first
error code path after they're set.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:48:19 -04:00
nikolay@redhat.com
a506e7b479 bonding: vlans don't get deleted on enslave failure
The main problem is with vid refcount which only gets bumped up.
Delete the vlans after err_detach as that's the first error path
after the vlans are added.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:48:18 -04:00
nikolay@redhat.com
25e40305d4 bonding: mc addresses don't get deleted on enslave failure
Add bond_mc_list_flush() after err_detach as that's the first error path
after the addresses are added. The main issue is the mc addresses' refcount
which only gets bumped up.

v2: update log message and don't move code unnecessarily

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:48:18 -04:00
Wei Yongjun
cb95ec6261 pkt_sched: fix error return code in fw_change_attrs()
Fix to return -EINVAL when tb[TCA_FW_MASK] is set and head->mask != 0xFFFFFFFF
instead of 0 (ifdef CONFIG_NET_CLS_IND and tb[TCA_FW_INDEV]), as done elsewhere
in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:34:53 -04:00
Dan Carpenter
e15465e180 irda: small read past the end of array in debug code
The "reason" can come from skb->data[] and it hasn't been capped so it
can be from 0-255 instead of just 0-6.  For example in irlmp_state_dtr()
the code does:

	reason = skb->data[3];
	...
	irlmp_disconnect_indication(self, reason, skb);

Also LMREASON has a couple other values which don't have entries in the
irlmp_reasons[] array.  And 0xff is a valid reason as well which means
"unknown".

So far as I can see we don't actually care about "reason" except for in
the debug code.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 17:32:31 -04:00
David S. Miller
f36391d279 sparc64: Fix race in TLB batch processing.
As reported by Dave Kleikamp, when we emit cross calls to do batched
TLB flush processing we have a race because we do not synchronize on
the sibling cpus completing the cross call.

So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.)
and either flushes are missed or flushes will flush the wrong
addresses.

Fix this by using generic infrastructure to synchonize on the
completion of the cross call.

This first required getting the flush_tlb_pending() call out from
switch_to() which operates with locks held and interrupts disabled.
The problem is that smp_call_function_many() cannot be invoked with
IRQs disabled and this is explicitly checked for with WARN_ON_ONCE().

We get the batch processing outside of locked IRQ disabled sections by
using some ideas from the powerpc port. Namely, we only batch inside
of arch_{enter,leave}_lazy_mmu_mode() calls.  If we're not in such a
region, we flush TLBs synchronously.

1) Get rid of xcall_flush_tlb_pending and per-cpu type
   implementations.

2) Do TLB batch cross calls instead via:

	smp_call_function_many()
		tlb_pending_func()
			__flush_tlb_pending()

3) Batch only in lazy mmu sequences:

	a) Add 'active' member to struct tlb_batch
	b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
	c) Set 'active' in arch_enter_lazy_mmu_mode()
	d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode()
	e) Check 'active' in tlb_batch_add_one() and do a synchronous
           flush if it's clear.

4) Add infrastructure for synchronous TLB page flushes.

	a) Implement __flush_tlb_page and per-cpu variants, patch
	   as needed.
	b) Likewise for xcall_flush_tlb_page.
	c) Implement smp_flush_tlb_page() to invoke the cross-call.
	d) Wire up global_flush_tlb_page() to the right routine based
           upon CONFIG_SMP

5) It turns out that singleton batches are very common, 2 out of every
   3 batch flushes have only a single entry in them.

   The batch flush waiting is very expensive, both because of the poll
   on sibling cpu completeion, as well as because passing the tlb batch
   pointer to the sibling cpus invokes a shared memory dereference.

   Therefore, in flush_tlb_pending(), if there is only one entry in
   the batch perform a completely asynchronous global_flush_tlb_page()
   instead.

Reported-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2013-04-19 17:26:26 -04:00
Stephen Boyd
cea15092f0 ARM: 7699/1: sched_clock: Add more notrace to prevent recursion
cyc_to_sched_clock() is called by sched_clock() and cyc_to_ns()
is called by cyc_to_sched_clock(). I suspect that some compilers
inline both of these functions into sched_clock() and so we've
been getting away without having a notrace marking. It seems that
my compiler isn't inlining cyc_to_sched_clock() though, so I'm
hitting a recursion bug when I enable the function graph tracer,
causing my system to crash. Marking these functions notrace fixes
it. Technically cyc_to_ns() doesn't need the notrace because it's
already marked inline, but let's just add it so that if we ever
remove inline from that function it doesn't blow up.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-04-19 22:23:55 +01:00
Andy Gospodarek
bb5b052f75 bond: add support to read speed and duplex via ethtool
This patch adds support for the get_settings ethtool op to the bonding
driver.  This was motivated by users who wanted to get the speed of the
bond and compare that against throughput to understand utilization.
The behavior before this patch was added was problematic when computing
line utilization after trying to get link-speed and throughput via SNMP.

Output from ethtool looks like this for a round-robin bond:

Settings for bond0:
	Supported ports: [ ]
	Supported link modes:   Not reported
	Supported pause frame use: No
	Supports auto-negotiation: No
	Advertised link modes:  Not reported
	Advertised pause frame use: No
	Advertised auto-negotiation: No
	Speed: 11000Mb/s
	Duplex: Full
	Port: Other
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: off
	MDI-X: Unknown
	Link detected: yes

I tested this and verified it works as expected.  A test was also done
on a version backported to an older kernel and it worked well there.

v2: Switch to using ethtool_cmd_speed_set to set speed, added check to
SLAVE_IS_OK for each slave in bond, dropped mode-specific calculations
as they were not needed, and set port type to 'Other.'

v3: Fix useless assignment and checkpatch warning.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 16:39:50 -04:00
Daniel Borkmann
4b457bdf1d packet: move hw/sw timestamp extraction into a small helper
This patch introduces a small, internal helper function, that is used by
PF_PACKET. Based on the flags that are passed, it extracts the packet
timestamp in the receive path. This is merely a refactoring to remove
some duplicate code in tpacket_rcv(), to make it more readable, and to
enable others to use this function in PF_PACKET as well, e.g. for TX.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 16:39:13 -04:00
Daniel Borkmann
6e94d1ef37 net: socket: move ktime2ts to ktime header api
Currently, ktime2ts is a small helper function that is only used in
net/socket.c. Move this helper into the ktime API as a small inline
function, so that i) it's maintained together with ktime routines,
and ii) also other files can make use of it. The function is named
ktime_to_timespec_cond() and placed into the generic part of ktime,
since we internally make use of ktime_to_timespec(). ktime_to_timespec()
itself does not check the ktime variable for zero, hence, we name
this function ktime_to_timespec_cond() for only a conditional
conversion, and adapt its users to it.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 16:39:13 -04:00
David S. Miller
cf27014866 net: Add .gitignore to networking selftests directory.
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 16:36:12 -04:00
David S. Miller
2d6577f17b net: Add missing netdev feature strings for NETIF_F_HW_VLAN_STAG_*
Noticed by Ben Hutchings.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 16:16:50 -04:00
David S. Miller
92352df1db Merge branch 'qlcnic'
Rajesh Borundia says:

====================
* "qlcnic: Change 82xx adapter VLAN id endian type".
  - Adapter requires VLAN id in little endian. VLAN id was being
    converted to __le16 and then passed as a parameter. Pass VLAN id
    as u16 and then use cpu_to_le16 at appropriate places. It is
    appropriate for net-next as SR-IOV patches have a dependency on it.
* "qlcnic: Fix loopback test for SR-IOV PF".
  - It is appropriate for net-next as change is needed for SRIOV PF
    only.
* Remaining patches add enhancements to SR-IOV functionality like
  - FLR handling
  - Adapter reset recovery handling
  - iproute2 tool support for configuring MAC address, Tx rate and
    VLAN id.
  - Mailbox polling support for SR-IOV PF in case mailbox interrupts
    are disabled.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 16:16:42 -04:00
Rajesh Borundia
c637627820 qlcnic: Update version to 5.2.41
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 16:14:54 -04:00
Rajesh Borundia
7ed3ce4800 qlcnic: Support polling for mailbox events.
o When mailbox interrupt is disabled PF should be
  able to process request from VF. Enable polling
  for such cases.

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 16:14:54 -04:00
Rajesh Borundia
d1a1105efd qlcnic: Fix loopback test for SR-IOV PF.
o Do not disable mailbox interrupts while running
  loopback test through SR-IOV PF.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 16:14:53 -04:00
Rajesh Borundia
91b7282b61 qlcnic: Support VLAN id config.
o Add support for VLAN id configuration per VF using
  iproute2 tool.
o VLAN id's 1-4094 are treated as PVID by the PF and
  Guest VLAN tagging is not allowed by default.
o PVID is disabled when the VLAN id is set to 0
o Guest VLAN tagging is allowed when the VLAN id is set to 4095.
o Only one Guest VLAN id  is supported.
o VLAN id can be changed only when the VF driver is not loaded.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 16:14:40 -04:00
Rajesh Borundia
4000e7a78d qlcnic: Support MAC address, Tx rate config.
o Add support for MAC address and Tx rate configuration
  per VF via iproute2 tool.
o Tx rate change is allowed while the guest is running
  and the VF driver is loaded.
o MAC address change is allowed only when VF driver
  is not loaded.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 16:02:38 -04:00
Rajesh Borundia
f036e4f44e qlcnic: VF reset recovery implementation.
o Implement recovery mechanism for VF to recover from
  adapter resets.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 16:02:38 -04:00
Rajesh Borundia
97d8105cf3 qlcnic: VF FLR implementation.
o FLR from Hypervisor - When hypervisor issues a VF FLR request,
  adapter notifies the parent PF driver of the FLR request for PF
  driver to perform any cleanup on behalf of that VF.
o FLR from VF Driver - VF driver may initiate a VF FLR request,
  if VF state needs to be cleaned up before a re-initialization.
  VF re-initialization during kdump is an example.
o PF driver cleans up all resources allocated on behalf of a  VF,
  on VF FLR notifications from the adapter or from the VF driver.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 16:02:38 -04:00
Rajesh Borundia
f80bc8fe6d qlcnic: Change 82xx adapter VLAN id endian type.
o 82xx adapter requires VLAN id in little endian format.
  Instead of passing vlan id parameter as __le16, pass the
  parameter as u16 and  use cpu_to_le16 at appropriate places.

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 16:02:38 -04:00
David S. Miller
42bbcb7803 Merge branch 'netlink-mmap'
Patrick McHardy says:

====================
The following patches contain an implementation of memory mapped I/O for
netlink. The implementation is modelled after AF_PACKET memory mapped I/O
with a few differences:

- In order to perform memory mapped I/O to userspace, the kernel allocates
  skbs with the data area pointing to the data area of the mapped frames.
  All netlink subsystems assume a linear data area, so for the sake of
  simplicity, the mapped data area is not attached to the paged area but
  to skb->data. This requires introduction of a special skb alloction
  function that just allocates an skb head without the data area. Since this
  is a quite rare use case, I introduced a new function based on __alloc_skb
  instead of splitting it up into head and data alloction. The alternative
  would be to   introduce an __alloc_skb_head and __alloc_skb_data function,
  which would actually be useful for a specific error case in memory mapped
  netlink, but would require a couple of extra instructions for the common
  skb allocation case, so it doesn't really seem worth it.

  In order to get the destination memory area for skb->data before message
  construction, memory mapped netlink I/O needs to look up the destination
  socket during allocation instead of during transmission because the
  ring is owned by the receiveing socket/process. A special skb allocation
  function (netlink_alloc_skb) taking the destination pid as an argument is
  used for this, all subsystems that want to support memory mapped I/O need
  to use this function, automatic fallback to the receive queue happens
  for unconverted subsystems. Dumps automatically use memory mapped I/O if
  the receiving socket has enabled it.

  The visible effect of looking up the destination socket during allocation
  instead of transmission is that message ordering in userspace might
  change in case allocation and transmission aren't performed atomically.
  This usually doesn't matter since most subsystems have a BKL-like lock
  like the rtnl mutex, to my knowledge the currently only existing case
  where it might matter is nfnetlink_queue combined with the recently
  introduced batched verdicts, but a) that subsystem already includes
  sequence numbers which allow userspace to reorder messages in case it
  cares to, also the reodering window is quite small and b) with memory
  mapped transmission batching can be performed in a subsystem indepandant
  manner.

- AF_NETLINK contains flow control for database dumps, with regular I/O
  dump continuation are triggered based on the sockets receive queue space
  and by recvmsg() calls. Since with memory mapped I/O there are no
  recvmsg() calls under normal operation, this is done in netlink_poll(),
  under the assumption that userspace has processed all pending frames
  before invoking poll(), thus the ring is expected to have room for new
  messages. Dumps currently don't benefit as much as they could from
  memory mapped I/O because each single continuation requires a poll()
  call. A more agressive approach seems like a good idea to me, especially
  in case the socket is not subscribed to any multicast groups (IOW only
  receiving explicitly requested data).

Besides that, the memory mapped netlink implementation extends the states
defined by AF_PACKET between userspace and the kernel by a SKIP status, this
is intended for the case that userspace wants to queue frames (specifically
when using nfnetlink_queue, an IDS and stream reassembly, requested by
Eric Leblond) for a longer period of time. The kernel skips over all frames
marked with SKIP when looking or unused frames and only fails when not finding
a free frame or when having skipped the entire ring.

Also noteworthy is memory mapped sendmsg: the kernel performs validation
of messages before accepting and processing them, in order to prevent
userspace from changing the messages contents after validation, the
kernel checks that the ring is only mapped once and the file descriptor
is not shared (in order to avoid having userspace set up another mapping
after the first mentioned check). If either of both is not true, the
message copied to an allocated skb and processed as with regular I/O.
I'd especially appreciate review of this part since I'm not really versed
in memory, file and process management,

The remaining interesting details are included in the changelogs of the
individual patches and the documentation, so I won't repeat them here.

As an example, nfnetlink_queue is convererted to support memory mapped
I/O. Other subsystems that would probably benefit are nfnetlink_log,
audit and maybe ISCSI, not sure.

Following are some numbers collected by Florian Westphal based on a
slightly older version, which included an experimental patch for the
nfnetlink_queue ordering issue.

===

Test hardware is a 12-core machine
Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
ixgbe interfaces are used (i.e., multiqueue nics).
irqs are distributed across the cpus.

I've made several tests.

The simple one consists of 3GBit UDP traffic, packets are 1500 bytes
in size (i.e., no fragmentation), with a single nfqueue
and the test client programs in libmnl examples directory.
Packets are sent from one /24 net to another /24 net, i.e.
there are a few hundred flows active at any given time.

I've also tested with snort, but I disabled all rules.
6Gbit UDP traffic is generated in the snort case, and
6 nfqueues are used (i.e., 6 snorts run in parallel).

I've tested with 3 different kernels, all based on 3.7.1.
- 3.7.1, without the mmap patches
- 3.7.1, with Patricks mmap patches
- 3.7.1, with mmap patches and extended spinlock to ensure packet ids are
  monotonically increasing and cannot be re-ordered.  This is what we
  currently ship in our product.

  [ the spinlock that is extended is the per nfqueue spinlock, it will
    be held from the time the netlink skb is allocated until the netlink
    skb is sent to userspace:

    http://1984.lsi.us.es/git/nf-next/commit/?h=mmap-netlink3&id=b8eb19c46650fef4e9e4fe53f367f99bbf72afc9
  ]

snort is normally used in "batch mode", i.e., after processing 25 packets
a single "batch verdict" is sent to accept the packets seen so far.
"mmap snort" means RX_RING + sendmsg(), i.e. TX_RING is not used at this
time (except where noted below).

One reason is that snort has a reload thread, so kernel needs to copy;
also in the snort case no payload rewrite takes place, so compared
to the rx path the tx path is cheap.

Results:

3.7.1, without mmap patches, i.e. recv()+sendmsg() for everyone
nfq-queue:           1.7 gbit out
snort-recv-batch-25  5.1 gbit out
snort-recv-no-batch  3.1 gbit out

3.7.1 + mmap + without extended spinlocked section
nfq-queue:           1.7 gbit out (recv/sendmsg)
nfq-queue-mmap:      2.4 gbit out
snort-mmap-batch-25	 5.6 gbit out  (warning: since ids can be
                                        re-ordered, this version is "broken").
snort-recv-batch-25	 5.1 gbit out
snort-mmap-no-batch	 4.6 gbit out (i.e., one verdict per packet)

Kernel 3.7.1 + mmap + extended spinlock section:
nfq-queue:	1.4 gbit out
nfq-queue-mmap: 2.3 gbit out
snort:          5.6 gbit out

Conclusions:
- The "extended spinlocked section" hurts performance in the
  single queue case; with 6 snorts there is no measureable slowdown.
- I tried to re-write the mmap-snort to work without batch verdicts, but
  results were not very encouraging:

kernel 3.7.1 + mmap (without extended spinlocked section):

snort-mmap-batch-25      5.6 gbit out (what we currenlty ship)
snort-recv-batch-25      5.1 gbit out (without using mmap)
snort-mmap-batch-1       4.6 gbit out (with mmap but without batch verdicts)
snort-mmap-txring-25     5.2 gbit out (with mmap but without batch verdicts)
snort-mmap-txring-1      4.6 gbit out (with mmap but without batch verdicts)

The difference between the last two is that in the txring-25 case, we
put a verdict into the tx ring after every packet, but will only
invoke sendmsg(, NULL, 0) after processing 25 packets.  So the only
difference is the number of sendmsg calls/context switches.

So, i.o.w, kernel 3.7.1 + mmap + the extra locking crap is faster
than 3.7.1 + mmap-without-extra-locking and single-verdict-per packet.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 15:37:09 -04:00
Patrick McHardy
3ab1f683bf nfnetlink: add support for memory mapped netlink
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 14:58:36 -04:00
Patrick McHardy
ec464e5dc5 netfilter: rename netlink related "pid" variables to "portid"
Get rid of the confusing mix of pid and portid and use portid consistently
for all netlink related socket identities.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 14:58:36 -04:00
Patrick McHardy
5683264c39 netlink: add documentation for memory mapped I/O
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 14:58:36 -04:00
Patrick McHardy
4ae9fbee16 netlink: add RX/TX-ring support to netlink diag
Based on AF_PACKET.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-19 14:57:58 -04:00