Commit graph

27266 commits

Author SHA1 Message Date
Eric Dumazet
f4541d60a4 tcp: preserve ACK clocking in TSO
A long standing problem with TSO is the fact that tcp_tso_should_defer()
rearms the deferred timer, while it should not.

Current code leads to following bad bursty behavior :

20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
20:11:24.484337 IP B > A: . ack 263721 win 1117
20:11:24.485086 IP B > A: . ack 265241 win 1117
20:11:24.485925 IP B > A: . ack 266761 win 1117
20:11:24.486759 IP B > A: . ack 268281 win 1117
20:11:24.487594 IP B > A: . ack 269801 win 1117
20:11:24.488430 IP B > A: . ack 271321 win 1117
20:11:24.489267 IP B > A: . ack 272841 win 1117
20:11:24.490104 IP B > A: . ack 274361 win 1117
20:11:24.490939 IP B > A: . ack 275881 win 1117
20:11:24.491775 IP B > A: . ack 277401 win 1117
20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
20:11:24.492620 IP B > A: . ack 278921 win 1117
20:11:24.493448 IP B > A: . ack 280441 win 1117
20:11:24.494286 IP B > A: . ack 281961 win 1117
20:11:24.495122 IP B > A: . ack 283481 win 1117
20:11:24.495958 IP B > A: . ack 285001 win 1117
20:11:24.496791 IP B > A: . ack 286521 win 1117
20:11:24.497628 IP B > A: . ack 288041 win 1117
20:11:24.498459 IP B > A: . ack 289561 win 1117
20:11:24.499296 IP B > A: . ack 291081 win 1117
20:11:24.500133 IP B > A: . ack 292601 win 1117
20:11:24.500970 IP B > A: . ack 294121 win 1117
20:11:24.501388 IP B > A: . ack 295641 win 1117
20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119

While the expected behavior is more like :

20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
20:19:49.260446 IP B > A: . ack 154281 win 1212
20:19:49.261282 IP B > A: . ack 155801 win 1212
20:19:49.262125 IP B > A: . ack 157321 win 1212
20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
20:19:49.262958 IP B > A: . ack 158841 win 1212
20:19:49.263795 IP B > A: . ack 160361 win 1212
20:19:49.264628 IP B > A: . ack 161881 win 1212
20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
20:19:49.265465 IP B > A: . ack 163401 win 1212
20:19:49.265886 IP B > A: . ack 164921 win 1212
20:19:49.266722 IP B > A: . ack 166441 win 1212
20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
20:19:49.267559 IP B > A: . ack 167961 win 1212
20:19:49.268394 IP B > A: . ack 169481 win 1212
20:19:49.269232 IP B > A: . ack 171001 win 1212
20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-22 10:34:03 -04:00
Johannes Berg
8b305780ed mac80211: fix virtual monitor interface locking
The virtual monitor interface has a locking issue, it calls
into the channel context code with the iflist mutex held
which isn't allowed since it is usually acquired the other
way around. The mutex is still required for the interface
iteration, but need not be held across the channel calls.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-20 22:26:35 +01:00
Johannes Berg
ce1eadda6b cfg80211: fix wdev tracing crash
Arend reported a crash in tracing if the driver returns an
ERR_PTR() value from the add_virtual_intf() callback. This
is due to the tracing then still attempting to dereference
the "pointer", fix this by using IS_ERR_OR_NULL().

Reported-by: Arend van Spriel <arend@broadcom.com>
Tested-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-20 22:21:31 +01:00
John W. Linville
b9d5319041 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-03-20 14:26:37 -04:00
Kees Cook
896ee0eee6 net/irda: add missing error path release_sock call
This makes sure that release_sock is called for all error conditions in
irda_getsockopt.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Brad Spengler <spender@grsecurity.net>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:23:13 -04:00
Martin Fuzzey
283951f95b ipconfig: Fix newline handling in log message.
When using ipconfig the logs currently look like:

Single name server:
[    3.467270] IP-Config: Complete:
[    3.470613]      device=eth0, hwaddr=ac🇩🇪48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1
[    3.480670]      host=infigo-1, domain=, nis-domain=(none)
[    3.486166]      bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath=
[    3.492910]      nameserver0=172.16.42.1[    3.496853] ALSA device list:

Three name servers:
[    3.496949] IP-Config: Complete:
[    3.500293]      device=eth0, hwaddr=ac🇩🇪48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1
[    3.510367]      host=infigo-1, domain=, nis-domain=(none)
[    3.515864]      bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath=
[    3.522635]      nameserver0=172.16.42.1, nameserver1=172.16.42.100
[    3.529149] , nameserver2=172.16.42.200

Fix newline handling for these cases

Signed-off-by: Martin Fuzzey <mfuzzey@parkeon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:15:58 -04:00
Daniel Borkmann
8ed781668d flow_keys: include thoff into flow_keys for later usage
In skb_flow_dissect(), we perform a dissection of a skbuff. Since we're
doing the work here anyway, also store thoff for a later usage, e.g. in
the BPF filter.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:14:36 -04:00
Tom Parkin
f6e16b299b l2tp: unhash l2tp sessions on delete, not on free
If we postpone unhashing of l2tp sessions until the structure is freed, we
risk:

 1. further packets arriving and getting queued while the pseudowire is being
    closed down
 2. the recv path hitting "scheduling while atomic" errors in the case that
    recv drops the last reference to a session and calls l2tp_session_free
    while in atomic context

As such, l2tp sessions should be unhashed from l2tp_core data structures early
in the teardown process prior to calling pseudowire close.  For pseudowires
like l2tp_ppp which have multiple shutdown codepaths, provide an unhash hook.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:39 -04:00
Tom Parkin
7b7c0719cd l2tp: avoid deadlock in l2tp stats update
l2tp's u64_stats writers were incorrectly synchronised, making it possible to
deadlock a 64bit machine running a 32bit kernel simply by sending the l2tp
code netlink commands while passing data through l2tp sessions.

Previous discussion on netdev determined that alternative solutions such as
spinlock writer synchronisation or per-cpu data would bring unjustified
overhead, given that most users interested in high volume traffic will likely
be running 64bit kernels on 64bit hardware.

As such, this patch replaces l2tp's use of u64_stats with atomic_long_t,
thereby avoiding the deadlock.

Ref:
http://marc.info/?l=linux-netdev&m=134029167910731&w=2
http://marc.info/?l=linux-netdev&m=134079868111131&w=2

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:39 -04:00
Tom Parkin
cf2f5c886a l2tp: push all ppp pseudowire shutdown through .release handler
If userspace deletes a ppp pseudowire using the netlink API, either by
directly deleting the session or by deleting the tunnel that contains the
session, we need to tear down the corresponding pppox channel.

Rather than trying to manage two pppox unbind codepaths, switch the netlink
and l2tp_core session_close handlers to close via. the l2tp_ppp socket
.release handler.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:39 -04:00
Tom Parkin
4c6e2fd354 l2tp: purge session reorder queue on delete
Add calls to l2tp_session_queue_purge as a part of l2tp_tunnel_closeall
and l2tp_session_delete.  Pseudowire implementations which are deleted only
via. l2tp_core l2tp_session_delete calls can dispense with their own code for
flushing the reorder queue.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:39 -04:00
Tom Parkin
48f72f92b3 l2tp: add session reorder queue purge function to core
If an l2tp session is deleted, it is necessary to delete skbs in-flight
on the session's reorder queue before taking it down.

Rather than having each pseudowire implementation reaching into the
l2tp_session struct to handle this itself, provide a function in l2tp_core to
purge the session queue.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:39 -04:00
Tom Parkin
02d13ed5f9 l2tp: don't BUG_ON sk_socket being NULL
It is valid for an existing struct sock object to have a NULL sk_socket
pointer, so don't BUG_ON in l2tp_tunnel_del_work if that should occur.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:39 -04:00
Tom Parkin
8abbbe8ff5 l2tp: take a reference for kernel sockets in l2tp_tunnel_sock_lookup
When looking up the tunnel socket in struct l2tp_tunnel, hold a reference
whether the socket was created by the kernel or by userspace.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:38 -04:00
Tom Parkin
2b551c6e7d l2tp: close sessions before initiating tunnel delete
When a user deletes a tunnel using netlink, all the sessions in the tunnel
should also be deleted.  Since running sessions will pin the tunnel socket
with the references they hold, have the l2tp_tunnel_delete close all sessions
in a tunnel before finally closing the tunnel socket.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:38 -04:00
Tom Parkin
936063175a l2tp: close sessions in ip socket destroy callback
l2tp_core hooks UDP's .destroy handler to gain advance warning of a tunnel
socket being closed from userspace.  We need to do the same thing for
IP-encapsulation sockets.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:38 -04:00
Tom Parkin
e34f4c7050 l2tp: export l2tp_tunnel_closeall
l2tp_core internally uses l2tp_tunnel_closeall to close all sessions in a
tunnel when a UDP-encapsulation socket is destroyed.  We need to do something
similar for IP-encapsulation sockets.

Export l2tp_tunnel_closeall as a GPL symbol to enable l2tp_ip and l2tp_ip6 to
call it from their .destroy handlers.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:38 -04:00
Tom Parkin
9980d001ce l2tp: add udp encap socket destroy handler
L2TP sessions hold a reference to the tunnel socket to prevent it going away
while sessions are still active.  However, since tunnel destruction is handled
by the sock sk_destruct callback there is a catch-22: a tunnel with sessions
cannot be deleted since each session holds a reference to the tunnel socket.
If userspace closes a managed tunnel socket, or dies, the tunnel will persist
and it will be neccessary to individually delete the sessions using netlink
commands.  This is ugly.

To prevent this occuring, this patch leverages the udp encapsulation socket
destroy callback to gain early notification when the tunnel socket is closed.
This allows us to safely close the sessions running in the tunnel, dropping
the tunnel socket references in the process.  The tunnel socket is then
destroyed as normal, and the tunnel resources deallocated in sk_destruct.

While we're at it, ensure that l2tp_tunnel_closeall correctly drops session
references to allow the sessions to be deleted rather than leaking.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:38 -04:00
Tom Parkin
44046a593e udp: add encap_destroy callback
Users of udp encapsulation currently have an encap_rcv callback which they can
use to hook into the udp receive path.

In situations where a encapsulation user allocates resources associated with a
udp encap socket, it may be convenient to be able to also hook the proto
.destroy operation.  For example, if an encap user holds a reference to the
udp socket, the destroy hook might be used to relinquish this reference.

This patch adds a socket destroy hook into udp, which is set and enabled
in the same way as the existing encap_rcv hook.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:38 -04:00
Masatake YAMATO
f1e79e2080 genetlink: trigger BUG_ON if a group name is too long
Trigger BUG_ON if a group name is longer than GENL_NAMSIZ.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:05:51 -04:00
Thierry Escande
b315515544 NFC: llcp: Remove possible double call to kfree_skb
kfree_skb was called twice when the socket receive queue is full

Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-20 16:46:40 +01:00
David S. Miller
90b2621fd4 Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:

====================
The following patchset contains 7 Netfilter/IPVS fixes for 3.9-rc, they are:

* Restrict IPv6 stateless NPT targets to the mangle table. Many users are
  complaining that this target does not work in the nat table, which is the
  wrong table for it, from Florian Westphal.

* Fix possible use before initialization in the netns init path of several
  conntrack protocol trackers (introduced recently while improving conntrack
  netns support), from Gao Feng.

* Fix incorrect initialization of copy_range in nfnetlink_queue, spotted
  by Eric Dumazet during the NFWS2013, patch from myself.

* Fix wrong calculation of next SCTP chunk in IPVS, from Julian Anastasov.

* Remove rcu_read_lock section in IPVS while calling ipv4_update_pmtu
  not required anymore after change introduced in 3.7, again from Julian.

* Fix SYN looping in IPVS state sync if the backup is used a real server
  in DR/TUN modes, this required a new /proc entry to disable the director
  function when acting as backup, also from Julian.

* Remove leftover IP_NF_QUEUE Kconfig after ip_queue removal, noted by
  Paul Bolle.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 10:23:52 -04:00
Steffen Klassert
0017c0b575 xfrm: Fix replay notification for esn.
We may miscalculate the sequence number difference from the
last time we send a notification if a sequence number wrap
occured in the meantime. We fix this by adding a separate
replay notify function for esn. Here we take the high bits
of the sequence number into account to calculate the
difference.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-03-20 11:57:52 +01:00
Samuel Ortiz
bec964ed3b NFC: llcp: Detach socket from process context only when releasing the socket
Calling sock_orphan when e.g. the NFC adapter is removed can lead to
kernel crashes when e.g. a connection less client is sleeping on the
Rx workqueue, waiting for data to show up.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-20 11:30:37 +01:00
Paul Bolle
3dd6664fac netfilter: remove unused "config IP_NF_QUEUE"
Kconfig symbol IP_NF_QUEUE is unused since commit
d16cf20e2f ("netfilter: remove ip_queue
support"). Let's remove it too.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-20 00:11:43 +01:00
Linus Torvalds
7b1b3fd74e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix ARM BPF JIT handling of negative 'k' values, from Chen Gang.

 2) Insufficient space reserved for bridge netlink values, fix from
    Stephen Hemminger.

 3) Some dst_neigh_lookup*() callers don't interpret error pointer
    correctly, fix from Zhouyi Zhou.

 4) Fix transport match in SCTP active_path loops, from Xugeng Zhang.

 5) Fix qeth driver handling of multi-order SKB frags, from Frank
    Blaschka.

 6) fec driver is missing napi_disable() call, resulting in crashes on
    unload, from Georg Hofmann.

 7) Don't try to handle PMTU events on a listening socket, fix from Eric
    Dumazet.

 8) Fix timestamp location calculations in IP option processing, from
    David Ward.

 9) FIB_TABLE_HASHSZ setting is not controlled by the correct kconfig
    tests, from Denis V Lunev.

10) Fix TX descriptor push handling in SFC driver, from Ben Hutchings.

11) Fix isdn/hisax and tulip/de4x5 kconfig dependencies, from Arnd
    Bergmann.

12) bnx2x statistics don't handle 4GB rollover correctly, fix from
    Maciej Żenczykowski.

13) Openvswitch bug fixes for vport del/new error reporting, missing
    genlmsg_end() call in netlink processing, and mis-parsing of
    LLC/SNAP ethernet types.  From Rich Lane.

14) SKB pfmemalloc state should only be propagated from the head page of
    a compound page, fix from Pavel Emelyanov.

15) Fix link handling in tg3 driver for 5715 chips when autonegotation
    is disabled.  From Nithin Sujir.

16) Fix inverted test of cpdma_check_free_tx_desc return value in
    davinci_emac driver, from Mugunthan V N.

17) vlan_depth is incorrectly calculated in skb_network_protocol(), from
    Li RongQing.

18) Fix probing of Gobi 1K devices in qmi_wwan driver, and fix NCM
    device mode backwards compat in cdc_ncm driver.  From Bjørn Mork.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
  inet: limit length of fragment queue hash table bucket lists
  qeth: Fix scatter-gather regression
  qeth: Fix invalid router settings handling
  qeth: delay feature trace
  tcp: dont handle MTU reduction on LISTEN socket
  bnx2x: fix occasional statistics off-by-4GB error
  vhost/net: fix heads usage of ubuf_info
  bridge: Add support for setting BR_ROOT_BLOCK flag.
  bnx2x: add missing napi deletion in error path
  drivers: net: ethernet: ti: davinci_emac: fix usage of cpdma_check_free_tx_desc()
  ethernet/tulip: DE4x5 needs VIRT_TO_BUS
  isdn: hisax: netjet requires VIRT_TO_BUS
  net: cdc_ncm, cdc_mbim: allow user to prefer NCM for backwards compatibility
  rtnetlink: Mask the rta_type when range checking
  Revert "ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally"
  Fix dst_neigh_lookup/dst_neigh_lookup_skb return value handling bug
  smsc75xx: configuration help incorrectly mentions smsc95xx
  net: fec: fix missing napi_disable call
  net: fec: restart the FEC when PHY speed changes
  skb: Propagate pfmemalloc on skb from head page only
  ...
2013-03-19 13:20:51 -07:00
Hannes Frederic Sowa
5a3da1fe95 inet: limit length of fragment queue hash table bucket lists
This patch introduces a constant limit of the fragment queue hash
table bucket list lengths. Currently the limit 128 is choosen somewhat
arbitrary and just ensures that we can fill up the fragment cache with
empty packets up to the default ip_frag_high_thresh limits. It should
just protect from list iteration eating considerable amounts of cpu.

If we reach the maximum length in one hash bucket a warning is printed.
This is implemented on the caller side of inet_frag_find to distinguish
between the different users of inet_fragment.c.

I dropped the out of memory warning in the ipv4 fragment lookup path,
because we already get a warning by the slab allocator.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-19 10:28:36 -04:00
Julian Anastasov
bf93ad72cd ipvs: remove extra rcu lock
In 3.7 we added code that uses ipv4_update_pmtu but after commit
c5ae7d4192 (ipv4: must use rcu protection while calling fib_lookup)
the RCU lock is not needed.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-03-19 21:21:52 +09:00
Julian Anastasov
0c12582fbc ipvs: add backup_only flag to avoid loops
Dmitry Akindinov is reporting for a problem where SYNs are looping
between the master and backup server when the backup server is used as
real server in DR mode and has IPVS rules to function as director.

Even when the backup function is enabled we continue to forward
traffic and schedule new connections when the current master is using
the backup server as real server. While this is not a problem for NAT,
for DR and TUN method the backup server can not determine if a request
comes from client or from director.

To avoid such loops add new sysctl flag backup_only. It can be needed
for DR/TUN setups that do not need backup and director function at the
same time. When the backup function is enabled we stop any forwarding
and pass the traffic to the local stack (real server mode). The flag
disables the director function when the backup function is enabled.

For setups that enable backup function for some virtual services and
director function for other virtual services there should be another
more complex solution to support DR/TUN mode, may be to assign
per-virtual service syncid value, so that we can differentiate the
requests.

Reported-by: Dmitry Akindinov <dimak@stalker.com>
Tested-by: German Myzovsky <lawyer@sipnet.ru>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-03-19 21:21:51 +09:00
Julian Anastasov
cf2e39429c ipvs: fix sctp chunk length order
Fix wrong but non-fatal access to chunk length.
sch->length should be in network order, next chunk should
be aligned to 4 bytes. Problem noticed in sparse output.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-03-19 09:37:27 +09:00
John W. Linville
8fa48cbdfb Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2013-03-18 15:17:11 -04:00
Eric Dumazet
0d4f060861 tcp: dont handle MTU reduction on LISTEN socket
When an ICMP ICMP_FRAG_NEEDED (or ICMPV6_PKT_TOOBIG) message finds a
LISTEN socket, and this socket is currently owned by the user, we
set TCP_MTU_REDUCED_DEFERRED flag in listener tsq_flags.

This is bad because if we clone the parent before it had a chance to
clear the flag, the child inherits the tsq_flags value, and next
tcp_release_cb() on the child will decrement sk_refcnt.

Result is that we might free a live TCP socket, as reported by
Dormando.

IPv4: Attempt to release TCP socket in state 1

Fix this issue by testing sk_state against TCP_LISTEN early, so that we
set TCP_MTU_REDUCED_DEFERRED on appropriate sockets (not a LISTEN one)

This bug was introduced in commit 563d34d057
(tcp: dont drop MTU reduction indications)

Reported-by: dormando <dormando@rydia.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-18 13:31:28 -04:00
Eric W. Biederman
92f28d973c scm: Require CAP_SYS_ADMIN over the current pidns to spoof pids.
Don't allow spoofing pids over unix domain sockets in the corner
cases where a user has created a user namespace but has not yet
created a pid namespace.

Cc: stable@vger.kernel.org
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-17 17:16:16 -07:00
Vlad Yasevich
3d84fa98ac bridge: Add support for setting BR_ROOT_BLOCK flag.
Most of the support was already there.  The only thing that was missing
was the call to set the flag.  Add this call.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-17 12:41:29 -04:00
David S. Miller
c62dce6126 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
On the NFC bits, Samuel says:

"With this one we have:

- A fix for properly decreasing socket ack log.
- A timer and works cleanup upon NFC device removal.
- A monitoroing socket cleanup round from llcp_socket_release.
- A proper error report to pending sockets upon NFC device removal."

Regarding the Bluetooth bits, Gustavo says:

"I have these two patches for 3.9, these add support for two more devices to
the bluetooth drivers."

Along with those, we have a few wireless driver fixes...

Bing Zhao provides an mwifiex to prevent an out-of-bounds memory
access.

John Crispin offers a Kconfig fix to enable some otherwise dead code
in rt2x00.  The correct symbols were added in -rc1 through a different
tree, but the symbols for enabling the wireless driver didn't match.

Larry Finger brings an rtlwifi fix for a scheduling while atomic bug,
and another fix for a reassociation problem caused by failing to
clear the BSSID after a disconnect.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-17 12:26:18 -04:00
Vlad Yasevich
a5b8db9144 rtnetlink: Mask the rta_type when range checking
Range/validity checks on rta_type in rtnetlink_rcv_msg() do
not account for flags that may be set.  This causes the function
to return -EINVAL when flags are set on the type (for example
NLA_F_NESTED).

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-17 11:43:40 -04:00
Timo Teräs
8c6216d7f1 Revert "ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally"
This reverts commit 412ed94744.

The commit is wrong as tiph points to the outer IPv4 header which is
installed at ipgre_header() and not the inner one which is protocol dependant.

This commit broke succesfully opennhrp which use PF_PACKET socket with
ETH_P_NHRP protocol. Additionally ssl_addr is set to the link-layer
IPv4 address. This address is written by ipgre_header() to the skb
earlier, and this is the IPv4 header tiph should point to - regardless
of the inner protocol payload.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-16 23:00:41 -04:00
John W. Linville
f3a3440063 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-03-15 10:44:36 -04:00
David S. Miller
296b60109e Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Jesse Gross says:

====================
A few different bug fixes, including several for issues with userspace
communication that have gone unnoticed up until now.  These are intended
for net/3.9.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-15 09:00:39 -04:00
Florian Westphal
a82783c91d netfilter: ip6t_NPT: restrict to mangle table
As the translation is stateless, using it in nat table
doesn't work (only initial packet is translated).
filter table OUTPUT works but won't re-route the packet after translation.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-15 12:58:21 +01:00
Pablo Neira Ayuso
bae99f7a1d netfilter: nfnetlink_queue: fix incorrect initialization of copy range field
2^16 = 0xffff, not 0xfffff (note the extra 'f'). Not dangerous since you
adjust it to min_t(data_len, skb->len) just after on.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-15 12:35:49 +01:00
Gao feng
0d98da5d84 netfilter: nf_conntrack: register pernet subsystem before register L4 proto
In (c296bb4 netfilter: nf_conntrack: refactor l4proto support for netns)
the l4proto gre/dccp/udplite/sctp registration happened before the pernet
subsystem, which is wrong.

Register pernet subsystem before register L4proto since after register
L4proto, init_conntrack may try to access the resources which allocated
in register_pernet_subsys.

Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-15 12:29:25 +01:00
Vinicius Costa Gomes
eb20ff9c91 Bluetooth: Fix not closing SCO sockets in the BT_CONNECT2 state
With deferred setup for SCO, it is possible that userspace closes the
socket when it is in the BT_CONNECT2 state, after the Connect Request is
received but before the Accept Synchonous Connection is sent.

If this happens the following crash was observed, when the connection is
terminated:

[  +0.000003] hci_sync_conn_complete_evt: hci0 status 0x10
[  +0.000005] sco_connect_cfm: hcon ffff88003d1bd800 bdaddr 40:98:4e:32:d7:39 status 16
[  +0.000003] sco_conn_del: hcon ffff88003d1bd800 conn ffff88003cc8e300, err 110
[  +0.000015] BUG: unable to handle kernel NULL pointer dereference at 0000000000000199
[  +0.000906] IP: [<ffffffff810620dd>] __lock_acquire+0xed/0xe82
[  +0.000000] PGD 3d21f067 PUD 3d291067 PMD 0
[  +0.000000] Oops: 0002 [#1] SMP
[  +0.000000] Modules linked in: rfcomm bnep btusb bluetooth
[  +0.000000] CPU 0
[  +0.000000] Pid: 1481, comm: kworker/u:2H Not tainted 3.9.0-rc1-25019-gad82cdd #1 Bochs Bochs
[  +0.000000] RIP: 0010:[<ffffffff810620dd>]  [<ffffffff810620dd>] __lock_acquire+0xed/0xe82
[  +0.000000] RSP: 0018:ffff88003c3c19d8  EFLAGS: 00010002
[  +0.000000] RAX: 0000000000000001 RBX: 0000000000000246 RCX: 0000000000000000
[  +0.000000] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003d1be868
[  +0.000000] RBP: ffff88003c3c1a98 R08: 0000000000000002 R09: 0000000000000000
[  +0.000000] R10: ffff88003d1be868 R11: ffff88003e20b000 R12: 0000000000000002
[  +0.000000] R13: ffff88003aaa8000 R14: 000000000000006e R15: ffff88003d1be850
[  +0.000000] FS:  0000000000000000(0000) GS:ffff88003e200000(0000) knlGS:0000000000000000
[  +0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  +0.000000] CR2: 0000000000000199 CR3: 000000003c1cb000 CR4: 00000000000006b0
[  +0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  +0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  +0.000000] Process kworker/u:2H (pid: 1481, threadinfo ffff88003c3c0000, task ffff88003aaa8000)
[  +0.000000] Stack:
[  +0.000000]  ffffffff81b16342 0000000000000000 0000000000000000 ffff88003d1be868
[  +0.000000]  ffffffff00000000 00018c0c7863e367 000000003c3c1a28 ffffffff8101efbd
[  +0.000000]  0000000000000000 ffff88003e3d2400 ffff88003c3c1a38 ffffffff81007c7a
[  +0.000000] Call Trace:
[  +0.000000]  [<ffffffff8101efbd>] ? kvm_clock_read+0x34/0x3b
[  +0.000000]  [<ffffffff81007c7a>] ? paravirt_sched_clock+0x9/0xd
[  +0.000000]  [<ffffffff81007fd4>] ? sched_clock+0x9/0xb
[  +0.000000]  [<ffffffff8104fd7a>] ? sched_clock_local+0x12/0x75
[  +0.000000]  [<ffffffff810632d1>] lock_acquire+0x93/0xb1
[  +0.000000]  [<ffffffffa0022339>] ? spin_lock+0x9/0xb [bluetooth]
[  +0.000000]  [<ffffffff8105f3d8>] ? lock_release_holdtime.part.22+0x4e/0x55
[  +0.000000]  [<ffffffff814f6038>] _raw_spin_lock+0x40/0x74
[  +0.000000]  [<ffffffffa0022339>] ? spin_lock+0x9/0xb [bluetooth]
[  +0.000000]  [<ffffffff814f6936>] ? _raw_spin_unlock+0x23/0x36
[  +0.000000]  [<ffffffffa0022339>] spin_lock+0x9/0xb [bluetooth]
[  +0.000000]  [<ffffffffa00230cc>] sco_conn_del+0x76/0xbb [bluetooth]
[  +0.000000]  [<ffffffffa002391d>] sco_connect_cfm+0x2da/0x2e9 [bluetooth]
[  +0.000000]  [<ffffffffa000862a>] hci_proto_connect_cfm+0x38/0x65 [bluetooth]
[  +0.000000]  [<ffffffffa0008d30>] hci_sync_conn_complete_evt.isra.79+0x11a/0x13e [bluetooth]
[  +0.000000]  [<ffffffffa000cd96>] hci_event_packet+0x153b/0x239d [bluetooth]
[  +0.000000]  [<ffffffff814f68ff>] ? _raw_spin_unlock_irqrestore+0x48/0x5c
[  +0.000000]  [<ffffffffa00025f6>] hci_rx_work+0xf3/0x2e3 [bluetooth]
[  +0.000000]  [<ffffffff8103efed>] process_one_work+0x1dc/0x30b
[  +0.000000]  [<ffffffff8103ef83>] ? process_one_work+0x172/0x30b
[  +0.000000]  [<ffffffff8103e07f>] ? spin_lock_irq+0x9/0xb
[  +0.000000]  [<ffffffff8103fc8d>] worker_thread+0x123/0x1d2
[  +0.000000]  [<ffffffff8103fb6a>] ? manage_workers+0x240/0x240
[  +0.000000]  [<ffffffff81044211>] kthread+0x9d/0xa5
[  +0.000000]  [<ffffffff81044174>] ? __kthread_parkme+0x60/0x60
[  +0.000000]  [<ffffffff814f75bc>] ret_from_fork+0x7c/0xb0
[  +0.000000]  [<ffffffff81044174>] ? __kthread_parkme+0x60/0x60
[  +0.000000] Code: d7 44 89 8d 50 ff ff ff 4c 89 95 58 ff ff ff e8 44 fc ff ff 44 8b 8d 50 ff ff ff 48 85 c0 4c 8b 95 58 ff ff ff 0f 84 7a 04 00 00 <f0> ff 80 98 01 00 00 83 3d 25 41 a7 00 00 45 8b b5 e8 05 00 00
[  +0.000000] RIP  [<ffffffff810620dd>] __lock_acquire+0xed/0xe82
[  +0.000000]  RSP <ffff88003c3c19d8>
[  +0.000000] CR2: 0000000000000199
[  +0.000000] ---[ end trace e73cd3b52352dd34 ]---

Cc: stable@vger.kernel.org [3.8]
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Tested-by: Frederic Dalleau <frederic.dalleau@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-03-14 13:14:21 -03:00
Eric Dumazet
16fad69cfe tcp: fix skb_availroom()
Chrome OS team reported a crash on a Pixel ChromeBook in TCP stack :

https://code.google.com/p/chromium/issues/detail?id=182056

commit a21d45726a (tcp: avoid order-1 allocations on wifi and tx
path) did a poor choice adding an 'avail_size' field to skb, while
what we really needed was a 'reserved_tailroom' one.

It would have avoided commit 22b4a4f22d (tcp: fix retransmit of
partially acked frames) and this commit.

Crash occurs because skb_split() is not aware of the 'avail_size'
management (and should not be aware)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Mukesh Agrawal <quiche@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-14 11:49:45 -04:00
Linus Torvalds
aea8b5d1e5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace bugfixes from Eric Biederman:
 "This tree includes a partial revert for "fs: Limit sys_mount to only
  request filesystem modules." When I added the new style module aliases
  to the filesystems I deleted the old ones.  A bad move.  It turns out
  that distributions like Arch linux use module aliases when
  constructing ramdisks.  Which meant ultimately that an ext3 filesystem
  mounted with ext4 would not result in the ext4 module being put into
  the ramdisk.

  The other change in this tree adds a handful of filesystem module
  alias I simply failed to add the first time.  Which inconvinienced a
  few folks using cifs.

  I don't want to inconvinience folks any longer than I have to so here
  are these trivial fixes."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  fs: Readd the fs module aliases.
  fs: Limit sys_mount to only request filesystem modules. (Part 3)
2013-03-13 15:47:50 -07:00
Xufeng Zhang
2317f449af sctp: don't break the loop while meeting the active_path so as to find the matched transport
sctp_assoc_lookup_tsn() function searchs which transport a certain TSN
was sent on, if not found in the active_path transport, then go search
all the other transports in the peer's transport_addr_list, however, we
should continue to the next entry rather than break the loop when meet
the active_path transport.

Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-13 10:09:55 -04:00
Vlad Yasevich
f281563350 sctp: Use correct sideffect command in duplicate cookie handling
When SCTP is done processing a duplicate cookie chunk, it tries
to delete a newly created association.  For that, it has to set
the right association for the side-effect processing to work.
However, when it uses the SCTP_CMD_NEW_ASOC command, that performs
more work then really needed (like hashing the associationa and
assigning it an id) and there is no point to do that only to
delete the association as a next step.  In fact, it also creates
an impossible condition where an association may be found by
the getsockopt() call, and that association is empty.  This
causes a crash in some sctp getsockopts.

The solution is rather simple.  We simply use SCTP_CMD_SET_ASOC
command that doesn't have all the overhead and does exactly
what we need.

Reported-by: Karl Heiss <kheiss@gmail.com>
Tested-by: Karl Heiss <kheiss@gmail.com>
CC: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-13 09:59:21 -04:00
Eric W. Biederman
fa7614ddd6 fs: Readd the fs module aliases.
I had assumed that the only use of module aliases for filesystems
prior to "fs: Limit sys_mount to only request filesystem modules."
was in request_module.  It turns out I was wrong.  At least mkinitcpio
in Arch linux uses these aliases.

So readd the preexising aliases, to keep from breaking userspace.

Userspace eventually will have to follow and use the same aliases the
kernel does.  So at some point we may be delete these aliases without
problems.  However that day is not today.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-12 18:55:21 -07:00
Linus Torvalds
368edaadc0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fix from Sage Weil:
 "This fixes a bug in the new message decoding that just went in during
  the last window."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: fix decoding of pgids
2013-03-12 09:22:42 -07:00
Linus Torvalds
5b22b1848b Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from Bruce Fields:
 "Some minor fallout from the user-namespace work broke most krb5 mounts
  to nfsd, and I screwed up a change to the AF_LOCAL rpc code."

* 'for-3.9' of git://linux-nfs.org/~bfields/linux:
  sunrpc: don't attempt to cancel unitialized work
  nfsd: fix krb5 handling of anonymous principals
2013-03-12 09:20:58 -07:00
Li RongQing
c80a8512ee net/core: move vlan_depth out of while loop in skb_network_protocol()
[ Bug added added in commit 05e8ef4ab2 (net: factor out
  skb_mac_gso_segment() from skb_gso_segment() ) ]

move vlan_depth out of while loop, or else vlan_depth always is ETH_HLEN,
can not be increased, and lead to infinite loop when frame has two vlan headers.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-12 11:47:40 -04:00
stephen hemminger
3da889b616 bridge: reserve space for IFLA_BRPORT_FAST_LEAVE
The bridge multicast fast leave feature was added sufficient space
was not reserved in the netlink message. This means the flag may be
lost in netlink events and results of queries.

Found by observation while looking up some netlink stuff for discussion with Vlad.
Problem introduced by commit c2d3babfaf
Author: David S. Miller <davem@davemloft.net>
Date:   Wed Dec 5 16:24:45 2012 -0500

    bridge: implement multicast fast leave

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-12 05:38:29 -04:00
David S. Miller
2230e0c193 Included changes ares:
- fix packet parsing routine to avoid to read beyond the packet boundary
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABCAAGBQJRPlQRAAoJEADl0hg6qKeOP44QAI20iYAaMmA40lrsELAAJMVJ
 G/hHyKYcup0nxrXnlj9yOft6bYx0232/TjhGKGQ7eYcl+ri+Wyu96kC1hJG9rr/Z
 +WCU4CimTY5MRVzFKwNriaiqyAsW2cw2T1k1KfZD9Wb9t6hEdvd8f+4DbXYrYxHG
 nSZQKKDD0cxs1ARScOEGbf7KF8sw6RcGWj0m4xM00Wo/fai+CZZX/HLcUnHQrQxx
 4w9safvaIVuQV3mANTpSoerfkraNzaX14i2ZU5SGi2/mhR9PC4JyGz5FIge+fuvp
 rP/E40GdCYpcuDL7UAyd+IBaOoiP6llDUJA/LqbZLyEZgkMtt8rgQwBsmcYDtiTt
 zmqCgwjp2/mTs44LfuxtxvLcIDRsQh52I0ceZaAzflG3m9t5eYs6L7oyEEUtOSCm
 wwY+RmBdMrArr8dohkxopjxAJtCLuHxC8e9AfXwzqt8FYZIQG/oayBrgtEoxCgzf
 PnJWX0uw4m6WisvMN5Ko8bNeacVRyceTqTOpIWxbdF0wku2evCbkxkK6PfvRDAca
 UKyrLfbDH59OObhq3fEov7wiNjLJo92bV6dLqTGgQp/GQBTswttb+9WwjT6+PE8b
 dKRM1eKlCBDMT5q4tGSzKvoH6cfC+h7GIPmpNDdcG+ByJI4bLyzDxLs6ZzYoyYP9
 plB7HO/1r1pnOE/vtqXE
 =X3VY
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Included changes ares:
- fix packet parsing routine to avoid to read beyond the packet boundary

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-12 05:36:52 -04:00
David Ward
4660c7f498 net/ipv4: Ensure that location of timestamp option is stored
This is needed in order to detect if the timestamp option appears
more than once in a packet, to remove the option if the packet is
fragmented, etc. My previous change neglected to store the option
location when the router addresses were prespecified and Pointer >
Length. But now the option location is also stored when Flag is an
unrecognized value, to ensure these option handling behaviors are
still performed.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-12 05:35:39 -04:00
Marek Lindner
b47506d912 batman-adv: verify tt len does not exceed packet len
batadv_iv_ogm_process() accesses the packet using the tt_num_changes
attribute regardless of the real packet len (assuming the length check
was done before). Therefore a length check is needed to avoid reading
random memory.

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-11 22:59:47 +01:00
Sage Weil
d6c0dd6b0c libceph: fix decoding of pgids
In 4f6a7e5ee1 we effectively dropped support
for the legacy encoding for the OSDMap and incremental.  However, we didn't
fix the decoding for the pgid.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2013-03-11 14:31:00 -07:00
Linus Torvalds
0cb7750825 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Missing cancel of work items in mac80211 MLME, from Ben Greear.

 2) Fix DMA mapping handling in iwlwifi by using coherent DMA for
    command headers, from Johannes Berg.

 3) Decrease the amount of pressure on the page allocator by using order
    1 pages less in iwlwifi, from Emmanuel Grumbach.

 4) Fix mesh PS broadcast OOPS in mac80211, from Marco Porsch.

 5) Don't forget to recalculate idle state in mac80211 monitor
    interface, from Felix Fietkau.

 6) Fix varargs in netfilter conntrack handler, from Joe Perches.

 7) Need to reset entire chip when command queue fills up in iwlwifi,
    from Emmanuel Grumbach.

 8) The TX antenna value must be valid when calibrations are performed
    in iwlwifi, fix from Dor Shaish.

 9) Don't generate netfilter audit log entries when audit is disabled,
    from Gao Feng.

10) Deal with DMA unit hang on e1000e during power state transitions,
    from Bruce Allan.

11) Remove BUILD_BUG_ON check from igb driver, from Alexander Duyck.

12) Fix lockdep warning on i2c handling of igb driver, from Carolyn
    Wyborny.

13) Fix several TTY handling issues in IRDA ircomm tty driver, from
    Peter Hurley.

14) Several QFQ packet scheduler fixes from Paolo Valente.

15) When VXLAN encapsulates on transmit, we have to reset the netfilter
    state.  From Zang MingJie.

16) Fix jiffie check in net_rx_action() so that we really cap the
    processing at 2HZ.  From Eric Dumazet.

17) Fix erroneous trigger of IP option space exhaustion, when routers
    are pre-specified and we are looking to see if we can insert a
    timestamp, we will have the space.  From David Ward.

18) Fix various issues in benet driver wrt waiting for firmware to
    finish POST after resets or errors.  From Gavin Shan and Sathya
    Perla.

19) Fix TX locking in SFC driver, from Ben Hutchings.

20) Like the VXLAN fix above, when we encap in a TUN device we have to
    reset the netfilter state.  This should fix several strange crashes
    reported by Dave Jones and others.  From Eric Dumazet.

21) Don't forget to clean up MAC address resources when shutting down a
    port in mlx4 driver, from Yan Burman.

22) Fix divide by zero in vmxnet3 driver, from Bhavesh Davda.

23) Fix device statistic regression in tg3 when the driver is using
    phylib, from Nithin Sujir.

24) Fix info leak in several netlink handlers, from Mathias Krause.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits)
  6lowpan: Fix endianness issue in is_addr_link_local().
  rrunner.c: fix possible memory leak in rr_init_one()
  dcbnl: fix various netlink info leaks
  rtnl: fix info leak on RTM_GETLINK request for VF devices
  bridge: fix mdb info leaks
  tg3: Update link_up flag for phylib devices
  ipv6: stop multicast forwarding to process interface scoped addresses
  bridging: fix rx_handlers return code
  netlabel: fix build problems when CONFIG_IPV6=n
  drivers/isdn: checkng length to be sure not memory overflow
  net/rds: zero last byte for strncpy
  bnx2x: Fix SFP+ misconfiguration in iSCSI boot scenario
  bnx2x: Fix intermittent long KR2 link up time
  macvlan: Set IFF_UNICAST_FLT flag to prevent unnecessary promisc mode.
  team: unsyc the devices addresses when port is removed
  bridge: add missing vid to br_mdb_get()
  Fix: sparse warning in inet_csk_prepare_forced_close
  afkey: fix a typo
  MAINTAINERS: Update qlcnic maintainers list
  netlabel: correctly list all the static label mappings
  ...
2013-03-11 07:51:59 -07:00
Johannes Berg
07e5a5f5ab mac80211: fix crash with P2P Device returning action frames
If a P2P Device interface receives an unhandled action
frame, we attempt to return it. This crashes because it
doesn't have a channel context. Fix the crash by using
status->band and properly mark the return frame as an
off-channel frame.

Reported-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-11 09:37:50 +02:00
YOSHIFUJI Hideaki / 吉藤英明
9026c49272 6lowpan: Fix endianness issue in is_addr_link_local().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10 16:49:35 -04:00
Mathias Krause
29cd8ae0e1 dcbnl: fix various netlink info leaks
The dcb netlink interface leaks stack memory in various places:
* perm_addr[] buffer is only filled at max with 12 of the 32 bytes but
  copied completely,
* no in-kernel driver fills all fields of an IEEE 802.1Qaz subcommand,
  so we're leaking up to 58 bytes for ieee_ets structs, up to 136 bytes
  for ieee_pfc structs, etc.,
* the same is true for CEE -- no in-kernel driver fills the whole
  struct,

Prevent all of the above stack info leaks by properly initializing the
buffers/structures involved.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10 05:19:26 -04:00
Mathias Krause
84d73cd3fb rtnl: fix info leak on RTM_GETLINK request for VF devices
Initialize the mac address buffer with 0 as the driver specific function
will probably not fill the whole buffer. In fact, all in-kernel drivers
fill only ETH_ALEN of the MAX_ADDR_LEN bytes, i.e. 6 of the 32 possible
bytes. Therefore we currently leak 26 bytes of stack memory to userland
via the netlink interface.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10 05:19:26 -04:00
Mathias Krause
c085c49920 bridge: fix mdb info leaks
The bridging code discloses heap and stack bytes via the RTM_GETMDB
netlink interface and via the notify messages send to group RTNLGRP_MDB
afer a successful add/del.

Fix both cases by initializing all unset members/padding bytes with
memset(0).

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10 05:19:25 -04:00
Linus Torvalds
72932611b4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace bugfixes from Eric Biederman:
 "This is three simple fixes against 3.9-rc1.  I have tested each of
  these fixes and verified they work correctly.

  The userns oops in key_change_session_keyring and the BUG_ON triggered
  by proc_ns_follow_link were found by Dave Jones.

  I am including the enhancement for mount to only trigger requests of
  filesystem modules here instead of delaying this for the 3.10 merge
  window because it is both trivial and the kind of change that tends to
  bit-rot if left untouched for two months."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc: Use nd_jump_link in proc_ns_follow_link
  fs: Limit sys_mount to only request filesystem modules (Part 2).
  fs: Limit sys_mount to only request filesystem modules.
  userns: Stop oopsing in key_change_session_keyring
2013-03-09 16:51:13 -08:00
J. Bruce Fields
190b1ecf25 sunrpc: don't attempt to cancel unitialized work
As of dc107402ae "SUNRPC: make AF_LOCAL connect synchronous", we no longer initialize connect_worker in the
AF_LOCAL case, resulting in warnings like:

    WARNING: at lib/debugobjects.c:261 debug_print_object+0x8c/0xb0() Hardware name: Bochs
    ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
    Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfsd auth_rpcgss nfs_acl lockd sunrpc
    Pid: 4816, comm: nfsd Tainted: G        W    3.8.0-rc2-00049-gdc10740 #801
    Call Trace:
     [<ffffffff8156ec00>] ? free_obj_work+0x60/0xa0
     [<ffffffff81046aaf>] warn_slowpath_common+0x7f/0xc0
     [<ffffffff81046ba6>] warn_slowpath_fmt+0x46/0x50
     [<ffffffff8156eccc>] debug_print_object+0x8c/0xb0
     [<ffffffff81055030>] ? timer_debug_hint+0x10/0x10
     [<ffffffff8156f7e3>] debug_object_assert_init+0xe3/0x120
     [<ffffffff81057ebb>] del_timer+0x2b/0x80
     [<ffffffff8109c4e6>] ? mark_held_locks+0x86/0x110
     [<ffffffff81065a29>] try_to_grab_pending+0xd9/0x150
     [<ffffffff81065b57>] __cancel_work_timer+0x27/0xc0
     [<ffffffff81065c03>] cancel_delayed_work_sync+0x13/0x20
     [<ffffffffa0007067>] xs_destroy+0x27/0x80 [sunrpc]
     [<ffffffffa00040d8>] xprt_destroy+0x78/0xa0 [sunrpc]
     [<ffffffffa0006241>] xprt_put+0x21/0x30 [sunrpc]
     [<ffffffffa00030cf>] rpc_free_client+0x10f/0x1a0 [sunrpc]
     [<ffffffffa0002ff3>] ? rpc_free_client+0x33/0x1a0 [sunrpc]
     [<ffffffffa0002f7e>] rpc_release_client+0x6e/0xb0 [sunrpc]
     [<ffffffffa000325d>] rpc_shutdown_client+0xfd/0x1b0 [sunrpc]
     [<ffffffffa0017196>] rpcb_put_local+0x106/0x130 [sunrpc]
    ...

Acked-by: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-03-09 12:43:42 -05:00
Arnd Bergmann
dc893e19b5 Revert parts of "hlist: drop the node parameter from iterators"
Commit b67bfe0d42 ("hlist: drop the node parameter from iterators")
did a lot of nice changes but also contains two small hunks that seem to
have slipped in accidentally and have no apparent connection to the
intent of the patch.

This reverts the two extraneous changes.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Senna Tschudin <peter.senna@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-08 15:05:34 -08:00
Hannes Frederic Sowa
ddf64354af ipv6: stop multicast forwarding to process interface scoped addresses
v2:
a) used struct ipv6_addr_props

v3:
a) reverted changes for ipv6_addr_props

v4:
a) do not use __ipv6_addr_needs_scope_id

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08 12:28:20 -05:00
Cristian Bercaru
3bc1b1add7 bridging: fix rx_handlers return code
The frames for which rx_handlers return RX_HANDLER_CONSUMED are no longer
counted as dropped. They are counted as successfully received by
'netif_receive_skb'.

This allows network interface drivers to correctly update their RX-OK and
RX-DRP counters based on the result of 'netif_receive_skb'.

Signed-off-by: Cristian Bercaru <B43982@freescale.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08 12:19:59 -05:00
Samuel Ortiz
3bbc0ceb7a NFC: llcp: Report error to pending sockets when a device is removed
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-08 17:35:22 +01:00
Samuel Ortiz
e6a3a4bb85 NFC: llcp: Clean raw sockets from nfc_llcp_socket_release
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-08 17:34:57 +01:00
Paul Moore
a6a8fe950e netlabel: fix build problems when CONFIG_IPV6=n
My last patch to solve a problem where the static/fallback labels were
not fully displayed resulted in build problems when IPv6 was disabled.
This patch resolves the IPv6 build problems; sorry for the screw-up.

Please queue for -stable or simply merge with the previous patch.

Reported-by: Kbuild Test Robot <fengguang.wu@intel.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08 11:33:51 -05:00
Samuel Ortiz
3536da06db NFC: llcp: Clean local timers and works when removing a device
Whenever an adapter is removed we must clean all the local structures,
especially the timers and scheduled work. Otherwise those asynchronous
threads will eventually try to access the freed nfc_dev pointer if an LLCP
link is up.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-08 14:25:04 +01:00
Samuel Ortiz
b141e811a0 NFC: llcp: Decrease socket ack log when accepting a connection
This is really difficult to test with real NFC devices, but without
this fix an LLCP server will eventually refuse new connections.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-08 14:25:04 +01:00
Chen Gang
2e85d67690 net/rds: zero last byte for strncpy
for NUL terminated string, need be always sure '\0' in the end.

additional info:
  strncpy will pads with zeroes to the end of the given buffer.
  should initialise every bit of memory that is going to be copied to userland

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08 00:35:44 -05:00
Cong Wang
fbca58a224 bridge: add missing vid to br_mdb_get()
Obviously, vid should be considered when searching for multicast
group.

Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 16:32:19 -05:00
Christoph Paasch
c10cb5fc0f Fix: sparse warning in inet_csk_prepare_forced_close
In e337e24d66 (inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and
dccp_v4/6_request_recv_sock) I introduced the function
inet_csk_prepare_forced_close, which does a call to bh_unlock_sock().
This produces a sparse-warning.

This patch adds the missing __releases.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 16:31:29 -05:00
Junwei Zhang
d0d79c3fd7 afkey: fix a typo
Signed-off-by: Martin Zhang <martinbj2008@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 16:26:45 -05:00
Paul Moore
0c1233aba1 netlabel: correctly list all the static label mappings
When we have a large number of static label mappings that spill across
the netlink message boundary we fail to properly save our state in the
netlink_callback struct which causes us to repeat the same listings.
This patch fixes this problem by saving the state correctly between
calls to the NetLabel static label netlink "dumpit" routines.

Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 16:20:23 -05:00
David S. Miller
43b18db8a2 Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter fixes for your net tree,
they are:

* Don't generate audit log message if audit is not enabled, from Gao Feng.

* Fix logging formatting for packets dropped by helpers, by Joe Perches.

* Fix a compilation warning in nfnetlink if CONFIG_PROVE_RCU is not set,
  from Paul Bolle.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 15:20:02 -05:00
Johannes Berg
1345ee6a6d cfg80211: fix potential BSS memory leak and update
In the odd case that while updating information from a beacon,
a BSS was found that is part of a hidden group, we drop the
new information. In this case, however, we leak the IE buffer
from the update, and erroneously update the entry's timestamp
so it will never time out. Fix both these issues.

Cc: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-07 12:55:32 +01:00
Vladimir Kondratiev
021fcdc13a cfg80211: fix inconsistency in trace for rdev_set_mac_acl
There is NETDEV_ENTRY that was incorrectly assigned as WIPHY_ASSIGN,
fix it.

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-07 11:20:01 +01:00
Johannes Berg
27a737ff7c mac80211: always synchronize_net() during station removal
If there are keys left during station removal, then a
synchronize_net() will be done (for each key, I have a
patch to address this for 3.10), otherwise it won't be
done at all which causes issues because the station
could be used for TX while it's being removed from the
driver -- that might confuse the driver.

Fix this by always doing synchronize_net() if no key
was present any more.

Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 23:17:08 +01:00
John W. Linville
32cdd592b7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-03-06 10:21:17 -05:00
J. Bruce Fields
3c34ae11fa nfsd: fix krb5 handling of anonymous principals
krb5 mounts started failing as of
683428fae8 "sunrpc: Update svcgss xdr
handle to rpsec_contect cache".

The problem is that mounts are usually done with some host principal
which isn't normally mapped to any user, in which case svcgssd passes
down uid -1, which the kernel is then expected to map to the
export-specific anonymous uid or gid.

The new uid_valid/gid_valid checks were therefore causing that downcall
to fail.

(Note the regression may not have been seen with older userspace that
tended to map unknown principals to an anonymous id on their own rather
than leaving it to the kernel.)

Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-03-06 10:11:08 -05:00
David Ward
fa2b04f450 net/ipv4: Timestamp option cannot overflow with prespecified addresses
When a router forwards a packet that contains the IPv4 timestamp option,
if there is no space left in the option for the router to add its own
timestamp, then the router increments the Overflow value in the option.

However, if the addresses of the routers are prespecified in the option,
then the overflow condition cannot happen: the option is structured so
that each prespecified router has a place to write its timestamp. Other
routers do not add a timestamp, so there will never be a lack of space.

This fix ensures that the Overflow value in the IPv4 timestamp option is
not incremented when the addresses of the routers are prespecified, even
if the Pointer value is greater than the Length value.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:06 -05:00
Eric Dumazet
d1f41b67ff net: reduce net_rx_action() latency to 2 HZ
We should use time_after_eq() to get maximum latency of two ticks,
instead of three.

Bug added in commit 24f8b2385 (net: increase receive packet quantum)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:06 -05:00
Randy Dunlap
691b3b7e13 net: fix new kernel-doc warnings in net core
Fix new kernel-doc warnings in net/core/dev.c:

Warning(net/core/dev.c:4788): No description found for parameter 'new_carrier'
Warning(net/core/dev.c:4788): Excess function parameter 'new_carries' description in 'dev_change_carrier'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:06 -05:00
Paolo Valente
76e4cb0d3a pkt_sched: sch_qfq: remove a useless invocation of qfq_update_eligible
QFQ+ can select for service only 'eligible' aggregates, i.e.,
aggregates that would have started to be served also in the emulated
ideal system.  As a consequence, for QFQ+ to be work conserving, at
least one of the active aggregates must be eligible when it is time to
choose the next aggregate to serve.

The set of eligible aggregates is updated through the function
qfq_update_eligible(), which does guarantee that, after its
invocation, at least one of the active aggregates is eligible.
Because of this property, this function is invoked in
qfq_deactivate_agg() to guarantee that at least one of the active
aggregates is still eligible after an aggregate has been deactivated.
In particular, the critical case is when there are other active
aggregates, but the aggregate being deactivated happens to be the only
one eligible.

However, this precaution is not needed for QFQ+ to be work conserving,
because update_eligible() is always invoked also at the beginning of
qfq_choose_next_agg(). This patch removes the additional invocation of
update_eligible() in qfq_deactivate_agg().

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:05 -05:00
Paolo Valente
40dd2d5461 pkt_sched: sch_qfq: do not allow virtual time to jump if an aggregate is in service
By definition of (the algorithm of) QFQ+, the system virtual time must
be pushed up only if there is no 'eligible' aggregate, i.e. no
aggregate that would have started to be served also in the ideal
system emulated by QFQ+.  QFQ+ serves only eligible aggregates, hence
the aggregate currently in service is eligible.  As a consequence, to
decide whether there is no eligible aggregate, QFQ+ must also check
whether there is no aggregate in service.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:05 -05:00
Paolo Valente
a0143efa96 pkt_sched: sch_qfq: prevent budget from wrapping around after a dequeue
Aggregate budgets are computed so as to guarantee that, after an
aggregate has been selected for service, that aggregate has enough
budget to serve at least one maximum-size packet for the classes it
contains. For this reason, after a new aggregate has been selected
for service, its next packet is immediately dequeued, without any
further control.

The maximum packet size for a class, lmax, can be changed through
qfq_change_class(). In case the user sets lmax to a lower value than
the the size of some of the still-to-arrive packets, QFQ+ will
automatically push up lmax as it enqueues these packets.  This
automatic push up is likely to happen with TSO/GSO.

In any case, if lmax is assigned a lower value than the size of some
of the packets already enqueued for the class, then the following
problem may occur: the size of the next packet to dequeue for the
class may happen to be larger than lmax, after the aggregate to which
the class belongs has been just selected for service. In this case,
even the budget of the aggregate, which is an unsigned value, may be
lower than the size of the next packet to dequeue. After dequeueing
this packet and subtracting its size from the budget, the latter would
wrap around.

This fix prevents the budget from wrapping around after any packet
dequeue.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:05 -05:00
Paolo Valente
2f3b89a1fe pkt_sched: sch_qfq: serve activated aggregates immediately if the scheduler is empty
If no aggregate is in service, then the function qfq_dequeue() does
not dequeue any packet. For this reason, to guarantee QFQ+ to be work
conserving, a just-activated aggregate must be set as in service
immediately if it happens to be the only active aggregate.
This is done by the function qfq_enqueue().

Unfortunately, the function qfq_add_to_agg(), used to add a class to
an aggregate, does not perform this important additional operation.
In particular, if: 1) qfq_add_to_agg() is invoked to complete the move
of a class from a source aggregate, becoming, for this move, inactive,
to a destination aggregate, becoming instead active, and 2) the
destination aggregate becomes the only active aggregate, then this
aggregate is not however set as in service. QFQ+ remains then in a
non-work-conserving state until a new invocation of qfq_enqueue()
recovers the situation.

This fix solves the problem by moving the logic for setting an
aggregate as in service directly into the function qfq_activate_agg().
Hence, from whatever point qfq_activate_aggregate() is invoked, QFQ+
remains work conserving.  Since the more-complex logic of this new
version of activate_aggregate() is not necessary, in qfq_dequeue(), to
reschedule an aggregate that finishes its budget, then the aggregate
is now rescheduled by invoking directly the functions needed.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:05 -05:00
Paolo Valente
624b85fb96 pkt_sched: sch_qfq: fix the update of eligible-group sets
Between two invocations of make_eligible, the system virtual time may
happen to grow enough that, in its binary representation, a bit with
higher order than 31 flips. This happens especially with
TSO/GSO. Before this fix, the mask used in make_eligible was computed
as (1UL<<index_of_last_flipped_bit)-1, whose value is well defined on
a 64-bit architecture, because index_of_flipped_bit <= 63, but is in
general undefined on a 32-bit architecture if index_of_flipped_bit > 31.
The fix just replaces 1UL with 1ULL.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:05 -05:00
Paolo Valente
9b99b7e90b pkt_sched: sch_qfq: properly cap timestamps in charge_actual_service
QFQ+ schedules the active aggregates in a group using a bucket list
(one list per group). The bucket in which each aggregate is inserted
depends on the aggregate's timestamps, and the number
of buckets in a group is enough to accomodate the possible (range of)
values of the timestamps of all the aggregates in the group. For this
property to hold, timestamps must however be computed correctly.  One
necessary condition for computing timestamps correctly is that the
number of bits dequeued for each aggregate, while the aggregate is in
service, does not exceed the maximum budget budgetmax assigned to the
aggregate.

For each aggregate, budgetmax is proportional to the number of classes
in the aggregate. If the number of classes of the aggregate is
decreased through qfq_change_class(), then budgetmax is decreased
automatically as well.  Problems may occur if the aggregate is in
service when budgetmax is decreased, because the current remaining
budget of the aggregate and/or the service already received by the
aggregate may happen to be larger than the new value of budgetmax.  In
this case, when the aggregate is eventually deselected and its
timestamps are updated, the aggregate may happen to have received an
amount of service larger than budgetmax.  This may cause the aggregate
to be assigned a higher virtual finish time than the maximum
acceptable value for the last bucket in the bucket list of the group.

This fix introduces a cap that addresses this issue.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:05 -05:00
Peter Hurley
f74861ca87 net/irda: Raise dtr in non-blocking open
DTR/RTS need to be raised, regardless of the open() mode, but not
if the port has already shutdown.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:05 -05:00
Peter Hurley
0b176ce3a7 net/irda: Use barrier to set task state
Without a memory and compiler barrier, the task state change
can migrate relative to the condition testing in a blocking loop.
However, the task state change must be visible across all cpus
prior to testing those conditions. Failing to do this can result
in the familiar 'lost wakeup' and this task will hang until killed.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:04 -05:00
Peter Hurley
2f7c069b96 net/irda: Hold port lock while bumping blocked_open
Although tty_lock() already protects concurrent update to
blocked_open, that fails to meet the separation-of-concerns between
tty_port and tty.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:04 -05:00
Peter Hurley
a4ed2e737c net/irda: Fix port open counts
Saving the port count bump is unsafe. If the tty is hung up while
this open was blocking, the port count is zeroed.

Explicitly check if the tty was hung up while blocking, and correct
the port count if not.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:04 -05:00
Linus Torvalds
9da060d0ed Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A moderately sized pile of fixes, some specifically for merge window
  introduced regressions although others are for longer standing items
  and have been queued up for -stable.

  I'm kind of tired of all the RDS protocol bugs over the years, to be
  honest, it's way out of proportion to the number of people who
  actually use it.

   1) Fix missing range initialization in netfilter IPSET, from Jozsef
      Kadlecsik.

   2) ieee80211_local->tim_lock needs to use BH disabling, from Johannes
      Berg.

   3) Fix DMA syncing in SFC driver, from Ben Hutchings.

   4) Fix regression in BOND device MAC address setting, from Jiri
      Pirko.

   5) Missing usb_free_urb in ISDN Hisax driver, from Marina Makienko.

   6) Fix UDP checksumming in bnx2x driver for 57710 and 57711 chips,
      fix from Dmitry Kravkov.

   7) Missing cfgspace_lock initialization in BCMA driver.

   8) Validate parameter size for SCTP assoc stats getsockopt(), from
      Guenter Roeck.

   9) Fix SCTP association hangs, from Lee A Roberts.

  10) Fix jumbo frame handling in r8169, from Francois Romieu.

  11) Fix phy_device memory leak, from Petr Malat.

  12) Omit trailing FCS from frames received in BGMAC driver, from Hauke
      Mehrtens.

  13) Missing socket refcount release in L2TP, from Guillaume Nault.

  14) sctp_endpoint_init should respect passed in gfp_t, rather than use
      GFP_KERNEL unconditionally.  From Dan Carpenter.

  15) Add AISX AX88179 USB driver, from Freddy Xin.

  16) Remove MAINTAINERS entries for drivers deleted during the merge
      window, from Cesar Eduardo Barros.

  17) RDS protocol can try to allocate huge amounts of memory, check
      that the user's request length makes sense, from Cong Wang.

  18) SCTP should use the provided KMALLOC_MAX_SIZE instead of it's own,
      bogus, definition.  From Cong Wang.

  19) Fix deadlocks in FEC driver by moving TX reclaim into NAPI poll,
      from Frank Li.  Also, fix a build error introduced in the merge
      window.

  20) Fix bogus purging of default routes in ipv6, from Lorenzo Colitti.

  21) Don't double count RTT measurements when we leave the TCP receive
      fast path, from Neal Cardwell."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
  tcp: fix double-counted receiver RTT when leaving receiver fast path
  CAIF: fix sparse warning for caif_usb
  rds: simplify a warning message
  net: fec: fix build error in no MXC platform
  net: ipv6: Don't purge default router if accept_ra=2
  net: fec: put tx to napi poll function to fix dead lock
  sctp: use KMALLOC_MAX_SIZE instead of its own MAX_KMALLOC_SIZE
  rds: limit the size allocated by rds_message_alloc()
  MAINTAINERS: remove eexpress
  MAINTAINERS: remove drivers/net/wan/cycx*
  MAINTAINERS: remove 3c505
  caif_dev: fix sparse warnings for caif_flow_cb
  ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver
  sctp: use the passed in gfp flags instead GFP_KERNEL
  ipv[4|6]: correct dropwatch false positive in local_deliver_finish
  l2tp: Restore socket refcount when sendmsg succeeds
  net/phy: micrel: Disable asymmetric pause for KSZ9021
  bgmac: omit the fcs
  phy: Fix phy_device_free memory leak
  bnx2x: Fix KR2 work-around condition
  ...
2013-03-05 18:42:29 -08:00
Neal Cardwell
aab2b4bf22 tcp: fix double-counted receiver RTT when leaving receiver fast path
We should not update ts_recent and call tcp_rcv_rtt_measure_ts() both
before and after going to step5. That wastes CPU and double-counts the
receiver-side RTT sample.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-04 14:12:07 -05:00
Silviu-Mihai Popescu
d2123be0e5 CAIF: fix sparse warning for caif_usb
This fixes the following sparse warning:
net/caif/caif_usb.c:84:16: warning: symbol 'cfusbl_create' was not
declared. Should it be static?

Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-04 14:12:07 -05:00
Cong Wang
7dac1b514a rds: simplify a warning message
Cc: David S. Miller <davem@davemloft.net>
Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-04 14:12:07 -05:00