Commit graph

21741 commits

Author SHA1 Message Date
David S. Miller
a5e7424d42 ipv4: ping: Fix recvmsg MSG_OOB error handling.
Don't return an uninitialized variable as the error, return
-EOPNOTSUPP instead.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21 17:59:19 -05:00
Greg Rose
115c9b8192 rtnetlink: Fix problem with buffer allocation
Implement a new netlink attribute type IFLA_EXT_MASK.  The mask
is a 32 bit value that can be used to indicate to the kernel that
certain extended ifinfo values are requested by the user application.
At this time the only mask value defined is RTEXT_FILTER_VF to
indicate that the user wants the ifinfo dump to send information
about the VFs belonging to the interface.

This patch fixes a bug in which certain applications do not have
large enough buffers to accommodate the extra information returned
by the kernel with large numbers of SR-IOV virtual functions.
Those applications will not send the new netlink attribute with
the interface info dump request netlink messages so they will
not get unexpectedly large request buffers returned by the kernel.

Modifies the rtnl_calcit function to traverse the list of net
devices and compute the minimum buffer size that can hold the
info dumps of all matching devices based upon the filter passed
in via the new netlink attribute filter mask.  If no filter
mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE.

With this change it is possible to add yet to be defined netlink
attributes to the dump request which should make it fairly extensible
in the future.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21 16:56:45 -05:00
Michel Machado
84338a6c9d neighbour: Fixed race condition at tbl->nht
When the fixed race condition happens:

1. While function neigh_periodic_work scans the neighbor hash table
pointed by field tbl->nht, it unlocks and locks tbl->lock between
buckets in order to call cond_resched.

2. Assume that function neigh_periodic_work calls cond_resched, that is,
the lock tbl->lock is available, and function neigh_hash_grow runs.

3. Once function neigh_hash_grow finishes, and RCU calls
neigh_hash_free_rcu, the original struct neigh_hash_table that function
neigh_periodic_work was using doesn't exist anymore.

4. Once back at neigh_periodic_work, whenever the old struct
neigh_hash_table is accessed, things can go badly.

Signed-off-by: Michel Machado <michel@digirati.com.br>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21 16:28:10 -05:00
John W. Linville
9d4990a260 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-02-20 14:47:17 -05:00
Eric Dumazet
cd961c2ca9 netem: fix dequeue
commit 50612537e9 (netem: fix classful handling) added two errors in
netem_dequeue()

1) After checking skb at the head of tfifo queue for time constraints,
   it dequeues tail skb, thus adding unwanted reordering.

2) qdisc stats are updated twice per packet
   (one when packet dequeued from tfifo, once when delivered)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-19 18:57:50 -05:00
Felix Fietkau
216c57b214 mac80211: do not call rate control .tx_status before .rate_init
Most rate control implementations assume .get_rate and .tx_status are only
called once the per-station data has been fully initialized.
minstrel_ht crashes if this assumption is violated.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Tested-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-02-15 13:56:06 -05:00
Johannes Berg
4b5a433ae5 mac80211: call rate control only after init
There are situations where we don't have the
necessary rate control information yet for
station entries, e.g. when associating. This
currently doesn't really happen due to the
dummy station handling; explicitly disabling
rate control when it's not initialised will
allow us to remove dummy stations.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-02-15 13:56:06 -05:00
John W. Linville
33b5d30cd8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-02-15 13:41:52 -05:00
Ulisses Furquim
24d2b8c0ac Bluetooth: Fix possible use after free in delete path
We need to use the _sync() version for cancelling the info and security
timer in the L2CAP connection delete path. Otherwise the delayed work
handler might run after the connection object is freed.

Signed-off-by: Ulisses Furquim <ulisses@profusion.mobi>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-02-15 13:09:26 +02:00
Ulisses Furquim
6de3275082 Bluetooth: Remove usage of __cancel_delayed_work()
__cancel_delayed_work() is being used in some paths where we cannot
sleep waiting for the delayed work to finish. However, that function
might return while the timer is running and the work will be queued
again. Replace the calls with safer cancel_delayed_work() version
which spins until the timer handler finishes on other CPUs and
cancels the delayed work.

Signed-off-by: Ulisses Furquim <ulisses@profusion.mobi>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-02-15 13:09:26 +02:00
Johan Hedberg
ca0d6c7ece Bluetooth: Add missing QUIRK_NO_RESET test to hci_dev_do_close
We should only perform a reset in hci_dev_do_close if the
HCI_QUIRK_NO_RESET flag is set (since in such a case a reset will not be
performed when initializing the device).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
2012-02-15 13:09:26 +02:00
Octavian Purdila
cf33e77b76 Bluetooth: Fix RFCOMM session reference counting issue
There is an imbalance in the rfcomm_session_hold / rfcomm_session_put
operations which causes the following crash:

[  685.010159] BUG: unable to handle kernel paging request at 6b6b6b6b
[  685.010169] IP: [<c149d76d>] rfcomm_process_dlcs+0x1b/0x15e
[  685.010181] *pdpt = 000000002d665001 *pde = 0000000000000000
[  685.010191] Oops: 0000 [#1] PREEMPT SMP
[  685.010247]
[  685.010255] Pid: 947, comm: krfcommd Tainted: G         C  3.0.16-mid8-dirty #44
[  685.010266] EIP: 0060:[<c149d76d>] EFLAGS: 00010246 CPU: 1
[  685.010274] EIP is at rfcomm_process_dlcs+0x1b/0x15e
[  685.010281] EAX: e79f551c EBX: 6b6b6b6b ECX: 00000007 EDX: e79f40b4
[  685.010288] ESI: e79f4060 EDI: ed4e1f70 EBP: ed4e1f68 ESP: ed4e1f50
[  685.010295]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  685.010303] Process krfcommd (pid: 947, ti=ed4e0000 task=ed43e5e0 task.ti=ed4e0000)
[  685.010308] Stack:
[  685.010312]  ed4e1f68 c149eb53 e5925150 e79f4060 ed500000 ed4e1f70 ed4e1f80 c149ec10
[  685.010331]  00000000 ed43e5e0 00000000 ed4e1f90 ed4e1f9c c149ec87 0000bf54 00000000
[  685.010348]  00000000 ee03bf54 c149ec37 ed4e1fe4 c104fe01 00000000 00000000 00000000
[  685.010367] Call Trace:
[  685.010376]  [<c149eb53>] ? rfcomm_process_rx+0x6e/0x74
[  685.010387]  [<c149ec10>] rfcomm_process_sessions+0xb7/0xde
[  685.010398]  [<c149ec87>] rfcomm_run+0x50/0x6d
[  685.010409]  [<c149ec37>] ? rfcomm_process_sessions+0xde/0xde
[  685.010419]  [<c104fe01>] kthread+0x63/0x68
[  685.010431]  [<c104fd9e>] ? __init_kthread_worker+0x42/0x42
[  685.010442]  [<c14dae82>] kernel_thread_helper+0x6/0xd

This issue has been brought up earlier here:

https://lkml.org/lkml/2011/5/21/127

The issue appears to be the rfcomm_session_put in rfcomm_recv_ua. This
operation doesn't seem be to required as for the non-initiator case we
have the rfcomm_process_rx doing an explicit put and in the initiator
case the last dlc_unlink will drive the reference counter to 0.

There have been several attempts to fix these issue:

6c2718d Bluetooth: Do not call rfcomm_session_put() for RFCOMM UA on closed socket
683d949 Bluetooth: Never deallocate a session when some DLC points to it

but AFAICS they do not fix the issue just make it harder to reproduce.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: Gopala Krishna Murala <gopala.krishna.murala@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-02-15 13:09:26 +02:00
Octavian Purdila
b5a30dda65 Bluetooth: silence lockdep warning
Since bluetooth uses multiple protocols types, to avoid lockdep
warnings, we need to use different lockdep classes (one for each
protocol type).

This is already done in bt_sock_create but it misses a couple of cases
when new connections are created. This patch corrects that to fix the
following warning:

<4>[ 1864.732366] =======================================================
<4>[ 1864.733030] [ INFO: possible circular locking dependency detected ]
<4>[ 1864.733544] 3.0.16-mid3-00007-gc9a0f62 #3
<4>[ 1864.733883] -------------------------------------------------------
<4>[ 1864.734408] t.android.btclc/4204 is trying to acquire lock:
<4>[ 1864.734869]  (rfcomm_mutex){+.+.+.}, at: [<c14970ea>] rfcomm_dlc_close+0x15/0x30
<4>[ 1864.735541]
<4>[ 1864.735549] but task is already holding lock:
<4>[ 1864.736045]  (sk_lock-AF_BLUETOOTH){+.+.+.}, at: [<c1498bf7>] lock_sock+0xa/0xc
<4>[ 1864.736732]
<4>[ 1864.736740] which lock already depends on the new lock.
<4>[ 1864.736750]
<4>[ 1864.737428]
<4>[ 1864.737437] the existing dependency chain (in reverse order) is:
<4>[ 1864.738016]
<4>[ 1864.738023] -> #1 (sk_lock-AF_BLUETOOTH){+.+.+.}:
<4>[ 1864.738549]        [<c1062273>] lock_acquire+0x104/0x140
<4>[ 1864.738977]        [<c13d35c1>] lock_sock_nested+0x58/0x68
<4>[ 1864.739411]        [<c1493c33>] l2cap_sock_sendmsg+0x3e/0x76
<4>[ 1864.739858]        [<c13d06c3>] __sock_sendmsg+0x50/0x59
<4>[ 1864.740279]        [<c13d0ea2>] sock_sendmsg+0x94/0xa8
<4>[ 1864.740687]        [<c13d0ede>] kernel_sendmsg+0x28/0x37
<4>[ 1864.741106]        [<c14969ca>] rfcomm_send_frame+0x30/0x38
<4>[ 1864.741542]        [<c1496a2a>] rfcomm_send_ua+0x58/0x5a
<4>[ 1864.741959]        [<c1498447>] rfcomm_run+0x441/0xb52
<4>[ 1864.742365]        [<c104f095>] kthread+0x63/0x68
<4>[ 1864.742742]        [<c14d5182>] kernel_thread_helper+0x6/0xd
<4>[ 1864.743187]
<4>[ 1864.743193] -> #0 (rfcomm_mutex){+.+.+.}:
<4>[ 1864.743667]        [<c1061ada>] __lock_acquire+0x988/0xc00
<4>[ 1864.744100]        [<c1062273>] lock_acquire+0x104/0x140
<4>[ 1864.744519]        [<c14d2c70>] __mutex_lock_common+0x3b/0x33f
<4>[ 1864.744975]        [<c14d303e>] mutex_lock_nested+0x2d/0x36
<4>[ 1864.745412]        [<c14970ea>] rfcomm_dlc_close+0x15/0x30
<4>[ 1864.745842]        [<c14990d9>] __rfcomm_sock_close+0x5f/0x6b
<4>[ 1864.746288]        [<c1499114>] rfcomm_sock_shutdown+0x2f/0x62
<4>[ 1864.746737]        [<c13d275d>] sys_socketcall+0x1db/0x422
<4>[ 1864.747165]        [<c14d42f0>] syscall_call+0x7/0xb

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-02-15 13:09:26 +02:00
Andrzej Kaczmarek
6e1da683f7 Bluetooth: l2cap_set_timer needs jiffies as timeout value
After moving L2CAP timers to workqueues l2cap_set_timer expects timeout
value to be specified in jiffies but constants defined in miliseconds
are used. This makes timeouts unreliable when CONFIG_HZ is not set to
1000.

__set_chan_timer macro still uses jiffies as input to avoid multiple
conversions from/to jiffies for sk_sndtimeo value which is already
specified in jiffies.

Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Ackec-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-02-15 13:09:25 +02:00
Andrzej Kaczmarek
a63752552b Bluetooth: Fix sk_sndtimeo initialization for L2CAP socket
sk_sndtime value should be specified in jiffies thus initial value
needs to be converted from miliseconds. Otherwise this timeout is
unreliable when CONFIG_HZ is not set to 1000.

Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-02-15 13:09:25 +02:00
Johan Hedberg
4aa832c27e Bluetooth: Remove bogus inline declaration from l2cap_chan_connect
As reported by Dan Carpenter this function causes a Sparse warning and
shouldn't be declared inline:

include/net/bluetooth/l2cap.h:837:30 error: marked inline, but without a
definition"

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
2012-02-15 13:09:25 +02:00
Peter Hurley
18daf1644e Bluetooth: Fix l2cap conn failures for ssp devices
Commit 330605423c fixed l2cap conn establishment for non-ssp remote
devices by not setting HCI_CONN_ENCRYPT_PEND every time conn security
is tested (which was always returning failure on any subsequent
security checks).

However, this broke l2cap conn establishment for ssp remote devices
when an ACL link was already established at SDP-level security. This
fix ensures that encryption must be pending whenever authentication
is also pending.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-02-15 13:09:25 +02:00
Eric Dumazet
58e05f357a netpoll: netpoll_poll_dev() should access dev->flags
commit 5a698af53f (bond: service netpoll arp queue on master device)
tested IFF_SLAVE flag against dev->priv_flags instead of dev->flags

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: WANG Cong <amwang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 15:24:26 -05:00
Axel Lin
f65bd5ec47 RxRPC: Fix kcalloc parameters swapped
The first parameter should be "number of elements" and the second parameter
should be "element size".

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:41:55 -05:00
Neal Cardwell
0af2a0d057 tcp: fix tcp_shifted_skb() adjustment of lost_cnt_hint for FACK
This commit ensures that lost_cnt_hint is correctly updated in
tcp_shifted_skb() for FACK TCP senders. The lost_cnt_hint adjustment
in tcp_sacktag_one() only applies to non-FACK senders, so FACK senders
need their own adjustment.

This applies the spirit of 1e5289e121 -
except now that the sequence range passed into tcp_sacktag_one() is
correct we need only have a special case adjustment for FACK.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:38:57 -05:00
Neal Cardwell
daef52bab1 tcp: fix range tcp_shifted_skb() passes to tcp_sacktag_one()
Fix the newly-SACKed range to be the range of newly-shifted bytes.

Previously - since 832d11c5cd -
tcp_shifted_skb() incorrectly called tcp_sacktag_one() with the start
and end sequence numbers of the skb it passes in set to the range just
beyond the range that is newly-SACKed.

This commit also removes a special-case adjustment to lost_cnt_hint in
tcp_shifted_skb() since the pre-existing adjustment of lost_cnt_hint
in tcp_sacktag_one() now properly handles this things now that the
correct start sequence number is passed in.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-13 01:00:22 -05:00
Neal Cardwell
cc9a672ee5 tcp: allow tcp_sacktag_one() to tag ranges not aligned with skbs
This commit allows callers of tcp_sacktag_one() to pass in sequence
ranges that do not align with skb boundaries, as tcp_shifted_skb()
needs to do in an upcoming fix in this patch series.

In fact, now tcp_sacktag_one() does not need to depend on an input skb
at all, which makes its semantics and dependencies more clear.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-13 01:00:21 -05:00
Linus Torvalds
8df54d622a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Quoth David:

1) GRO MAC header comparisons were ethernet specific, breaking other
   link types.  This required a multi-faceted fix to cure the originally
   noted case (Infiniband), because IPoIB was lying about it's actual
   hard header length.  Thanks to Eric Dumazet, Roland Dreier, and
   others.

2) Fix build failure when INET_UDP_DIAG is built in and ipv6 is modular.
   From Anisse Astier.

3) Off by ones and other bug fixes in netprio_cgroup from Neil Horman.

4) ipv4 TCP reset generation needs to respect any network interface
   binding from the socket, otherwise route lookups might give a
   different result than all the other segments received.  From Shawn
   Lu.

5) Fix unintended regression in ipv4 proxy ARP responses, from Thomas
   Graf.

6) Fix SKB under-allocation bug in sh_eth, from Yoshihiro Shimoda.

7) Revert skge PCI mapping changes that are causing crashes for some
   folks, from Stephen Hemminger.

8) IPV4 route lookups fill in the wildcarded fields of the given flow
   lookup key passed in, which is fine most of the time as this is
   exactly what the caller's want.  However there are a few cases that
   want to retain the original flow key values afterwards, so handle
   those cases properly.  Fix from Julian Anastasov.

9) IGB/IXGBE VF lookup bug fixes from Greg Rose.

10) Properly null terminate filename passed to ethtool flash device
    method, from Ben Hutchings.

11) S3 resume fix in via-velocity from David Lv.

12) Fix double SKB free during xmit failure in CAIF, from Dmitry
    Tarnyagin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits)
  net: Don't proxy arp respond if iif == rt->dst.dev if private VLAN is disabled
  ipv4: Fix wrong order of ip_rt_get_source() and update iph->daddr.
  netprio_cgroup: fix wrong memory access when NETPRIO_CGROUP=m
  netprio_cgroup: don't allocate prio table when a device is registered
  netprio_cgroup: fix an off-by-one bug
  bna: fix error handling of bnad_get_flash_partition_by_offset()
  isdn: type bug in isdn_net_header()
  net: Make qdisc_skb_cb upper size bound explicit.
  ixgbe: ethtool: stats user buffer overrun
  ixgbe: dcb: up2tc mapping lost on disable/enable CEE DCB state
  ixgbe: do not update real num queues when netdev is going away
  ixgbe: Fix broken dependency on MAX_SKB_FRAGS being related to page size
  ixgbe: Fix case of Tx Hang in PF with 32 VFs
  ixgbe: fix vf lookup
  igb: fix vf lookup
  e1000: add dropped DMA receive enable back in for WoL
  gro: more generic L2 header check
  IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses
  zd1211rw: firmware needs duration_id set to zero for non-pspoll frames
  net: enable TC35815 for MIPS again
  ...
2012-02-10 14:18:46 -08:00
Thomas Graf
70620c46ac net: Don't proxy arp respond if iif == rt->dst.dev if private VLAN is disabled
Commit 653241 (net: RFC3069, private VLAN proxy arp support) changed
the behavior of arp proxy to send arp replies back out on the interface
the request came in even if the private VLAN feature is disabled.

Previously we checked rt->dst.dev != skb->dev for in scenarios, when
proxy arp is enabled on for the netdevice and also when individual proxy
neighbour entries have been added.

This patch adds the check back for the pneigh_lookup() scenario.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-10 15:13:36 -05:00
Li Wei
5dc7883f2a ipv4: Fix wrong order of ip_rt_get_source() and update iph->daddr.
This patch fix a bug which introduced by commit ac8a4810 (ipv4: Save
nexthop address of LSRR/SSRR option to IPCB.).In that patch, we saved
the nexthop of SRR in ip_option->nexthop and update iph->daddr until
we get to ip_forward_options(), but we need to update it before
ip_rt_get_source(), otherwise we may get a wrong src.

Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-10 15:12:12 -05:00
Neil Horman
2b73bc65e2 netprio_cgroup: fix wrong memory access when NETPRIO_CGROUP=m
When the netprio_cgroup module is not loaded, net_prio_subsys_id
is -1, and so sock_update_prioidx() accesses cgroup_subsys array
with negative index subsys[-1].

Make the code resembles cls_cgroup code, which is bug free.

Origionally-authored-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-10 15:08:57 -05:00
Neil Horman
f5c38208d3 netprio_cgroup: don't allocate prio table when a device is registered
So we delay the allocation till the priority is set through cgroup,
and this makes skb_update_priority() faster when it's not set.

This also eliminates an off-by-one bug similar with the one fixed
in the previous patch.

Origionally-authored-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-10 15:08:57 -05:00
Neil Horman
a87dfe14a7 netprio_cgroup: fix an off-by-one bug
# mount -t cgroup xxx /mnt
  # mkdir /mnt/tmp
  # cat /mnt/tmp/net_prio.ifpriomap
  lo 0
  eth0 0
  virbr0 0
  # echo 'lo 999' > /mnt/tmp/net_prio.ifpriomap
  # cat /mnt/tmp/net_prio.ifpriomap
  lo 999
  eth0 0
  virbr0 4101267344

We got weired output, because we exceeded the boundary of the array.
We may even crash the kernel..

Origionally-authored-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-10 15:08:56 -05:00
Mohammed Shafi Shajakhan
b57e6b560f mac80211: Fix a rwlock bad magic bug
read_lock(&tpt_trig->trig.leddev_list_lock) is accessed via the path
ieee80211_open (->) ieee80211_do_open (->) ieee80211_mod_tpt_led_trig
(->) ieee80211_start_tpt_led_trig (->) tpt_trig_timer before initializing
it.
the intilization of this read/write lock happens via the path
ieee80211_led_init (->) led_trigger_register, but we are doing
'ieee80211_led_init'  after 'ieeee80211_if_add' where we
register netdev_ops.
so we access leddev_list_lock before initializing it and causes the
following bug in chrome laptops with AR928X cards with the following
script

while true
do
sudo modprobe -v ath9k
sleep 3
sudo modprobe -r ath9k
sleep 3
done

	BUG: rwlock bad magic on CPU#1, wpa_supplicant/358, f5b9eccc
	Pid: 358, comm: wpa_supplicant Not tainted 3.0.13 #1
	Call Trace:

	[<8137b9df>] rwlock_bug+0x3d/0x47
	[<81179830>] do_raw_read_lock+0x19/0x29
	[<8137f063>] _raw_read_lock+0xd/0xf
	[<f9081957>] tpt_trig_timer+0xc3/0x145 [mac80211]
	[<f9081f3a>] ieee80211_mod_tpt_led_trig+0x152/0x174 [mac80211]
	[<f9076a3f>] ieee80211_do_open+0x11e/0x42e [mac80211]
	[<f9075390>] ? ieee80211_check_concurrent_iface+0x26/0x13c [mac80211]
	[<f9076d97>] ieee80211_open+0x48/0x4c [mac80211]
	[<812dbed8>] __dev_open+0x82/0xab
	[<812dc0c9>] __dev_change_flags+0x9c/0x113
	[<812dc1ae>] dev_change_flags+0x18/0x44
	[<8132144f>] devinet_ioctl+0x243/0x51a
	[<81321ba9>] inet_ioctl+0x93/0xac
	[<812cc951>] sock_ioctl+0x1c6/0x1ea
	[<812cc78b>] ? might_fault+0x20/0x20
	[<810b1ebb>] do_vfs_ioctl+0x46e/0x4a2
	[<810a6ebb>] ? fget_light+0x2f/0x70
	[<812ce549>] ? sys_recvmsg+0x3e/0x48
	[<810b1f35>] sys_ioctl+0x46/0x69
	[<8137fa77>] sysenter_do_call+0x12/0x2

Cc: <stable@vger.kernel.org>
Cc: Gary Morain <gmorain@google.com>
Cc: Paul Stewart <pstew@google.com>
Cc: Abhijit Pradhan <abhijit@qca.qualcomm.com>
Cc: Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com>
Cc: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Acked-by: Johannes Berg <johannes.berg@intel.com>
Tested-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-02-09 15:16:04 -05:00
David S. Miller
16bda13d90 net: Make qdisc_skb_cb upper size bound explicit.
Just like skb->cb[], so that qdisc_skb_cb can be encapsulated inside
of other data structures.

This is intended to be used by IPoIB so that it can remember
addressing information stored at hard_header_ops->create() time that
it can fetch when the packet gets to the transmit routine.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-09 13:50:34 -05:00
Eric Dumazet
5ca3b72c5d gro: more generic L2 header check
Shlomo Pongratz reported GRO L2 header check was suited for Ethernet
only, and failed on IB/ipoib traffic.

He provided a patch faking a zeroed header to let GRO aggregates frames.

Roland Dreier, Herbert Xu, and others suggested we change GRO L2 header
check to be more generic, ie not assuming L2 header is 14 bytes, but
taking into account hard_header_len.

__napi_gro_receive() has special handling for the common case (Ethernet)
to avoid a memcmp() call and use an inline optimized function instead.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Shlomo Pongratz <shlomop@mellanox.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-08 18:26:54 -05:00
Anisse Astier
6d25886ee2 net: Fix build regression when INET_UDP_DIAG=y and IPV6=m
Tested-by: Anisse Astier <anisse@astier.eu>

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-07 13:35:28 -05:00
Shawn Lu
e2446eaab5 tcp_v4_send_reset: binding oif to iif in no sock case
Binding RST packet outgoing interface to incoming interface
for tcp v4 when there is no socket associate with it.
when sk is not NULL, using sk->sk_bound_dev_if instead.
(suggested by Eric Dumazet).

This has few benefits:
1. tcp_v6_send_reset already did that.
2. This helps tcp connect with SO_BINDTODEVICE set. When
connection is lost, we still able to sending out RST using
same interface.
3. we are sending reply, it is most likely to be succeed
if iif is used

Signed-off-by: Shawn Lu <shawn.lu@ericsson.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-04 18:20:05 -05:00
Neil Horman
5962b35c1d netprio_cgroup: Fix obo in get_prioidx
It was recently pointed out to me that the get_prioidx function sets a bit in
the prioidx map prior to checking to see if the index being set is out of
bounds.  This patch corrects that, avoiding the possiblity of us writing beyond
the end of the array

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
CC: Stanislaw Gruszka <sgruszka@redhat.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-04 16:30:24 -05:00
John W. Linville
157ca9eae9 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-02-03 14:14:07 -05:00
Linus Torvalds
6c073a7ee2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: fix safety of rbd_put_client()
  rbd: fix a memory leak in rbd_get_client()
  ceph: create a new session lock to avoid lock inversion
  ceph: fix length validation in parse_reply_info()
  ceph: initialize client debugfs outside of monc->mutex
  ceph: change "ceph.layout" xattr to be "ceph.file.layout"
2012-02-02 15:47:33 -08:00
Sage Weil
ab434b60ab ceph: initialize client debugfs outside of monc->mutex
Initializing debufs under monc->mutex introduces a lock dependency for
sb->s_type->i_mutex_key, which (combined with several other dependencies)
leads to an annoying lockdep warning.  There's no particular reason to do
the debugfs setup under this lock, so move it out.

It used to be the case that our first monmap could come from the OSD; that
is no longer the case with recent servers, so we will reliably set up the
client entry during the initial authentication.

We don't have to worry about racing with debugfs teardown by
ceph_debugfs_client_cleanup() because ceph_destroy_client() calls
ceph_msgr_flush() first, which will wait for the message dispatch work
to complete (and the debugfs init to complete).

Fixes: #1940
Signed-off-by: Sage Weil <sage@newdream.net>
2012-02-02 12:49:01 -08:00
Dmitry Tarnyagin
ba7605745d caif: Bugfix double kfree_skb upon xmit failure
SKB is freed twice upon send error. The Network stack consumes SKB even
when it returns error code.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-02 14:35:12 -05:00
sjur.brandeland@stericsson.com
b01377a420 caif: Bugfix list_del_rcu race in cfmuxl_ctrlcmd.
Always use cfmuxl_remove_uplayer when removing a up-layer.
cfmuxl_ctrlcmd() can be called independently and in parallel with
cfmuxl_remove_uplayer(). The race between them could cause list_del_rcu
to be called on a node which has been already taken out from the list.
That lead to a (rare) crash on accessing poisoned node->prev inside
list_del_rcu.

This fix ensures that deletion are done holding the same lock.

Reported-by: Dmitry Tarnyagin <dmitry.tarnyagin@stericsson.com>
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-02 14:35:12 -05:00
Jason Wang
c43b874d5d tcp: properly initialize tcp memory limits
Commit 4acb4190 tries to fix the using uninitialized value
introduced by commit 3dc43e3,  but it would make the
per-socket memory limits too small.

This patch fixes this and also remove the redundant codes
introduced in 4acb4190.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-02 14:34:41 -05:00
Eliad Peller
07ae2dfcf4 mac80211: timeout a single frame in the rx reorder buffer
The current code checks for stored_mpdu_num > 1, causing
the reorder_timer to be triggered indefinitely, but the
frame is never timed-out (until the next packet is received)

Signed-off-by: Eliad Peller <eliad@wizery.com>
Cc: <stable@vger.kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-02-01 15:26:00 -05:00
Ben Hutchings
786f528119 ethtool: Null-terminate filename passed to ethtool_ops::flash_device
The parameters for ETHTOOL_FLASHDEV include a filename, which ought to
be null-terminated.  Currently the only driver that implements
ethtool_ops::flash_device attempts to add a null terminator if
necessary, but does it wrongly.  Do it in the ethtool core instead.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-01 14:47:17 -05:00
Arun Sharma
efcdbf24fd net: Disambiguate kernel message
Some of our machines were reporting:

TCP: too many of orphaned sockets

even when the number of orphaned sockets was well below the
limit.

We print a different message depending on whether we're out
of TCP memory or there are too many orphaned sockets.

Also move the check out of line and cleanup the messages
that were printed.

Signed-off-by: Arun Sharma <asharma@fb.com>
Suggested-by: Mohan Srinivasan <mohan@fb.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: David Miller <davem@davemloft.net>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-01 14:41:50 -05:00
Linus Torvalds
a14a8d9316 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
1) Setting link attributes can modify the size of the attributes that
   would be reported on a subsequent getlink netlink operation,
   therefore min_ifinfo_dump_size needs to be adjusted.  From Stefan
   Gula.

2) Resegmentation of TSO frames while trimming can violate invariants
   expected by callers, namely that the number of segments can only stay
   the same or decrease, never increase.  If MSS changes, however, we
   can trim data but then end up with more segments.  Fix this by only
   segmenting to the MSS already recorded in the SKB.  That's the
   simplest fix for now and if we want to get more fancy in the future
   that's a more involved change.

   This probably explains some retransmit counter inaccuracies.

   From Neal Cardwell.

3) Fix too-many-wakeups in POLL with AF_UNIX sockets, from Eric Dumazet.

4) Fix CAIF crashes wrt.  namespace handling.  From Eric Dumazet and
   Eric W. Biederman.

5) TCP port selection fixes from Flavio Leitner.

6) More socket memory cgroup build fixes in certain randonfig
   situations.  From Glauber Costa.

7) Fix TCP memory sysctl regression reported by Ingo Molnar, also from
   Glauber Costa.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  af_unix: fix EPOLLET regression for stream sockets
  tcp: fix tcp_trim_head() to adjust segment count with skb MSS
  net/tcp: Fix tcp memory limits initialization when !CONFIG_SYSCTL
  net caif: Register properly as a pernet subsystem.
  netns: Fail conspicously if someone uses net_generic at an inappropriate time.
  net: explicitly add jump_label.h header to sock.h
  net: RTNETLINK adjusting values of min_ifinfo_dump_size
  ipv6: Fix ip_gre lockless xmits.
  xen-netfront: correct MAX_TX_TARGET calculation.
  netns: fix net_alloc_generic()
  tcp: bind() optimize port allocation
  tcp: bind() fix autoselection to share ports
  l2tp: l2tp_ip - fix possible oops on packet receive
  iwlwifi: fix PCI-E transport "inta" race
  mac80211: set bss_conf.idle when vif is connected
  mac80211: update oper_channel on ibss join
2012-01-30 10:53:20 -08:00
Eric Dumazet
6f01fd6e6f af_unix: fix EPOLLET regression for stream sockets
Commit 0884d7aa24 (AF_UNIX: Fix poll blocking problem when reading from
a stream socket) added a regression for epoll() in Edge Triggered mode
(EPOLLET)

Appropriate fix is to use skb_peek()/skb_unlink() instead of
skb_dequeue(), and only call skb_unlink() when skb is fully consumed.

This remove the need to requeue a partial skb into sk_receive_queue head
and the extra sk->sk_data_ready() calls that added the regression.

This is safe because once skb is given to sk_receive_queue, it is not
modified by a writer, and readers are serialized by u->readlock mutex.

This also reduce number of spinlock acquisition for small reads or
MSG_PEEK users so should improve overall performance.

Reported-by: Nick Mathewson <nickm@freehaven.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Alexey Moiseytsev <himeraster@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-30 12:45:07 -05:00
Neal Cardwell
5b35e1e6e9 tcp: fix tcp_trim_head() to adjust segment count with skb MSS
This commit fixes tcp_trim_head() to recalculate the number of
segments in the skb with the skb's existing MSS, so trimming the head
causes the skb segment count to be monotonically non-increasing - it
should stay the same or go down, but not increase.

Previously tcp_trim_head() used the current MSS of the connection. But
if there was a decrease in MSS between original transmission and ACK
(e.g. due to PMTUD), this could cause tcp_trim_head() to
counter-intuitively increase the segment count when trimming bytes off
the head of an skb. This violated assumptions in tcp_tso_acked() that
tcp_trim_head() only decreases the packet count, so that packets_acked
in tcp_tso_acked() could underflow, leading tcp_clean_rtx_queue() to
pass u32 pkts_acked values as large as 0xffffffff to
ca_ops->pkts_acked().

As an aside, if tcp_trim_head() had really wanted the skb to reflect
the current MSS, it should have called tcp_set_skb_tso_segs()
unconditionally, since a decrease in MSS would mean that a
single-packet skb should now be sliced into multiple segments.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-30 12:42:58 -05:00
Glauber Costa
4acb41903b net/tcp: Fix tcp memory limits initialization when !CONFIG_SYSCTL
sysctl_tcp_mem() initialization was moved to sysctl_tcp_ipv4.c
in commit 3dc43e3e4d, since it
became a per-ns value.

That code, however, will never run when CONFIG_SYSCTL is
disabled, leading to bogus values on those fields - causing hung
TCP sockets.

This patch fixes it by keeping an initialization code in
tcp_init(). It will be overwritten by the first net namespace
init if CONFIG_SYSCTL is compiled in, and do the right thing if
it is compiled out.

It is also named properly as tcp_init_mem(), to properly signal
its non-sysctl side effect on TCP limits.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: David S. Miller <davem@davemloft.net>
Link: http://lkml.kernel.org/r/4F22D05A.8030604@parallels.com
[ renamed the function, tidied up the changelog a bit ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-30 12:41:06 -05:00
Linus Torvalds
f94f72ee67 NFS client bugfixes for Linux 3.3 (pull 3)
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPJcPiAAoJEGcL54qWCgDyLegQAITxsoM1yYiXUQ70dksWIn8q
 y/ieLyrrTa0X3k3pHjSi305ar5d41ebEdq3aPk7tnXkpVEAPBKu4gmgx2ZQmb/G4
 uS0jiPQMjakORpBQ3RPviPDy+Yb/xWZa8iZFSq8F1pjWggPYVaNaDQmZXY+h1luN
 JSgaLSwefw3eTBuY9sqN1+qr0/F1Cbri5fYDMeA6GlJfdDkt4qO7Rep6VRc8xghk
 Pb5CnyiIziGB8qZnWI2dxQVUZRUYMGA+E6cAZzBxBAyLVxWmZGHJWB6VceNIilaT
 3CWXgQ8XRbOeYWx1q/Lnbf2s1ebRGWj5JxLoMSDYo4U6q5BEjngG0vzf5wHpMToR
 8JZB6PXmD9riCsm6xHIdu5Q+5Ku4cttGDT0PeCci/lwhgf9/u7YxwpTF8zdec0e5
 IWYIA9n+i8pot6XDOrlfmXARIGmtz7CDnPlmXCg0hKujfEj5Tmx7v4DKtGZSxQay
 e83/uuMfSWoaM3/RKG5Y0DhcbweuMMdybno4jgiYilKMCSiP4A+6YOaBSP0Y60DO
 PJmaK5E5BAhGfljCjQkvNZjpIMLUfFKKTjpB1e09ZmNs3q0lSnglX2MEOizaoHLI
 qZ7ZtFY4GVoZDV9d0zqWjUqZK0+L7YWo9M5el2ApBYt/K9gc8bpK9NLB4wrF6aPV
 DSe4flw4d8uTxkFvES3s
 =Fin9
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

NFS client bugfixes for Linux 3.3 (pull 3)

* tag 'nfs-for-3.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: Fix machine creds in generic_create_cred and generic_match
2012-01-30 08:47:49 -08:00
Eric W. Biederman
8a8ee9aff6 net caif: Register properly as a pernet subsystem.
caif is a subsystem and as such it needs to register with
register_pernet_subsys instead of register_pernet_device.

Among other problems using register_pernet_device was resulting in
net_generic being called before the caif_net structure was allocated.
Which has been causing net_generic to fail with either BUG_ON's or by
return NULL pointers.

A more ugly problem that could be caused is packets in flight why the
subsystem is shutting down.

To remove confusion also remove the cruft cause by inappropriately
trying to fix this bug.

With the aid of the previous patch I have tested this patch and
confirmed that using register_pernet_subsys makes the failure go away as
it should.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-27 21:06:03 -05:00
David S. Miller
cc0d7b91db Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2012-01-27 20:40:18 -05:00