Commit graph

21861 commits

Author SHA1 Message Date
Alex Elder
9bd1966344 libceph: rename "page_shift" variable to something sensible
In write_partial_msg_pages() there is a local variable used to
track the starting offset within a bio segment to use.  Its name,
"page_shift" defies the Linux convention of using that name for
log-base-2(page size).

Since it's only used in the bio case rename it "bio_offset".  Use it
along with the page_pos field to compute the memory offset when
computing CRC's in that function.  This makes the bio case match the
others more closely.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:52 -05:00
Alex Elder
0cdf9e6018 libceph: get rid of zero_page_address
There's not a lot of benefit to zero_page_address, which basically
holds a mapping of the zero page through the life of the messenger
module.  Even with our own mapping, the sendpage interface where
it's used may need to kmap() it again.  It's almost certain to
be in low memory anyway.

So stop treating the zero page specially in write_partial_msg_pages()
and just get rid of zero_page_address entirely.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:52 -05:00
Alex Elder
e36b13cceb libceph: only call kernel_sendpage() via helper
Make ceph_tcp_sendpage() be the only place kernel_sendpage() is
used, by using this helper in write_partial_msg_pages().

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:52 -05:00
Alex Elder
31739139f3 libceph: use kernel_sendpage() for sending zeroes
If a message queued for send gets revoked, zeroes are sent over the
wire instead of any unsent data.  This is done by constructing a
message and passing it to kernel_sendmsg() via ceph_tcp_sendmsg().

Since we are already working with a page in this case we can use
the sendpage interface instead.  Create a new ceph_tcp_sendpage()
helper that sets up flags to match the way ceph_tcp_sendmsg()
does now.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder
37675b0f42 libceph: fix inverted crc option logic
CRC's are computed for all messages between ceph entities.  The CRC
computation for the data portion of message can optionally be
disabled using the "nocrc" (common) ceph option.  The default is
for CRC computation for the data portion to be enabled.

Unfortunately, the code that implements this feature interprets the
feature flag wrong, meaning that by default the CRC's have *not*
been computed (or checked) for the data portion of messages unless
the "nocrc" option was supplied.

Fix this, in write_partial_msg_pages() and read_partial_message().
Also change the flag variable in write_partial_msg_pages() to be
"no_datacrc" to match the usage elsewhere in the file.

This fixes http://tracker.newdream.net/issues/2064

Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder
84495f4961 libceph: some simple changes
Nothing too big here.
    - define the size of the buffer used for consuming ignored
      incoming data using a symbolic constant
    - simplify the condition determining whether to unmap the page
      in write_partial_msg_pages(): do it for crc but not if the
      page is the zero page

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder
f42299e6c3 libceph: small refactor in write_partial_kvec()
Make a small change in the code that counts down kvecs consumed by
a ceph_tcp_sendmsg() call.  Same functionality, just blocked out
a little differently.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder
fe3ad593e2 libceph: do crc calculations outside loop
Move blocks of code out of loops in read_partial_message_section()
and read_partial_message().  They were only was getting called at
the end of the last iteration of the loop anyway.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder
a9a0c51af4 libceph: separate CRC calculation from byte swapping
Calculate CRC in a separate step from rearranging the byte order
of the result, to improve clarity and readability.

Use offsetof() to determine the number of bytes to include in the
CRC calculation.

In read_partial_message(), switch which value gets byte-swapped,
since the just-computed CRC is already likely to be in a register.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder
bca064d236 libceph: use "do" in CRC-related Boolean variables
Change the name (and type) of a few CRC-related Boolean local
variables so they contain the word "do", to distingish their purpose
from variables used for holding an actual CRC value.

Note that in the process of doing this I identified a fairly serious
logic error in write_partial_msg_pages():  the value of "do_crc"
assigned appears to be the opposite of what it should be.  No
attempt to fix this is made here; this change preserves the
erroneous behavior.  The problem I found is documented here:
    http://tracker.newdream.net/issues/2064

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder
cffaba15cd ceph: ensure Boolean options support both senses
Many ceph-related Boolean options offer the ability to both enable
and disable a feature.  For all those that don't offer this, add
a new option so that they do.

Note that ceph_show_options()--which reports mount options currently
in effect--only reports the option if it is different from the
default value.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder
d3002b974c libceph: a few small changes
This gathers a number of very minor changes:
    - use %hu when formatting the a socket address's address family
    - null out the ceph_msgr_wq pointer after the queue has been
      destroyed
    - drop a needless cast in ceph_write_space()
    - add a WARN() call in ceph_state_change() in the event an
      unrecognized socket state is encountered
    - rearrange the logic in ceph_con_get() and ceph_con_put() so
      that:
        - the reference counts are only atomically read once
	- the values displayed via dout() calls are known to
	  be meaningful at the time they are formatted

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder
41617d0c9c libceph: make ceph_tcp_connect() return int
There is no real need for ceph_tcp_connect() to return the socket
pointer it creates, since it already assigns it to con->sock, which
is visible to the caller.  Instead, have it return an error code,
which tidies things up a bit.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:51 -05:00
Alex Elder
6173d1f02f libceph: encapsulate some messenger cleanup code
Define a helper function to perform various cleanup operations.  Use
it both in the exit routine and in the init routine in the event of
an error.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:50 -05:00
Alex Elder
e0f43c9419 libceph: make ceph_msgr_wq private
The messenger workqueue has no need to be public.  So give it static
scope.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:50 -05:00
Alex Elder
859eb79948 libceph: encapsulate connection kvec operations
Encapsulate the operation of adding a new chunk of data to the next
open slot in a ceph_connection's out_kvec array.  Also add a "reset"
operation to make subsequent add operations start at the beginning
of the array again.

Use these routines throughout, avoiding duplicate code and ensuring
all calls are handled consistently.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:50 -05:00
Alex Elder
963be4d770 libceph: move prepare_write_banner()
One of the arguments to prepare_write_connect() indicates whether it
is being called immediately after a call to prepare_write_banner().
Move the prepare_write_banner() call inside prepare_write_connect(),
and reinterpret (and rename) the "after_banner" argument so it
indicates that prepare_write_connect() should *make* the call
rather than should know it has already been made.

This was split out from the next patch to highlight this change in
logic.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:50 -05:00
Alex Elder
ee57741c52 rbd: make ceph_parse_options() return a pointer
ceph_parse_options() takes the address of a pointer as an argument
and uses it to return the address of an allocated structure if
successful.  With this interface is not evident at call sites that
the pointer is always initialized.  Change the interface to return
the address instead (or a pointer-coded error code) to make the
validity of the returned pointer obvious.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:47 -05:00
Alex Elder
99f0f3b2c4 ceph: eliminate some abusive casts
This fixes some spots where a type cast to (void *) was used as
as a universal type hiding mechanism.  Instead, properly cast the
type to the intended target type.

Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:45 -05:00
Alex Elder
bd40614512 ceph: eliminate some needless casts
This eliminates type casts in some places where they are not
required.

Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:45 -05:00
Alex Elder
f64a93172b ceph: kill addr_str_lock spinlock; use atomic instead
A spinlock is used to protect a value used for selecting an array
index for a string used for formatting a socket address for human
consumption.  The index is reset to 0 if it ever reaches the maximum
index value.

Instead, use an ever-increasing atomic variable as a sequence
number, and compute the array index by masking off all but the
sequence number's lowest bits.  Make the number of entries in the
array a power of two to allow the use of such a mask (to avoid jumps
in the index value when the sequence number wraps).

The length of these strings is somewhat arbitrarily set at 60 bytes.
The worst-case length of a string produced is 54 bytes, for an IPv6
address that can't be shortened, e.g.:
    [1234:5678:9abc:def0:1111:2222:123.234.210.100]:32767
Change it so we arbitrarily use 64 bytes instead; if nothing else
it will make the array of these line up better in hex dumps.

Rename a few things to reinforce the distinction between the number
of strings in the array and the length of individual strings.

Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:45 -05:00
Alex Elder
a5bc3129a2 ceph: make use of "else" where appropriate
Rearrange ceph_tcp_connect() a bit, making use of "else" rather than
re-testing a value with consecutive "if" statements.  Don't record a
connection's socket pointer unless the connect operation is
successful.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:45 -05:00
Alex Elder
5766651971 ceph: use a shared zero page rather than one per messenger
Each messenger allocates a page to be used when writing zeroes
out in the event of error or other abnormal condition.  Instead,
use the kernel ZERO_PAGE() for that purpose.

Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:45 -05:00
Xi Wang
6448669777 libceph: fix overflow check in crush_decode()
The existing overflow check (n > ULONG_MAX / b) didn't work, because
n = ULONG_MAX / b would both bypass the check and still overflow the
allocation size a + n * b.

The correct check should be (n > (ULONG_MAX - a) / b).

Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
2012-03-22 10:47:45 -05:00
Jim Schutt
182fac2689 net/ceph: Only clear SOCK_NOSPACE when there is sufficient space in the socket buffer
The Ceph messenger would sometimes queue multiple work items to write
data to a socket when the socket buffer was full.

Fix this problem by making ceph_write_space() use SOCK_NOSPACE in the
same way that net/core/stream.c:sk_stream_write_space() does, i.e.,
clearing it only when sufficient space is available in the socket buffer.

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Reviewed-by: Alex Elder <elder@dreamhost.com>
2012-03-22 10:47:45 -05:00
Pablo Neira Ayuso
a16a1647fa netfilter: ctnetlink: fix race between delete and timeout expiration
Kerin Millar reported hardlockups while running `conntrackd -c'
in a busy firewall. That system (with several processors) was
acting as backup in a primary-backup setup.

After several tries, I found a race condition between the deletion
operation of ctnetlink and timeout expiration. This patch fixes
this problem.

Tested-by: Kerin Millar <kerframil@gmail.com>
Reported-by: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-17 01:47:08 -07:00
RongQing.Li
c577923756 ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu.
ip6_mc_find_dev_rcu() is called with rcu_read_lock(), so don't
need to dev_hold().
With dev_hold(), not corresponding dev_put(), will lead to leak.

[ bug introduced in 96b52e61be (ipv6: mcast: RCU conversions) ]

Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-16 21:56:42 -07:00
Eric Dumazet
cc34eb672e sch_sfq: revert dont put new flow at the end of flows
This reverts commit d47a0ac7b6 (sch_sfq: dont put new flow at the end of
flows)

As Jesper found out, patch sounded great but has bad side effects.

In stress situation, pushing new flows in front of the queue can prevent
old flows doing any progress. Packets can stay in SFQ queue for
unlimited amount of time.

It's possible to add heuristics to limit this problem, but this would
add complexity outside of SFQ scope.

A more sensible answer to Dave Taht concerns (who reported the issued I
tried to solve in original commit) is probably to use a qdisc hierarchy
so that high prio packets dont enter a potentially crowded SFQ qdisc.

Reported-by: Jesper Dangaard Brouer <jdb@comx.dk>
Cc: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-16 01:55:25 -07:00
Eric Dumazet
122bdf67f1 ipv6: fix icmp6_dst_alloc()
commit 87a115783 ( ipv6: Move xfrm_lookup() call down into
icmp6_dst_alloc().) forgot to convert one error path, leading
to crashes in mld_sendpack()

Many thanks to Dave Jones for providing a very complete bug report.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-16 01:53:42 -07:00
Eric Dumazet
dfd25ffffc tcp: fix syncookie regression
commit ea4fc0d619 (ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit())
added a serious regression on synflood handling.

Simon Kirby discovered a successful connection was delayed by 20 seconds
before being responsive.

In my tests, I discovered that xmit frames were lost, and needed ~4
retransmits and a socket dst rebuild before being really sent.

In case of syncookie initiated connection, we use a different path to
initialize the socket dst, and inet->cork.fl.u.ip4 is left cleared.

As ip_queue_xmit() now depends on inet flow being setup, fix this by
copying the temp flowi4 we use in cookie_v4_check().

Reported-by: Simon Kirby <sim@netnation.com>
Bisected-by: Simon Kirby <sim@netnation.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-11 15:52:12 -07:00
Steffen Klassert
ac3f48de09 route: Remove redirect_genid
As we invalidate the inetpeer tree along with the routing cache now,
we don't need a genid to reset the redirect handling when the routing
cache is flushed.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-08 00:30:32 -08:00
Steffen Klassert
5faa5df1fa inetpeer: Invalidate the inetpeer tree along with the routing cache
We initialize the routing metrics with the values cached on the
inetpeer in rt_init_metrics(). So if we have the metrics cached on the
inetpeer, we ignore the user configured fib_metrics.

To fix this issue, we replace the old tree with a fresh initialized
inet_peer_base. The old tree is removed later with a delayed work queue.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-08 00:30:24 -08:00
Paulius Zaleckas
5200959b83 bridge: fix state reporting when port is disabled
Now we have:
eth0: link *down*
br0: port 1(eth0) entered *forwarding* state

br_log_state(p) should be called *after* p->state is set
to BR_STATE_DISABLED.

Reported-by: Zilvinas Valinskas <zilvinas@wilibox.com>
Signed-off-by: Paulius Zaleckas <paulius.zaleckas@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-08 00:25:25 -08:00
Paulius Zaleckas
d9e179ecec bridge: br_log_state() s/entering/entered/
When br_log_state() is reporting state it should say "entered"
istead of "entering" since state at this point is already
changed.

Signed-off-by: Paulius Zaleckas <paulius.zaleckas@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-08 00:25:25 -08:00
David S. Miller
9259c483a3 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch 2012-03-07 22:49:01 -08:00
Jesse Gross
81e5d41d7e openvswitch: Fix checksum update for actions on UDP packets.
When modifying IP addresses or ports on a UDP packet we don't
correctly follow the rules for unchecksummed packets.  This meant
that packets without a checksum can be given a incorrect new checksum
and packets with a checksum can become marked as being unchecksummed.
This fixes it to handle those requirements.

Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-03-07 14:36:57 -08:00
Ben Pfaff
651a68ea2c openvswitch: Honor dp_ifindex, when specified, for vport lookup by name.
When OVS_VPORT_ATTR_NAME is specified and dp_ifindex is nonzero, the
logical behavior would be for the vport name lookup scope to be limited
to the specified datapath, but in fact the dp_ifindex value was ignored.
This commit causes the search scope to be honored.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-03-06 15:04:04 -08:00
Li Wei
d6ddef9e64 IPv6: Fix not join all-router mcast group when forwarding set.
When forwarding was set and a new net device is register,
we need add this device to the all-router mcast group.

Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 16:58:47 -05:00
Pablo Neira Ayuso
7413851197 netfilter: nf_conntrack: fix early_drop with reliable event delivery
If reliable event delivery is enabled and ctnetlink fails to deliver
the destroy event in early_drop, the conntrack subsystem cannot
drop any the candidate flow that was planned to be evicted.

Reported-by: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 14:43:50 -05:00
Florian Westphal
739e4505a0 bridge: netfilter: don't call iptables on vlan packets if sysctl is off
When net.bridge.bridge-nf-filter-vlan-tagged is 0 (default), vlan packets
arriving should not be sent to ip(6)tables by bridge netfilter.

However, it turns out that we currently always send VLAN packets to
netfilter, if ..
a), CONFIG_VLAN_8021Q is enabled ; or
b), CONFIG_VLAN_8021Q is not set but rx vlan offload is enabled
   on the bridge port.

This is because bridge netfilter treats skb with
skb->protocol == ETH_P_IP{V6} as "non-vlan packet".

With rx vlan offload on or CONFIG_VLAN_8021Q=y, the vlan header has
already been removed here, and we cannot rely on skb->protocol alone.

Fix this by only using skb->protocol if the skb has no vlan tag,
or if a vlan tag is present and filter-vlan-tagged bridge netfilter
sysctl is enabled.

We cannot remove the skb->protocol == htons(ETH_P_8021Q) test
because the vlan tag is still around in the CONFIG_VLAN_8021Q=n &&
"ethtool -K $itf rxvlan off" case.

reproducer:
iptables -t raw -I PREROUTING -i br0
iptables -t raw -I PREROUTING -i br0.1

Then send packets to an ip address configured on br0.1 interface.
Even with net.bridge.bridge-nf-filter-vlan-tagged=0, the 1st rule
will match instead of the 2nd one.

With this patch applied, the 2nd rule will match instead.
In the non-local address case, netfilter won't be consulted after
this patch unless the sysctl is switched on.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 14:43:49 -05:00
Pablo Neira Ayuso
a157b9d5b5 netfilter: bridge: fix wrong pointer dereference
In adf7ff8, a invalid dereference was added in ebt_make_names.

CC [M]  net/bridge/netfilter/ebtables.o
net/bridge/netfilter/ebtables.c: In function `ebt_make_names':
net/bridge/netfilter/ebtables.c:1371:20: warning: `t' may be used uninitialized in this function [-Wuninitialized]

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 14:43:49 -05:00
Pablo Neira Ayuso
8be619d1e4 netfilter: ctnetlink: remove incorrect spin_[un]lock_bh on NAT module autoload
Since 7d367e0, ctnetlink_new_conntrack is called without holding
the nf_conntrack_lock spinlock. Thus, ctnetlink_parse_nat_setup
does not require to release that spinlock anymore in the NAT module
autoload case.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 14:43:49 -05:00
Santosh Nayak
848edc6919 netfilter: ebtables: fix wrong name length while copying to user-space
user-space ebtables expects 32 bytes-long names, but xt_match names
use 29 bytes. We have to copy less 29 bytes and then, make sure we
fill the remaining bytes with zeroes.

Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 14:43:49 -05:00
Neal Cardwell
4648dc97af tcp: fix tcp_shift_skb_data() to not shift SACKed data below snd_una
This commit fixes tcp_shift_skb_data() so that it does not shift
SACKed data below snd_una.

This fixes an issue whose symptoms exactly match reports showing
tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at
net/ipv4/tcp_input.c:3418" thread on netdev).

Since 2008 (832d11c5cd)
tcp_shift_skb_data() had been shifting SACKed ranges that were below
snd_una. It checked that the *end* of the skb it was about to shift
from was above snd_una, but did not check that the end of the actual
shifted range was above snd_una; this commit adds that check.

Shifting SACKed ranges below snd_una is problematic because for such
ranges tcp_sacktag_one() short-circuits: it does not declare anything
as SACKed and does not increase sacked_out.

Before the fixes in commits cc9a672ee5
and daef52bab1, shifting SACKed ranges
below snd_una happened to work because tcp_shifted_skb() was always
(incorrectly) passing in to tcp_sacktag_one() an skb whose end_seq
tcp_shift_skb_data() had already guaranteed was beyond snd_una. Hence
tcp_sacktag_one() never short-circuited and always increased
tp->sacked_out in this case.

After those two fixes, my testing has verified that shifting SACKed
ranges below snd_una could cause tp->sacked_out to go negative with
the following sequence of events:

(1) tcp_shift_skb_data() sees an skb whose end_seq is beyond snd_una,
    then shifts a prefix of that skb that is below snd_una

(2) tcp_shifted_skb() increments the packet count of the
    already-SACKed prev sk_buff

(3) tcp_sacktag_one() sees the end of the new SACKed range is below
    snd_una, so it short-circuits and doesn't increase tp->sacked_out

(5) tcp_clean_rtx_queue() sees the SACKed skb has been ACKed,
    decrements tp->sacked_out by this "inflated" pcount that was
    missing a matching increase in tp->sacked_out, and hence
    tp->sacked_out underflows to a u32 like 0xFFFFFFFF, which casted
    to s32 is negative.

(6) this leads to the warnings seen in the recent "WARNING: at
    net/ipv4/tcp_input.c:3418" thread on the netdev list; e.g.:
    tcp_input.c:3418  WARN_ON((int)tp->sacked_out < 0);

More generally, I think this bug can be tickled in some cases where
two or more ACKs from the receiver are lost and then a DSACK arrives
that is immediately above an existing SACKed skb in the write queue.

This fix changes tcp_shift_skb_data() to abort this sequence at step
(1) in the scenario above by noticing that the bytes are below snd_una
and not shifting them.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 14:43:49 -05:00
Ulrich Weber
d1d81d4c3d bridge: check return value of ipv6_dev_get_saddr()
otherwise source IPv6 address of ICMPV6_MGM_QUERY packet
might be random junk if IPv6 is disabled on interface or
link-local address is not yet ready (DAD).

Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-05 16:45:34 -05:00
Eric Dumazet
a4b64fbe48 rtnetlink: fix rtnl_calcit() and rtnl_dump_ifinfo()
nlmsg_parse() might return an error, so test its return value before
potential random memory accesses.

Errors introduced in commit 115c9b8192 (rtnetlink: Fix problem with
buffer allocation)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-04 22:02:55 -05:00
Joakim Tjernlund
709e1b5cd9 bridge: message age needs to increase, not decrease.
commit bridge: send proper message_age in config BPDU
added this gem:
  bpdu.message_age = (jiffies - root->designated_age)
  p->designated_age = jiffies + bpdu->message_age;
Notice how bpdu->message_age is negated when reassigned to
bpdu.message_age. This causes message age to decrease breaking the
STP protocol.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-04 21:57:40 -05:00
Joakim Tjernlund
aaca735f4f bridge: Adjust min age inc for HZ > 256
min age increment needs to round up its min age tick for all
HZ values to guarantee message age is increasing.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-04 21:57:39 -05:00
Neal Cardwell
c0638c247f tcp: don't fragment SACKed skbs in tcp_mark_head_lost()
In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb
to mark the first portion as lost. This is for two primary reasons:

(1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When
doing this, it preserves the sum of their packet counts in order to
reflect the real-world dynamics on the wire. But given that skbs can
have remainders that do not align to MSS boundaries, this packet count
preservation means that for SACKed skbs there is not necessarily a
direct linear relationship between tcp_skb_pcount(skb) and
skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment
off and mark as lost a prefix of length (packets - oldcnt)*mss from
SACKed skbs were leading to occasional failures of the WARN_ON(len >
skb->len) in tcp_fragment() (which used to be a BUG_ON(); see the
recent "crash in tcp_fragment" thread on netdev).

(2) there is no real point in fragmenting off part of a SACKed skb and
calling tcp_skb_mark_lost() on it, since tcp_skb_mark_lost() is a NOP
for SACKed skbs.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-03 14:57:59 -05:00
David S. Miller
1e804aecd7 Merge branch 'master' of git://1984.lsi.us.es/net 2012-03-01 15:40:36 -05:00
Neal Cardwell
4c90d3b303 tcp: fix false reordering signal in tcp_shifted_skb
When tcp_shifted_skb() shifts bytes from the skb that is currently
pointed to by 'highest_sack' then the increment of
TCP_SKB_CB(skb)->seq implicitly advances tcp_highest_sack_seq(). This
implicit advancement, combined with the recent fix to pass the correct
SACKed range into tcp_sacktag_one(), caused tcp_sacktag_one() to think
that the newly SACKed range was before the tcp_highest_sack_seq(),
leading to a call to tcp_update_reordering() with a degree of
reordering matching the size of the newly SACKed range (typically just
1 packet, which is a NOP, but potentially larger).

This commit fixes this by simply calling tcp_sacktag_one() before the
TCP_SKB_CB(skb)->seq advancement that can advance our notion of the
highest SACKed sequence.

Correspondingly, we can simplify the code a little now that
tcp_shifted_skb() should update the lost_cnt_hint in all cases where
skb == tp->lost_skb_hint.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-28 15:06:46 -05:00
John W. Linville
eea79e0713 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-02-27 13:14:47 -05:00
Linus Torvalds
203738e548 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
1) ICMP sockets leave err uninitialized but we try to return it for the
   unsupported MSG_OOB case, reported by Dave Jones.

2) Add new Zaurus device ID entries, from Dave Jones.

3) Pointer calculation in hso driver memset is wrong, from Dan
   Carpenter.

4) ks8851_probe() checks unsigned value as negative, fix also from Dan
   Carpenter.

5) Fix crashes in atl1c driver due to TX queue handling, from Eric
   Dumazet.  I anticipate some TX side locking fixes coming in the near
   future for this driver as well.

6) The inline directive fix in Bluetooth which was breaking the build
   only with very new versions of GCC, from Johan Hedberg.

7) Fix crashes in the ATP CLIP code due to ARP cleanups this merge
   window, reported by Meelis Roos and fixed by Eric Dumazet.

8) JME driver doesn't flush RX FIFO correctly, from Guo-Fu Tseng.

9) Some ip6_route_output() callers test the return value for NULL, but
   this never happens as the convention is to return a dst entry with
   dst->error set.  Fixes from RonQing Li.

10) Logitech Harmony 900 should be handled by zaurus driver not
   cdc_ether, update white lists and black lists accordingly.  From
   Scott Talbert.

11) Receiving from certain kinds of devices there won't be a MAC header,
   so there is no MAC header to fixup in the IPSEC code, and if we try
   to do it we'll crash.  Fix from Eric Dumazet.

12) Port type array indexing off-by-one in mlx4 driver, fix from Yevgeny
   Petrilin.

13) Fix regression in link-down handling in davinci_emac which causes
   all RX descriptors to be freed up and therefore RX to wedge
   completely, from Christian Riesch.

14) It took two attempts, but ctnetlink soft lockups seem to be
   cured now, from Pablo Neira Ayuso.

15) Endianness bug fix in ENIC driver, from Santosh Nayak.

16) The long ago conversion of the PPP fragmentation code over to
   abstracted SKB list handling wasn't perfect, once we get an
   out of sequence SKB we don't flush the rest of them like we
   should.  From Ben McKeegan.

17) Fix regression of ->ip_summed initialization in sfc driver.
   From Ben Hutchings.

18) Bluetooth timeout mistakenly using msecs instead of jiffies,
   from Andrzej Kaczmarek.

19) Using _sync variant of work cancellation results in deadlocks,
   use the non _sync variants instead.  From Andre Guedes.

20) Bluetooth rfcomm code had reference counting problems leading
   to crashes, fix from Octavian Purdila.

21) The conversion of netem over to classful qdisc handling added
   two bugs to netem_dequeue(), fixes from Eric Dumazet.

22) Missing pci_iounmap() in ATM Solos driver.  Fix from Julia Lawall.

23) b44_pci_exit() should not have __exit tag since it's invoked from
   non-__exit code.  From Nikola Pajkovsky.

24) The conversion of the neighbour hash tables over to RCU added a
   race, fixed here by adding the necessary reread of tbl->nht, fix
   from Michel Machado.

25) When we added VF (virtual function) attributes for network device
   dumps, this potentially bloats up the size of the dump of one
   network device such that the dump size is too large for the buffer
   allocated by properly written netlink applications.

   In particular, if you add 255 VFs to a network device, parts of
   GLIBC stop working.

   To fix this, we add an attribute that is used to turn on these
   extended portions of the network device dump.  Sophisticaed
   applications like 'ip' that want to see this stuff  will be changed
   to set the attribute, whereas things like GLIBC that don't care
   about VFs simply will not, and therefore won't be busted by the
   mere presence of VFs on a network device.

   Thanks to the tireless work of Greg Rose on this fix.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
  sfc: Fix assignment of ip_summed for pre-allocated skbs
  ppp: fix 'ppp_mp_reconstruct bad seq' errors
  enic: Fix endianness bug.
  gre: fix spelling in comments
  netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)
  Revert "netfilter: ctnetlink: fix soft lockup when netlink adds new entries"
  davinci_emac: Do not free all rx dma descriptors during init
  mlx4_core: Fixing array indexes when setting port types
  phy: IC+101G and PHY_HAS_INTERRUPT flag
  netdev/phy/icplus: Correct broken phy_init code
  ipsec: be careful of non existing mac headers
  Move Logitech Harmony 900 from cdc_ether to zaurus
  hso: memsetting wrong data in hso_get_count()
  netfilter: ip6_route_output() never returns NULL.
  ethernet/broadcom: ip6_route_output() never returns NULL.
  ipv6: ip6_route_output() never returns NULL.
  jme: Fix FIFO flush issue
  atm: clip: remove clip_tbl
  ipv4: ping: Fix recvmsg MSG_OOB error handling.
  rtnetlink: Fix problem with buffer allocation
  ...
2012-02-26 12:47:17 -08:00
Florian Westphal
e899b1119f netfilter: bridge: fix module autoload in compat case
We expected 0 if module doesn't exist, which is no longer the case
(42046e2e45,
netfilter: x_tables: return -ENOENT for non-existant matches/targets).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-02-25 14:29:09 +01:00
David S. Miller
e807e566e9 Merge branch 'master' of git://1984.lsi.us.es/net 2012-02-24 17:41:57 -05:00
stephen hemminger
bff528578f gre: fix spelling in comments
The original spelling and bad word choice makes these comments hard to read.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-24 17:41:11 -05:00
Jozsef Kadlecsik
7d367e0668 netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)
Marcell Zambo and Janos Farago noticed and reported that when
new conntrack entries are added via netlink and the conntrack table
gets full, soft lockup happens. This is because the nf_conntrack_lock
is held while nf_conntrack_alloc is called, which is in turn wants
to lock nf_conntrack_lock while evicting entries from the full table.

The patch fixes the soft lockup with limiting the holding of the
nf_conntrack_lock to the minimum, where it's absolutely required.
It required to extend (and thus change) nf_conntrack_hash_insert
so that it makes sure conntrack and ctnetlink do not add the same entry
twice to the conntrack table.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-02-24 12:24:15 +01:00
Pablo Neira Ayuso
279072882d Revert "netfilter: ctnetlink: fix soft lockup when netlink adds new entries"
This reverts commit af14cca162.

This patch contains a race condition between packets and ctnetlink
in the conntrack addition. A new patch to fix this issue follows up.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-02-24 12:19:57 +01:00
Eric Dumazet
03606895cd ipsec: be careful of non existing mac headers
Niccolo Belli reported ipsec crashes in case we handle a frame without
mac header (atm in his case)

Before copying mac header, better make sure it is present.

Bugzilla reference:  https://bugzilla.kernel.org/show_bug.cgi?id=42809

Reported-by: Niccolò Belli <darkbasic@linuxsystems.it>
Tested-by: Niccolò Belli <darkbasic@linuxsystems.it>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-23 16:50:45 -05:00
David S. Miller
4a2258dddd Merge branch 'nf' of git://1984.lsi.us.es/net 2012-02-23 00:20:14 -05:00
RongQing.Li
5d38b1f8cf netfilter: ip6_route_output() never returns NULL.
ip6_route_output() never returns NULL, so it is wrong to
check if the return value is NULL.

Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-22 15:30:15 -05:00
RongQing.Li
5095d64db1 ipv6: ip6_route_output() never returns NULL.
ip6_route_output() never returns NULL, so it is wrong to
check if the return value is NULL.

Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-22 15:30:14 -05:00
Eric Dumazet
597cdbc223 atm: clip: remove clip_tbl
Commit 32092ecf06 (atm: clip: Use device neigh support on top of
"arp_tbl".) introduced a bug since clip_tbl is zeroed : Crash occurs in
__neigh_for_each_release()

idle_timer_check() must use instead arp_tbl and neigh_check_cb() should
ignore non clip neighbours.

Idea from David Miller.

Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-22 02:23:25 -05:00
David S. Miller
a5e7424d42 ipv4: ping: Fix recvmsg MSG_OOB error handling.
Don't return an uninitialized variable as the error, return
-EOPNOTSUPP instead.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21 17:59:19 -05:00
Greg Rose
115c9b8192 rtnetlink: Fix problem with buffer allocation
Implement a new netlink attribute type IFLA_EXT_MASK.  The mask
is a 32 bit value that can be used to indicate to the kernel that
certain extended ifinfo values are requested by the user application.
At this time the only mask value defined is RTEXT_FILTER_VF to
indicate that the user wants the ifinfo dump to send information
about the VFs belonging to the interface.

This patch fixes a bug in which certain applications do not have
large enough buffers to accommodate the extra information returned
by the kernel with large numbers of SR-IOV virtual functions.
Those applications will not send the new netlink attribute with
the interface info dump request netlink messages so they will
not get unexpectedly large request buffers returned by the kernel.

Modifies the rtnl_calcit function to traverse the list of net
devices and compute the minimum buffer size that can hold the
info dumps of all matching devices based upon the filter passed
in via the new netlink attribute filter mask.  If no filter
mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE.

With this change it is possible to add yet to be defined netlink
attributes to the dump request which should make it fairly extensible
in the future.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21 16:56:45 -05:00
Michel Machado
84338a6c9d neighbour: Fixed race condition at tbl->nht
When the fixed race condition happens:

1. While function neigh_periodic_work scans the neighbor hash table
pointed by field tbl->nht, it unlocks and locks tbl->lock between
buckets in order to call cond_resched.

2. Assume that function neigh_periodic_work calls cond_resched, that is,
the lock tbl->lock is available, and function neigh_hash_grow runs.

3. Once function neigh_hash_grow finishes, and RCU calls
neigh_hash_free_rcu, the original struct neigh_hash_table that function
neigh_periodic_work was using doesn't exist anymore.

4. Once back at neigh_periodic_work, whenever the old struct
neigh_hash_table is accessed, things can go badly.

Signed-off-by: Michel Machado <michel@digirati.com.br>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21 16:28:10 -05:00
Mohammed Shafi Shajakhan
0dee00686d mac80211: Fix a warning on changing to monitor mode from STA
nothing needs to be done for monitor/AP_VLAN mode on calling
ieee80211_bss_info_change_notify -> drv_bss_info_changed with the change
flag 'BSS_CHANGED_IDLE'. 'wl1271' seems to use BSS_CHANGED_IDLE only for
STA and IBSS mode. further the non-idle state of the monitor mode is
taken care by the 'count' variable which counts non-idle interfaces.
ieee80211_idle_off(local, "in use") will be called.
this fixes the following WARNING when we have initially STA mode
(network manager running) and not associated, and change it to monitor
mode with network manager disabled and bringing up the monitor mode.
this changes the idle state from 'true' (STA unassociated) to 'false'
(MONITOR mode)
exposed by the commit 405385f8ce
"mac80211: set bss_conf.idle when vif is connected"

	WARNING: net/mac80211/main.c:212
	ieee80211_bss_info_change_notify+0x1cf/0x330 [mac80211]()
	Hardware name: 64756D6
	Pid: 3835, comm: ifconfig Tainted: G           O
	3.3.0-rc3-wl #9
	Call Trace:
	  [<c0133b02>] warn_slowpath_common+0x72/0xa0
	  [<fc8e8c3f>] ?
	  ieee80211_bss_info_change_notify+0x1cf/0x330 [mac80211]
	  [<fc8e8c3f>] ?
	  ieee80211_bss_info_change_notify+0x1cf/0x330 [mac80211]
	  [<c0133b52>] warn_slowpath_null+0x22/0x30
	  [<fc8e8c3f>]
	  ieee80211_bss_info_change_notify+0x1cf/0x330 [mac80211]
	  [<fc8f9de3>] __ieee80211_recalc_idle+0x113/0x430
	  [mac80211]
	  [<fc8fabc6>] ieee80211_do_open+0x156/0x7e0 [mac80211]
	  [<fc8f8a25>] ?
	  ieee80211_check_concurrent_iface+0x25/0x180 [mac80211]
	  [<c015dd9f>] ? raw_notifier_call_chain+0x1f/0x30
	  [<fc8fb290>] ieee80211_open+0x40/0x80 [mac80211]
	  [<c05894f6>] __dev_open+0x96/0xe0
	  [<c068fba5>] ? _raw_spin_unlock_bh+0x35/0x40
	  [<c05881d9>] __dev_change_flags+0x109/0x170
	  [<c0589423>] dev_change_flags+0x23/0x60
	  [<c05f3770>] devinet_ioctl+0x6a0/0x770

 ieee80211 phy0: device no longer idle - in use

Cc: Eliad Peller <eliad@wizery.com>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-02-21 14:45:27 -05:00
Mohammed Shafi Shajakhan
8617b093d0 mac80211: zero initialize count field in ieee80211_tx_rate
rate control algorithms concludes the rate as invalid
with rate[i].idx < -1 , while they do also check for rate[i].count is
non-zero. it would be safer to zero initialize the 'count' field.
recently we had a ath9k rate control crash where the ath9k rate control
in ath_tx_status assumed to check only for rate[i].count being non-zero
in one instance and ended up in using invalid rate index for
'connection monitoring NULL func frames' which eventually lead to the crash.
thanks to Pavel Roskin for fixing it and finding the root cause.
https://bugzilla.redhat.com/show_bug.cgi?id=768639

Cc: stable@vger.kernel.org
Cc: Pavel Roskin <proski@gnu.org>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-02-21 14:45:26 -05:00
Jozsef Kadlecsik
af14cca162 netfilter: ctnetlink: fix soft lockup when netlink adds new entries
Marcell Zambo and Janos Farago noticed and reported that when
new conntrack entries are added via netlink and the conntrack table
gets full, soft lockup happens. This is because the nf_conntrack_lock
is held while nf_conntrack_alloc is called, which is in turn wants
to lock nf_conntrack_lock while evicting entries from the full table.

The patch fixes the soft lockup with limiting the holding of the
nf_conntrack_lock to the minimum, where it's absolutely required.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-02-21 16:25:31 +01:00
Linus Torvalds
8ebbfb4957 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Assorted fixes, sat in -next for a week or so...

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ocfs2: deal with wraparounds of i_nlink in ocfs2_rename()
  vfs: fix compat_sys_stat() handling of overflows in st_nlink
  quota: Fix deadlock with suspend and quotas
  vfs: Provide function to get superblock and wait for it to thaw
  vfs: fix panic in __d_lookup() with high dentry hashtable counts
  autofs4 - fix lockdep splat in autofs
  vfs: fix d_inode_lookup() dentry ref leak
2012-02-20 16:13:58 -08:00
John W. Linville
9d4990a260 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-02-20 14:47:17 -05:00
Eric Dumazet
cd961c2ca9 netem: fix dequeue
commit 50612537e9 (netem: fix classful handling) added two errors in
netem_dequeue()

1) After checking skb at the head of tfifo queue for time constraints,
   it dequeues tail skb, thus adding unwanted reordering.

2) qdisc stats are updated twice per packet
   (one when packet dequeued from tfifo, once when delivered)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-19 18:57:50 -05:00
Felix Fietkau
216c57b214 mac80211: do not call rate control .tx_status before .rate_init
Most rate control implementations assume .get_rate and .tx_status are only
called once the per-station data has been fully initialized.
minstrel_ht crashes if this assumption is violated.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Tested-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-02-15 13:56:06 -05:00
Johannes Berg
4b5a433ae5 mac80211: call rate control only after init
There are situations where we don't have the
necessary rate control information yet for
station entries, e.g. when associating. This
currently doesn't really happen due to the
dummy station handling; explicitly disabling
rate control when it's not initialised will
allow us to remove dummy stations.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-02-15 13:56:06 -05:00
John W. Linville
33b5d30cd8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-02-15 13:41:52 -05:00
Ulisses Furquim
24d2b8c0ac Bluetooth: Fix possible use after free in delete path
We need to use the _sync() version for cancelling the info and security
timer in the L2CAP connection delete path. Otherwise the delayed work
handler might run after the connection object is freed.

Signed-off-by: Ulisses Furquim <ulisses@profusion.mobi>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-02-15 13:09:26 +02:00
Ulisses Furquim
6de3275082 Bluetooth: Remove usage of __cancel_delayed_work()
__cancel_delayed_work() is being used in some paths where we cannot
sleep waiting for the delayed work to finish. However, that function
might return while the timer is running and the work will be queued
again. Replace the calls with safer cancel_delayed_work() version
which spins until the timer handler finishes on other CPUs and
cancels the delayed work.

Signed-off-by: Ulisses Furquim <ulisses@profusion.mobi>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-02-15 13:09:26 +02:00
Johan Hedberg
ca0d6c7ece Bluetooth: Add missing QUIRK_NO_RESET test to hci_dev_do_close
We should only perform a reset in hci_dev_do_close if the
HCI_QUIRK_NO_RESET flag is set (since in such a case a reset will not be
performed when initializing the device).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
2012-02-15 13:09:26 +02:00
Octavian Purdila
cf33e77b76 Bluetooth: Fix RFCOMM session reference counting issue
There is an imbalance in the rfcomm_session_hold / rfcomm_session_put
operations which causes the following crash:

[  685.010159] BUG: unable to handle kernel paging request at 6b6b6b6b
[  685.010169] IP: [<c149d76d>] rfcomm_process_dlcs+0x1b/0x15e
[  685.010181] *pdpt = 000000002d665001 *pde = 0000000000000000
[  685.010191] Oops: 0000 [#1] PREEMPT SMP
[  685.010247]
[  685.010255] Pid: 947, comm: krfcommd Tainted: G         C  3.0.16-mid8-dirty #44
[  685.010266] EIP: 0060:[<c149d76d>] EFLAGS: 00010246 CPU: 1
[  685.010274] EIP is at rfcomm_process_dlcs+0x1b/0x15e
[  685.010281] EAX: e79f551c EBX: 6b6b6b6b ECX: 00000007 EDX: e79f40b4
[  685.010288] ESI: e79f4060 EDI: ed4e1f70 EBP: ed4e1f68 ESP: ed4e1f50
[  685.010295]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  685.010303] Process krfcommd (pid: 947, ti=ed4e0000 task=ed43e5e0 task.ti=ed4e0000)
[  685.010308] Stack:
[  685.010312]  ed4e1f68 c149eb53 e5925150 e79f4060 ed500000 ed4e1f70 ed4e1f80 c149ec10
[  685.010331]  00000000 ed43e5e0 00000000 ed4e1f90 ed4e1f9c c149ec87 0000bf54 00000000
[  685.010348]  00000000 ee03bf54 c149ec37 ed4e1fe4 c104fe01 00000000 00000000 00000000
[  685.010367] Call Trace:
[  685.010376]  [<c149eb53>] ? rfcomm_process_rx+0x6e/0x74
[  685.010387]  [<c149ec10>] rfcomm_process_sessions+0xb7/0xde
[  685.010398]  [<c149ec87>] rfcomm_run+0x50/0x6d
[  685.010409]  [<c149ec37>] ? rfcomm_process_sessions+0xde/0xde
[  685.010419]  [<c104fe01>] kthread+0x63/0x68
[  685.010431]  [<c104fd9e>] ? __init_kthread_worker+0x42/0x42
[  685.010442]  [<c14dae82>] kernel_thread_helper+0x6/0xd

This issue has been brought up earlier here:

https://lkml.org/lkml/2011/5/21/127

The issue appears to be the rfcomm_session_put in rfcomm_recv_ua. This
operation doesn't seem be to required as for the non-initiator case we
have the rfcomm_process_rx doing an explicit put and in the initiator
case the last dlc_unlink will drive the reference counter to 0.

There have been several attempts to fix these issue:

6c2718d Bluetooth: Do not call rfcomm_session_put() for RFCOMM UA on closed socket
683d949 Bluetooth: Never deallocate a session when some DLC points to it

but AFAICS they do not fix the issue just make it harder to reproduce.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: Gopala Krishna Murala <gopala.krishna.murala@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-02-15 13:09:26 +02:00
Octavian Purdila
b5a30dda65 Bluetooth: silence lockdep warning
Since bluetooth uses multiple protocols types, to avoid lockdep
warnings, we need to use different lockdep classes (one for each
protocol type).

This is already done in bt_sock_create but it misses a couple of cases
when new connections are created. This patch corrects that to fix the
following warning:

<4>[ 1864.732366] =======================================================
<4>[ 1864.733030] [ INFO: possible circular locking dependency detected ]
<4>[ 1864.733544] 3.0.16-mid3-00007-gc9a0f62 #3
<4>[ 1864.733883] -------------------------------------------------------
<4>[ 1864.734408] t.android.btclc/4204 is trying to acquire lock:
<4>[ 1864.734869]  (rfcomm_mutex){+.+.+.}, at: [<c14970ea>] rfcomm_dlc_close+0x15/0x30
<4>[ 1864.735541]
<4>[ 1864.735549] but task is already holding lock:
<4>[ 1864.736045]  (sk_lock-AF_BLUETOOTH){+.+.+.}, at: [<c1498bf7>] lock_sock+0xa/0xc
<4>[ 1864.736732]
<4>[ 1864.736740] which lock already depends on the new lock.
<4>[ 1864.736750]
<4>[ 1864.737428]
<4>[ 1864.737437] the existing dependency chain (in reverse order) is:
<4>[ 1864.738016]
<4>[ 1864.738023] -> #1 (sk_lock-AF_BLUETOOTH){+.+.+.}:
<4>[ 1864.738549]        [<c1062273>] lock_acquire+0x104/0x140
<4>[ 1864.738977]        [<c13d35c1>] lock_sock_nested+0x58/0x68
<4>[ 1864.739411]        [<c1493c33>] l2cap_sock_sendmsg+0x3e/0x76
<4>[ 1864.739858]        [<c13d06c3>] __sock_sendmsg+0x50/0x59
<4>[ 1864.740279]        [<c13d0ea2>] sock_sendmsg+0x94/0xa8
<4>[ 1864.740687]        [<c13d0ede>] kernel_sendmsg+0x28/0x37
<4>[ 1864.741106]        [<c14969ca>] rfcomm_send_frame+0x30/0x38
<4>[ 1864.741542]        [<c1496a2a>] rfcomm_send_ua+0x58/0x5a
<4>[ 1864.741959]        [<c1498447>] rfcomm_run+0x441/0xb52
<4>[ 1864.742365]        [<c104f095>] kthread+0x63/0x68
<4>[ 1864.742742]        [<c14d5182>] kernel_thread_helper+0x6/0xd
<4>[ 1864.743187]
<4>[ 1864.743193] -> #0 (rfcomm_mutex){+.+.+.}:
<4>[ 1864.743667]        [<c1061ada>] __lock_acquire+0x988/0xc00
<4>[ 1864.744100]        [<c1062273>] lock_acquire+0x104/0x140
<4>[ 1864.744519]        [<c14d2c70>] __mutex_lock_common+0x3b/0x33f
<4>[ 1864.744975]        [<c14d303e>] mutex_lock_nested+0x2d/0x36
<4>[ 1864.745412]        [<c14970ea>] rfcomm_dlc_close+0x15/0x30
<4>[ 1864.745842]        [<c14990d9>] __rfcomm_sock_close+0x5f/0x6b
<4>[ 1864.746288]        [<c1499114>] rfcomm_sock_shutdown+0x2f/0x62
<4>[ 1864.746737]        [<c13d275d>] sys_socketcall+0x1db/0x422
<4>[ 1864.747165]        [<c14d42f0>] syscall_call+0x7/0xb

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-02-15 13:09:26 +02:00
Andrzej Kaczmarek
6e1da683f7 Bluetooth: l2cap_set_timer needs jiffies as timeout value
After moving L2CAP timers to workqueues l2cap_set_timer expects timeout
value to be specified in jiffies but constants defined in miliseconds
are used. This makes timeouts unreliable when CONFIG_HZ is not set to
1000.

__set_chan_timer macro still uses jiffies as input to avoid multiple
conversions from/to jiffies for sk_sndtimeo value which is already
specified in jiffies.

Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Ackec-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-02-15 13:09:25 +02:00
Andrzej Kaczmarek
a63752552b Bluetooth: Fix sk_sndtimeo initialization for L2CAP socket
sk_sndtime value should be specified in jiffies thus initial value
needs to be converted from miliseconds. Otherwise this timeout is
unreliable when CONFIG_HZ is not set to 1000.

Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-02-15 13:09:25 +02:00
Johan Hedberg
4aa832c27e Bluetooth: Remove bogus inline declaration from l2cap_chan_connect
As reported by Dan Carpenter this function causes a Sparse warning and
shouldn't be declared inline:

include/net/bluetooth/l2cap.h:837:30 error: marked inline, but without a
definition"

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
2012-02-15 13:09:25 +02:00
Peter Hurley
18daf1644e Bluetooth: Fix l2cap conn failures for ssp devices
Commit 330605423c fixed l2cap conn establishment for non-ssp remote
devices by not setting HCI_CONN_ENCRYPT_PEND every time conn security
is tested (which was always returning failure on any subsequent
security checks).

However, this broke l2cap conn establishment for ssp remote devices
when an ACL link was already established at SDP-level security. This
fix ensures that encryption must be pending whenever authentication
is also pending.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2012-02-15 13:09:25 +02:00
Eric Dumazet
58e05f357a netpoll: netpoll_poll_dev() should access dev->flags
commit 5a698af53f (bond: service netpoll arp queue on master device)
tested IFF_SLAVE flag against dev->priv_flags instead of dev->flags

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: WANG Cong <amwang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 15:24:26 -05:00
Axel Lin
f65bd5ec47 RxRPC: Fix kcalloc parameters swapped
The first parameter should be "number of elements" and the second parameter
should be "element size".

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:41:55 -05:00
Neal Cardwell
0af2a0d057 tcp: fix tcp_shifted_skb() adjustment of lost_cnt_hint for FACK
This commit ensures that lost_cnt_hint is correctly updated in
tcp_shifted_skb() for FACK TCP senders. The lost_cnt_hint adjustment
in tcp_sacktag_one() only applies to non-FACK senders, so FACK senders
need their own adjustment.

This applies the spirit of 1e5289e121 -
except now that the sequence range passed into tcp_sacktag_one() is
correct we need only have a special case adjustment for FACK.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:38:57 -05:00
Dimitri Sivanich
074b85175a vfs: fix panic in __d_lookup() with high dentry hashtable counts
When the number of dentry cache hash table entries gets too high
(2147483648 entries), as happens by default on a 16TB system, use of a
signed integer in the dcache_init() initialization loop prevents the
dentry_hashtable from getting initialized, causing a panic in
__d_lookup().  Fix this in dcache_init() and similar areas.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-02-13 20:45:38 -05:00
Neal Cardwell
daef52bab1 tcp: fix range tcp_shifted_skb() passes to tcp_sacktag_one()
Fix the newly-SACKed range to be the range of newly-shifted bytes.

Previously - since 832d11c5cd -
tcp_shifted_skb() incorrectly called tcp_sacktag_one() with the start
and end sequence numbers of the skb it passes in set to the range just
beyond the range that is newly-SACKed.

This commit also removes a special-case adjustment to lost_cnt_hint in
tcp_shifted_skb() since the pre-existing adjustment of lost_cnt_hint
in tcp_sacktag_one() now properly handles this things now that the
correct start sequence number is passed in.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-13 01:00:22 -05:00
Neal Cardwell
cc9a672ee5 tcp: allow tcp_sacktag_one() to tag ranges not aligned with skbs
This commit allows callers of tcp_sacktag_one() to pass in sequence
ranges that do not align with skb boundaries, as tcp_shifted_skb()
needs to do in an upcoming fix in this patch series.

In fact, now tcp_sacktag_one() does not need to depend on an input skb
at all, which makes its semantics and dependencies more clear.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-13 01:00:21 -05:00
Linus Torvalds
8df54d622a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Quoth David:

1) GRO MAC header comparisons were ethernet specific, breaking other
   link types.  This required a multi-faceted fix to cure the originally
   noted case (Infiniband), because IPoIB was lying about it's actual
   hard header length.  Thanks to Eric Dumazet, Roland Dreier, and
   others.

2) Fix build failure when INET_UDP_DIAG is built in and ipv6 is modular.
   From Anisse Astier.

3) Off by ones and other bug fixes in netprio_cgroup from Neil Horman.

4) ipv4 TCP reset generation needs to respect any network interface
   binding from the socket, otherwise route lookups might give a
   different result than all the other segments received.  From Shawn
   Lu.

5) Fix unintended regression in ipv4 proxy ARP responses, from Thomas
   Graf.

6) Fix SKB under-allocation bug in sh_eth, from Yoshihiro Shimoda.

7) Revert skge PCI mapping changes that are causing crashes for some
   folks, from Stephen Hemminger.

8) IPV4 route lookups fill in the wildcarded fields of the given flow
   lookup key passed in, which is fine most of the time as this is
   exactly what the caller's want.  However there are a few cases that
   want to retain the original flow key values afterwards, so handle
   those cases properly.  Fix from Julian Anastasov.

9) IGB/IXGBE VF lookup bug fixes from Greg Rose.

10) Properly null terminate filename passed to ethtool flash device
    method, from Ben Hutchings.

11) S3 resume fix in via-velocity from David Lv.

12) Fix double SKB free during xmit failure in CAIF, from Dmitry
    Tarnyagin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits)
  net: Don't proxy arp respond if iif == rt->dst.dev if private VLAN is disabled
  ipv4: Fix wrong order of ip_rt_get_source() and update iph->daddr.
  netprio_cgroup: fix wrong memory access when NETPRIO_CGROUP=m
  netprio_cgroup: don't allocate prio table when a device is registered
  netprio_cgroup: fix an off-by-one bug
  bna: fix error handling of bnad_get_flash_partition_by_offset()
  isdn: type bug in isdn_net_header()
  net: Make qdisc_skb_cb upper size bound explicit.
  ixgbe: ethtool: stats user buffer overrun
  ixgbe: dcb: up2tc mapping lost on disable/enable CEE DCB state
  ixgbe: do not update real num queues when netdev is going away
  ixgbe: Fix broken dependency on MAX_SKB_FRAGS being related to page size
  ixgbe: Fix case of Tx Hang in PF with 32 VFs
  ixgbe: fix vf lookup
  igb: fix vf lookup
  e1000: add dropped DMA receive enable back in for WoL
  gro: more generic L2 header check
  IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses
  zd1211rw: firmware needs duration_id set to zero for non-pspoll frames
  net: enable TC35815 for MIPS again
  ...
2012-02-10 14:18:46 -08:00
Thomas Graf
70620c46ac net: Don't proxy arp respond if iif == rt->dst.dev if private VLAN is disabled
Commit 653241 (net: RFC3069, private VLAN proxy arp support) changed
the behavior of arp proxy to send arp replies back out on the interface
the request came in even if the private VLAN feature is disabled.

Previously we checked rt->dst.dev != skb->dev for in scenarios, when
proxy arp is enabled on for the netdevice and also when individual proxy
neighbour entries have been added.

This patch adds the check back for the pneigh_lookup() scenario.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-10 15:13:36 -05:00
Li Wei
5dc7883f2a ipv4: Fix wrong order of ip_rt_get_source() and update iph->daddr.
This patch fix a bug which introduced by commit ac8a4810 (ipv4: Save
nexthop address of LSRR/SSRR option to IPCB.).In that patch, we saved
the nexthop of SRR in ip_option->nexthop and update iph->daddr until
we get to ip_forward_options(), but we need to update it before
ip_rt_get_source(), otherwise we may get a wrong src.

Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-10 15:12:12 -05:00
Neil Horman
2b73bc65e2 netprio_cgroup: fix wrong memory access when NETPRIO_CGROUP=m
When the netprio_cgroup module is not loaded, net_prio_subsys_id
is -1, and so sock_update_prioidx() accesses cgroup_subsys array
with negative index subsys[-1].

Make the code resembles cls_cgroup code, which is bug free.

Origionally-authored-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-10 15:08:57 -05:00
Neil Horman
f5c38208d3 netprio_cgroup: don't allocate prio table when a device is registered
So we delay the allocation till the priority is set through cgroup,
and this makes skb_update_priority() faster when it's not set.

This also eliminates an off-by-one bug similar with the one fixed
in the previous patch.

Origionally-authored-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-10 15:08:57 -05:00
Neil Horman
a87dfe14a7 netprio_cgroup: fix an off-by-one bug
# mount -t cgroup xxx /mnt
  # mkdir /mnt/tmp
  # cat /mnt/tmp/net_prio.ifpriomap
  lo 0
  eth0 0
  virbr0 0
  # echo 'lo 999' > /mnt/tmp/net_prio.ifpriomap
  # cat /mnt/tmp/net_prio.ifpriomap
  lo 999
  eth0 0
  virbr0 4101267344

We got weired output, because we exceeded the boundary of the array.
We may even crash the kernel..

Origionally-authored-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-10 15:08:56 -05:00
Mohammed Shafi Shajakhan
b57e6b560f mac80211: Fix a rwlock bad magic bug
read_lock(&tpt_trig->trig.leddev_list_lock) is accessed via the path
ieee80211_open (->) ieee80211_do_open (->) ieee80211_mod_tpt_led_trig
(->) ieee80211_start_tpt_led_trig (->) tpt_trig_timer before initializing
it.
the intilization of this read/write lock happens via the path
ieee80211_led_init (->) led_trigger_register, but we are doing
'ieee80211_led_init'  after 'ieeee80211_if_add' where we
register netdev_ops.
so we access leddev_list_lock before initializing it and causes the
following bug in chrome laptops with AR928X cards with the following
script

while true
do
sudo modprobe -v ath9k
sleep 3
sudo modprobe -r ath9k
sleep 3
done

	BUG: rwlock bad magic on CPU#1, wpa_supplicant/358, f5b9eccc
	Pid: 358, comm: wpa_supplicant Not tainted 3.0.13 #1
	Call Trace:

	[<8137b9df>] rwlock_bug+0x3d/0x47
	[<81179830>] do_raw_read_lock+0x19/0x29
	[<8137f063>] _raw_read_lock+0xd/0xf
	[<f9081957>] tpt_trig_timer+0xc3/0x145 [mac80211]
	[<f9081f3a>] ieee80211_mod_tpt_led_trig+0x152/0x174 [mac80211]
	[<f9076a3f>] ieee80211_do_open+0x11e/0x42e [mac80211]
	[<f9075390>] ? ieee80211_check_concurrent_iface+0x26/0x13c [mac80211]
	[<f9076d97>] ieee80211_open+0x48/0x4c [mac80211]
	[<812dbed8>] __dev_open+0x82/0xab
	[<812dc0c9>] __dev_change_flags+0x9c/0x113
	[<812dc1ae>] dev_change_flags+0x18/0x44
	[<8132144f>] devinet_ioctl+0x243/0x51a
	[<81321ba9>] inet_ioctl+0x93/0xac
	[<812cc951>] sock_ioctl+0x1c6/0x1ea
	[<812cc78b>] ? might_fault+0x20/0x20
	[<810b1ebb>] do_vfs_ioctl+0x46e/0x4a2
	[<810a6ebb>] ? fget_light+0x2f/0x70
	[<812ce549>] ? sys_recvmsg+0x3e/0x48
	[<810b1f35>] sys_ioctl+0x46/0x69
	[<8137fa77>] sysenter_do_call+0x12/0x2

Cc: <stable@vger.kernel.org>
Cc: Gary Morain <gmorain@google.com>
Cc: Paul Stewart <pstew@google.com>
Cc: Abhijit Pradhan <abhijit@qca.qualcomm.com>
Cc: Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com>
Cc: Rajkumar Manoharan <rmanohar@qca.qualcomm.com>
Acked-by: Johannes Berg <johannes.berg@intel.com>
Tested-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-02-09 15:16:04 -05:00
Florian Westphal
a8db7b2d19 netfilter: nf_queue: fix queueing of bridged gro skbs
When trying to nf_queue GRO/GSO skbs, nf_queue uses skb_gso_segment
to split the skb.

However, if nf_queue is called via bridge netfilter, the mac header
won't be preserved -- packets will thus contain a bogus mac header.

Fix this by setting skb->data to the mac header when skb->nf_bridge
is set and restoring skb->data afterwards for all segments.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-02-09 20:47:53 +01:00
David S. Miller
16bda13d90 net: Make qdisc_skb_cb upper size bound explicit.
Just like skb->cb[], so that qdisc_skb_cb can be encapsulated inside
of other data structures.

This is intended to be used by IPoIB so that it can remember
addressing information stored at hard_header_ops->create() time that
it can fetch when the packet gets to the transmit routine.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-09 13:50:34 -05:00
Eric Dumazet
5ca3b72c5d gro: more generic L2 header check
Shlomo Pongratz reported GRO L2 header check was suited for Ethernet
only, and failed on IB/ipoib traffic.

He provided a patch faking a zeroed header to let GRO aggregates frames.

Roland Dreier, Herbert Xu, and others suggested we change GRO L2 header
check to be more generic, ie not assuming L2 header is 14 bytes, but
taking into account hard_header_len.

__napi_gro_receive() has special handling for the common case (Ethernet)
to avoid a memcmp() call and use an inline optimized function instead.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Shlomo Pongratz <shlomop@mellanox.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-08 18:26:54 -05:00