Commit graph

6772 commits

Author SHA1 Message Date
David S. Miller
f1abb346d8 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Please pull this batch of updates intended for the 3.14 stream...

For the mac80211 bits, Johannes says:

"I have various improvements/cleanups/fixes all over, but the shortlog
shows that Luis's regulatory work and mesh work from the cozybit folks
are the biggest ones, along with the CSA fixes."

Along with that, we have big batches of updates to brcmfmac, rtlwifi,
and ath9k.  There are updates to wcn36xx, rt2x00, and a handful of
others as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 14:25:23 -05:00
Eric Dumazet
f54b311142 tcp: auto corking
With the introduction of TCP Small Queues, TSO auto sizing, and TCP
pacing, we can implement Automatic Corking in the kernel, to help
applications doing small write()/sendmsg() to TCP sockets.

Idea is to change tcp_push() to check if the current skb payload is
under skb optimal size (a multiple of MSS bytes)

If under 'size_goal', and at least one packet is still in Qdisc or
NIC TX queues, set the TCP Small Queue Throttled bit, so that the push
will be delayed up to TX completion time.

This delay might allow the application to coalesce more bytes
in the skb in following write()/sendmsg()/sendfile() system calls.

The exact duration of the delay is depending on the dynamics
of the system, and might be zero if no packet for this flow
is actually held in Qdisc or NIC TX ring.

Using FQ/pacing is a way to increase the probability of
autocorking being triggered.

Add a new sysctl (/proc/sys/net/ipv4/tcp_autocorking) to control
this feature and default it to 1 (enabled)

Add a new SNMP counter : nstat -a | grep TcpExtTCPAutoCorking
This counter is incremented every time we detected skb was under used
and its flush was deferred.

Tested:

Interesting effects when using line buffered commands under ssh.

Excellent performance results in term of cpu usage and total throughput.

lpq83:~# echo 1 >/proc/sys/net/ipv4/tcp_autocorking
lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
9410.39

 Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':

      35209.439626 task-clock                #    2.901 CPUs utilized
             2,294 context-switches          #    0.065 K/sec
               101 CPU-migrations            #    0.003 K/sec
             4,079 page-faults               #    0.116 K/sec
    97,923,241,298 cycles                    #    2.781 GHz                     [83.31%]
    51,832,908,236 stalled-cycles-frontend   #   52.93% frontend cycles idle    [83.30%]
    25,697,986,603 stalled-cycles-backend    #   26.24% backend  cycles idle    [66.70%]
   102,225,978,536 instructions              #    1.04  insns per cycle
                                             #    0.51  stalled cycles per insn [83.38%]
    18,657,696,819 branches                  #  529.906 M/sec                   [83.29%]
        91,679,646 branch-misses             #    0.49% of all branches         [83.40%]

      12.136204899 seconds time elapsed

lpq83:~# echo 0 >/proc/sys/net/ipv4/tcp_autocorking
lpq83:~# perf stat ./super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128
6624.89

 Performance counter stats for './super_netperf 4 -t TCP_STREAM -H lpq84 -- -m 128':
      40045.864494 task-clock                #    3.301 CPUs utilized
               171 context-switches          #    0.004 K/sec
                53 CPU-migrations            #    0.001 K/sec
             4,080 page-faults               #    0.102 K/sec
   111,340,458,645 cycles                    #    2.780 GHz                     [83.34%]
    61,778,039,277 stalled-cycles-frontend   #   55.49% frontend cycles idle    [83.31%]
    29,295,522,759 stalled-cycles-backend    #   26.31% backend  cycles idle    [66.67%]
   108,654,349,355 instructions              #    0.98  insns per cycle
                                             #    0.57  stalled cycles per insn [83.34%]
    19,552,170,748 branches                  #  488.244 M/sec                   [83.34%]
       157,875,417 branch-misses             #    0.81% of all branches         [83.34%]

      12.130267788 seconds time elapsed

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 12:51:41 -05:00
Jeff Kirsher
a6227e26d9 include/net/: Fix FSF address in file headers
Several files refer to an old address for the Free Software Foundation
in the file header comment.  Resolve by replacing the address with
the URL <http://www.gnu.org/licenses/> so that we do not have to keep
updating the header comments anytime the address changes.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 12:37:56 -05:00
Jeff Kirsher
4b2f13a251 sctp: Fix FSF address in file headers
Several files refer to an old address for the Free Software Foundation
in the file header comment.  Resolve by replacing the address with
the URL <http://www.gnu.org/licenses/> so that we do not have to keep
updating the header comments anytime the address changes.

CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06 12:37:56 -05:00
John W. Linville
d86804cb70 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2013-12-06 10:37:24 -05:00
John W. Linville
4b074b0762 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2013-12-02 14:25:38 -05:00
Simon Wunderlich
e487eaeb07 cfg80211/mac80211/ath6kl: acquire wdev lock outside ch_switch_notify
The channel switch notification should be sent under the
wdev/sdata-lock, preferably in the same moment as the channel change
happens, to avoid races by other callers (e.g. start/stop_ap).
This also adds the previously missing sdata_lock protection in
csa_finalize_work.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-02 11:51:54 +01:00
Andrei Otcheretianski
b176e62940 cfg80211: aggregate mgmt_tx parameters into a struct
Change cfg80211 and mac80211 to use cfg80211_mgmt_tx_params
struct to aggregate parameters for mgmt_tx functions.
This makes the functions' signatures less clumsy and allows
less painful parameters extension.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
[fix all other drivers]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-12-02 11:51:52 +01:00
Xufeng Zhang
6eabca54d6 sctp: Restore 'resent' bit to avoid retransmitted chunks for RTT measurements
Currently retransmitted DATA chunks could also be used for
RTT measurements since there are no flag to identify whether
the transmitted DATA chunk is a new one or a retransmitted one.
This problem is introduced by commit ae19c5486 ("sctp: remove
'resent' bit from the chunk") which inappropriately removed the
'resent' bit completely, instead of doing this, we should set
the resent bit only for the retransmitted DATA chunks.

Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-28 18:29:58 -05:00
Luis R. Rodriguez
4c7d3982a6 cfg80211: use enum nl80211_dfs_regions for dfs_region everywhere
u8 was used in some other places, just stick to the enum,
this forces us to express the values that are expected.

Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:52:12 +01:00
Eliad Peller
21f659bf1f mac80211: add min required channel definition field
Add a new field to ieee80211_chanctx_conf to indicate
the min required channel configuration.

Tuning to a narrower channel might help reducing
the noise level and saving some power.

The min required channel definition is the max of
all min required channel definitions of the interfaces
bound to this channel context.

In AP mode, use 20MHz when there are no connected station.
When a new station is added/removed, calculate the new max
bandwidth supported by any of the stations (e.g. 80MHz when
80MHz and 40MHz stations are connected).

In other cases, simply use bss_conf.chandef as the
min required chandef.

Notify drivers about changes to this field by calling
drv_change_chanctx with a new CHANGE_MIN_WIDTH notification.

Signed-off-by: Eliad Peller <eliad@wizery.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:52:05 +01:00
Luis R. Rodriguez
2a901468c2 cfg80211: add an option to disable processing country IEs
Certain vendors may want to disable the processing of
country IEs so that they can continue using the regulatory
domain the driver or user has set.  Currently there is no
way to stop the core from processing country IEs, so add
support to the core to ignore country IE hints.

Cc: Mihir Shete <smihir@qti.qualcomm.com>
Cc: Henri Bahini <hbahini@qca.qualcomm.com>
Cc: Tushnim Bhattacharyya <tushnimb@qca.qualcomm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:51:51 +01:00
Luis R. Rodriguez
a09a85a013 cfg80211: add flags to define country IE processing rules
802.11 cards may have different country IE parsing behavioural
preferences and vendors may want to support these. These preferences
were managed by the REGULATORY_CUSTOM_REG and the REGULATORY_STRICT_REG
flags and their combination. Instead of using this existing notation,
split out the country IE behavioural preferences as a new flag. This
will allow us to add more customizations easily and make the code more
maintainable.

Cc: Mihir Shete <smihir@qti.qualcomm.com>
Cc: Henri Bahini <hbahini@qca.qualcomm.com>
Cc: Tushnim Bhattacharyya <tushnimb@qca.qualcomm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
[fix up conflicts]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:51:49 +01:00
Luis R. Rodriguez
a2f73b6c5d cfg80211: move regulatory flags to their own variable
We'll expand this later, this will make it easier to
classify and review what things are related to regulatory
or not.

Coccinelle only missed 4 hits, which I had to do manually,
supplying the SmPL in case of merge conflicts.

@@
struct wiphy *wiphy;
@@
-wiphy->flags |= WIPHY_FLAG_CUSTOM_REGULATORY
+wiphy->regulatory_flags |= REGULATORY_CUSTOM_REG
@@
expression e;
@@
-e->flags |= WIPHY_FLAG_CUSTOM_REGULATORY
+e->regulatory_flags |= REGULATORY_CUSTOM_REG
@@
struct wiphy *wiphy;
@@
-wiphy->flags &= ~WIPHY_FLAG_CUSTOM_REGULATORY
+wiphy->regulatory_flags &= ~REGULATORY_CUSTOM_REG
@@
struct wiphy *wiphy;
@@
-wiphy->flags & WIPHY_FLAG_CUSTOM_REGULATORY
+wiphy->regulatory_flags & REGULATORY_CUSTOM_REG

@@
struct wiphy *wiphy;
@@
-wiphy->flags |= WIPHY_FLAG_STRICT_REGULATORY
+wiphy->regulatory_flags |= REGULATORY_STRICT_REG
@@
expression e;
@@
-e->flags |= WIPHY_FLAG_STRICT_REGULATORY
+e->regulatory_flags |= REGULATORY_STRICT_REG
@@
struct wiphy *wiphy;
@@
-wiphy->flags &= ~WIPHY_FLAG_STRICT_REGULATORY
+wiphy->regulatory_flags &= ~REGULATORY_STRICT_REG
@@
struct wiphy *wiphy;
@@
-wiphy->flags & WIPHY_FLAG_STRICT_REGULATORY
+wiphy->regulatory_flags & REGULATORY_STRICT_REG

@@
struct wiphy *wiphy;
@@
-wiphy->flags |= WIPHY_FLAG_DISABLE_BEACON_HINTS
+wiphy->regulatory_flags |= REGULATORY_DISABLE_BEACON_HINTS
@@
expression e;
@@
-e->flags |= WIPHY_FLAG_DISABLE_BEACON_HINTS
+e->regulatory_flags |= REGULATORY_DISABLE_BEACON_HINTS
@@
struct wiphy *wiphy;
@@
-wiphy->flags &= ~WIPHY_FLAG_DISABLE_BEACON_HINTS
+wiphy->regulatory_flags &= ~REGULATORY_DISABLE_BEACON_HINTS
@@
struct wiphy *wiphy;
@@
-wiphy->flags & WIPHY_FLAG_DISABLE_BEACON_HINTS
+wiphy->regulatory_flags & REGULATORY_DISABLE_BEACON_HINTS

Generated-by: Coccinelle SmPL
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Peter Senna Tschudin <peter.senna@gmail.com>
Cc: Mihir Shete <smihir@qti.qualcomm.com>
Cc: Henri Bahini <hbahini@qca.qualcomm.com>
Cc: Tushnim Bhattacharyya <tushnimb@qca.qualcomm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
[fix up whitespace damage, overly long lines]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:51:46 +01:00
Max Stepanov
2475b1cc0d mac80211: add generic cipher scheme support
This adds generic cipher scheme support to mac80211, such schemes
are fully under control by the driver. On hw registration drivers
may specify additional HW ciphers with a scheme how these ciphers
have to be handled by mac80211 TX/RR. A cipher scheme specifies a
cipher suite value, a size of the security header to be added to
or stripped from frames and how the PN is to be verified on RX.

Signed-off-by: Max Stepanov <Max.Stepanov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:50:52 +01:00
Janusz Dziedzic
d2859df5e7 cfg80211/mac80211: DFS setup chandef for cac event
To report channel width correctly we have
to send correct channel parameters from
mac80211 when calling cfg80211_cac_event().

This is required in case of using channel width
higher than 20MHz and we have to set correct
dfs channel state after CAC (NL80211_DFS_AVAILABLE).

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Reviewed-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:50:46 +01:00
Luis R. Rodriguez
222ea58199 cfg80211: force WIPHY_FLAG_CUSTOM_REGULATORY on wiphy_apply_custom_regulatory()
wiphy_apply_custom_regulatory() implies WIPHY_FLAG_CUSTOM_REGULATORY
but we never enforced it, do that now and warn if the driver
didn't set it. All drivers should be following this today already.

Having WIPHY_FLAG_CUSTOM_REGULATORY does not however mean you will
use wiphy_apply_custom_regulatory() though, you may have your own
_orig value set up tools / helpers. The intel drivers are examples
of this type of driver.

Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:49:45 +01:00
Luis R. Rodriguez
8fe02e167e cfg80211: consolidate passive-scan and no-ibss flags
These two flags are used for the same purpose, just
combine them into a no-ir flag to annotate no initiating
radiation is allowed.

Old userspace sending either flag will have it treated as
the no-ir flag. To be considerate to older userspace we
also send both the no-ir flag and the old no-ibss flags.
Newer userspace will have to be aware of older kernels.

Update all places in the tree using these flags with the
following semantic patch:

@@
@@
-NL80211_RRF_PASSIVE_SCAN
+NL80211_RRF_NO_IR
@@
@@
-NL80211_RRF_NO_IBSS
+NL80211_RRF_NO_IR
@@
@@
-IEEE80211_CHAN_PASSIVE_SCAN
+IEEE80211_CHAN_NO_IR
@@
@@
-IEEE80211_CHAN_NO_IBSS
+IEEE80211_CHAN_NO_IR
@@
@@
-NL80211_RRF_NO_IR | NL80211_RRF_NO_IR
+NL80211_RRF_NO_IR
@@
@@
-IEEE80211_CHAN_NO_IR | IEEE80211_CHAN_NO_IR
+IEEE80211_CHAN_NO_IR
@@
@@
-(NL80211_RRF_NO_IR)
+NL80211_RRF_NO_IR
@@
@@
-(IEEE80211_CHAN_NO_IR)
+IEEE80211_CHAN_NO_IR

Along with some hand-optimisations in documentation, to
remove duplicates and to fix some indentation.

Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
[do all the driver updates in one go]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-11-25 20:49:35 +01:00
Hannes Frederic Sowa
85fbaa7503 inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu functions
Commit bceaa90240 ("inet: prevent leakage
of uninitialized memory to user in recv syscalls") conditionally updated
addr_len if the msg_name is written to. The recv_error and rxpmtu
functions relied on the recvmsg functions to set up addr_len before.

As this does not happen any more we have to pass addr_len to those
functions as well and set it to the size of the corresponding sockaddr
length.

This broke traceroute and such.

Fixes: bceaa90240 ("inet: prevent leakage of uninitialized memory to user in recv syscalls")
Reported-by: Brad Spengler <spender@grsecurity.net>
Reported-by: Tom Labanowski
Cc: mpb <mpb.mail@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-23 14:46:23 -08:00
Johannes Berg
91398a0992 genetlink: fix genl_set_err() group ID
Fix another really stupid bug - I introduced genl_set_err()
precisely to be able to adjust the group and reject invalid
ones, but then forgot to do so.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-21 13:09:43 -05:00
Johannes Berg
220815a966 genetlink: fix genlmsg_multicast() bug
Unfortunately, I introduced a tremendously stupid bug into
genlmsg_multicast() when doing all those multicast group
changes: it adjusts the group number, but then passes it
to genlmsg_multicast_netns() which does that again.

Somehow, my tests failed to catch this, so add a warning
into genlmsg_multicast_netns() and remove the offending
group ID adjustment.

Also add a warning to the similar code in other functions
so people who misuse them are more loudly warned.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-21 13:09:43 -05:00
Linus Torvalds
1ee2dcc224 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Mostly these are fixes for fallout due to merge window changes, as
  well as cures for problems that have been with us for a much longer
  period of time"

 1) Johannes Berg noticed two major deficiencies in our genetlink
    registration.  Some genetlink protocols we passing in constant
    counts for their ops array rather than something like
    ARRAY_SIZE(ops) or similar.  Also, some genetlink protocols were
    using fixed IDs for their multicast groups.

    We have to retain these fixed IDs to keep existing userland tools
    working, but reserve them so that other multicast groups used by
    other protocols can not possibly conflict.

    In dealing with these two problems, we actually now use less state
    management for genetlink operations and multicast groups.

 2) When configuring interface hardware timestamping, fix several
    drivers that simply do not validate that the hwtstamp_config value
    is one the driver actually supports.  From Ben Hutchings.

 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.

 4) In dev_forward_skb(), set the skb->protocol in the right order
    relative to skb_scrub_packet().  From Alexei Starovoitov.

 5) Bridge erroneously fails to use the proper wrapper functions to make
    calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid.  Fix from Toshiaki
    Makita.

 6) When detaching a bridge port, make sure to flush all VLAN IDs to
    prevent them from leaking, also from Toshiaki Makita.

 7) Put in a compromise for TCP Small Queues so that deep queued devices
    that delay TX reclaim non-trivially don't have such a performance
    decrease.  One particularly problematic area is 802.11 AMPDU in
    wireless.  From Eric Dumazet.

 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
    here.  Fix from Eric Dumzaet, reported by Dave Jones.

 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.

10) When computing mergeable buffer sizes, virtio-net fails to take the
    virtio-net header into account.  From Michael Dalton.

11) Fix seqlock deadlock in ip4_datagram_connect() wrt.  statistic
    bumping, this one has been with us for a while.  From Eric Dumazet.

12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
    Hugne.

13) 6lowpan bit used for traffic classification was wrong, from Jukka
    Rissanen.

14) macvlan has the same issue as normal vlans did wrt.  propagating LRO
    disabling down to the real device, fix it the same way.  From Michal
    Kubecek.

15) CPSW driver needs to soft reset all slaves during suspend, from
    Daniel Mack.

16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.

17) The xen-netfront RX buffer refill timer isn't properly scheduled on
    partial RX allocation success, from Ma JieYue.

18) When ipv6 ping protocol support was added, the AF_INET6 protocol
    initialization cleanup path on failure was borked a little.  Fix
    from Vlad Yasevich.

19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
    blocks we can do the wrong thing with the msg_name we write back to
    userspace.  From Hannes Frederic Sowa.  There is another fix in the
    works from Hannes which will prevent future problems of this nature.

20) Fix route leak in VTI tunnel transmit, from Fan Du.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
  genetlink: make multicast groups const, prevent abuse
  genetlink: pass family to functions using groups
  genetlink: add and use genl_set_err()
  genetlink: remove family pointer from genl_multicast_group
  genetlink: remove genl_unregister_mc_group()
  hsr: don't call genl_unregister_mc_group()
  quota/genetlink: use proper genetlink multicast APIs
  drop_monitor/genetlink: use proper genetlink multicast APIs
  genetlink: only pass array to genl_register_family_with_ops()
  tcp: don't update snd_nxt, when a socket is switched from repair mode
  atm: idt77252: fix dev refcnt leak
  xfrm: Release dst if this dst is improper for vti tunnel
  netlink: fix documentation typo in netlink_set_err()
  be2net: Delete secondary unicast MAC addresses during be_close
  be2net: Fix unconditional enabling of Rx interface options
  net, virtio_net: replace the magic value
  ping: prevent NULL pointer dereference on write to msg_name
  bnx2x: Prevent "timeout waiting for state X"
  bnx2x: prevent CFC attention
  bnx2x: Prevent panic during DMAE timeout
  ...
2013-11-19 15:50:47 -08:00
Johannes Berg
2a94fe48f3 genetlink: make multicast groups const, prevent abuse
Register generic netlink multicast groups as an array with
the family and give them contiguous group IDs. Then instead
of passing the global group ID to the various functions that
send messages, pass the ID relative to the family - for most
families that's just 0 because the only have one group.

This avoids the list_head and ID in each group, adding a new
field for the mcast group ID offset to the family.

At the same time, this allows us to prevent abusing groups
again like the quota and dropmon code did, since we can now
check that a family only uses a group it owns.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:06 -05:00
Johannes Berg
68eb55031d genetlink: pass family to functions using groups
This doesn't really change anything, but prepares for the
next patch that will change the APIs to pass the group ID
within the family, rather than the global group ID.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:06 -05:00
Johannes Berg
62b68e99fa genetlink: add and use genl_set_err()
Add a static inline to generic netlink to wrap netlink_set_err()
to make it easier to use here - use it in openvswitch (the only
generic netlink user of netlink_set_err()).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:06 -05:00
Johannes Berg
c2ebb90846 genetlink: remove family pointer from genl_multicast_group
There's no reason to have the family pointer there since it
can just be passed internally where needed, so remove it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:06 -05:00
Johannes Berg
06fb555a27 genetlink: remove genl_unregister_mc_group()
There are no users of this API remaining, and we'll soon
change group registration to be static (like ops are now)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:06 -05:00
Johannes Berg
c53ed74236 genetlink: only pass array to genl_register_family_with_ops()
As suggested by David Miller, make genl_register_family_with_ops()
a macro and pass only the array, evaluating ARRAY_SIZE() in the
macro, this is a little safer.

The openvswitch has some indirection, assing ops/n_ops directly in
that code. This might ultimately just assign the pointers in the
family initializations, saving the struct genl_family_and_ops and
code (once mcast groups are handled differently.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-19 16:39:05 -05:00
Johannes Berg
568508aa07 genetlink: unify registration functions
Now that the ops assignment is just two variables rather than a
long list iteration etc., there's no reason to separately export
__genl_register_family() and __genl_register_family_with_ops().

Unify the two functions into __genl_register_family() and make
genl_register_family_with_ops() call it after assigning the ops.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-15 20:50:23 -05:00
Linus Torvalds
9073e1a804 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual earth-shaking, news-breaking, rocket science pile from
  trivial.git"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
  doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
  doc: add missing files to timers/00-INDEX
  timekeeping: Fix some trivial typos in comments
  mm: Fix some trivial typos in comments
  irq: Fix some trivial typos in comments
  NUMA: fix typos in Kconfig help text
  mm: update 00-INDEX
  doc: Documentation/DMA-attributes.txt fix typo
  DRM: comment: `halve' -> `half'
  Docs: Kconfig: `devlopers' -> `developers'
  doc: typo on word accounting in kprobes.c in mutliple architectures
  treewide: fix "usefull" typo
  treewide: fix "distingush" typo
  mm/Kconfig: Grammar s/an/a/
  kexec: Typo s/the/then/
  Documentation/kvm: Update cpuid documentation for steal time and pv eoi
  treewide: Fix common typo in "identify"
  __page_to_pfn: Fix typo in comment
  Correct some typos for word frequency
  clk: fixed-factor: Fix a trivial typo
  ...
2013-11-15 16:47:22 -08:00
Johannes Berg
3f5ccd06ae genetlink: make genl_ops flags a u8 and move to end
To save some space in the struct on 32-bit systems,
make the flags a u8 (only 4 bits are used) and also
move them to the end of the struct.

This has no impact on 64-bit systems as alignment of
the struct in an array uses up the space anyway.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:41 -05:00
Johannes Berg
f84f771d94 genetlink: allow making ops const
Allow making the ops array const by not modifying the ops
flags on registration but rather only when ops are sent
out in the family information.

No users are updated yet except for the pre_doit/post_doit
calls in wireless (the only ones that exist now.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:41 -05:00
Johannes Berg
d91824c08f genetlink: register family ops as array
Instead of using a linked list, use an array. This reduces
the data size needed by the users of genetlink, for example
in wireless (net/wireless/nl80211.c) on 64-bit it frees up
over 1K of data space.

Remove the attempted sending of CTRL_CMD_NEWOPS ctrl event
since genl_ctrl_event(CTRL_CMD_NEWOPS, ...) only returns
-EINVAL anyway, therefore no such event could ever be sent.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:41 -05:00
Johannes Berg
3686ec5e84 genetlink: remove genl_register_ops/genl_unregister_ops
genl_register_ops() is still needed for internal registration,
but is no longer available to users of the API.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:10:40 -05:00
Jiri Pirko
6aafeef03b netfilter: push reasm skb through instead of original frag skbs
Pushing original fragments through causes several problems. For example
for matching, frags may not be matched correctly. Take following
example:

<example>
On HOSTA do:
ip6tables -I INPUT -p icmpv6 -j DROP
ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT

and on HOSTB you do:
ping6 HOSTA -s2000    (MTU is 1500)

Incoming echo requests will be filtered out on HOSTA. This issue does
not occur with smaller packets than MTU (where fragmentation does not happen)
</example>

As was discussed previously, the only correct solution seems to be to use
reassembled skb instead of separete frags. Doing this has positive side
effects in reducing sk_buff by one pointer (nfct_reasm) and also the reams
dances in ipvs and conntrack can be removed.

Future plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c
entirely and use code in net/ipv6/reassembly.c instead.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-11 00:19:35 -05:00
Florent Fourcot
3fdfa5ff50 ipv6: enable IPV6_FLOWLABEL_MGR for getsockopt
It is already possible to set/put/renew a label
with IPV6_FLOWLABEL_MGR and setsockopt. This patch
add the possibility to get information about this
label (current value, time before expiration, etc).

It helps application to take decision for a renew
or a release of the label.

v2:
 * Add spin_lock to prevent race condition
 * return -ENOENT if no result found
 * check if flr_action is GET

v3:
 * move the spin_lock to protect only the
   relevant code

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-08 13:42:57 -05:00
John W. Linville
c1f3bb6bd3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2013-11-08 09:03:10 -05:00
Mathias Krause
0c7ddf36c2 net: move pskb_put() to core code
This function has usage beside IPsec so move it to the core skbuff code.
While doing so, give it some documentation and change its return type to
'unsigned char *' to be in line with skb_put().

Signed-off-by: Mathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 19:28:58 -05:00
Joe Perches
f9bddcdf1f udp: Remove unnecessary semicolon from do{}while (0) macro
Just an unnecessary semicolon that should be removed...

Whitespace neatening of macro too.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-07 02:14:33 -05:00
Hannes Frederic Sowa
482fc6094a ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE
Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery,
their sockets won't accept and install new path mtu information and they
will always use the interface mtu for outgoing packets. It is guaranteed
that the packet is not fragmented locally. But we won't set the DF-Flag
on the outgoing frames.

Florian Weimer had the idea to use this flag to ensure DNS servers are
never generating outgoing fragments. They may well be fragmented on the
path, but the server never stores or usees path mtu values, which could
well be forged in an attack.

(The root of the problem with path MTU discovery is that there is
no reliable way to authenticate ICMP Fragmentation Needed But DF Set
messages because they are sent from intermediate routers with their
source addresses, and the IMCP payload will not always contain sufficient
information to identify a flow.)

Recent research in the DNS community showed that it is possible to
implement an attack where DNS cache poisoning is feasible by spoofing
fragments. This work was done by Amir Herzberg and Haya Shulman:
<https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf>

This issue was previously discussed among the DNS community, e.g.
<http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>,
without leading to fixes.

This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode
regarding local fragmentation with UFO/CORK" for the enforcement of the
non-fragmentable checks. If other users than ip_append_page/data should
use this semantic too, we have to add a new flag to IPCB(skb)->flags to
suppress local fragmentation and check for this in ip_finish_output.

Many thanks to Florian Weimer for the idea and feedback while implementing
this patch.

Cc: David S. Miller <davem@davemloft.net>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-05 21:52:27 -05:00
John W. Linville
33b443422e Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2013-11-05 15:50:22 -05:00
John W. Linville
353c78152c Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Conflicts:
	net/wireless/reg.c
2013-11-05 15:49:02 -05:00
David S. Miller
cfce0a2b61 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Please accept the following pull request intended for the 3.13 tree...

I had intended to pass most of these to you as much as two weeks ago.
Unfortunately, I failed to account for the effects of bad Internet
connections and my own fatique/laziness while traveling.  On the bright
side, at least these have been baking in linux-next for some time!

For the mac80211 bits, Johannes says:

"This time I have two fixes for P2P (which requires not using CCK rates)
and a workaround for APs with broken WMM information."

For the iwlwifi bits, Johannes says:

"I have a few fixes for warnings/issues: one from Alex, fixing scan
timings, one from Emmanuel fixing a WARN_ON in the DVM driver, one from
Stanislaw removing a trigger-happy WARN_ON in the MVM driver and a
change from myself to try to recover when the device isn't processing
commands quickly."

And:

"For this round, I have a lot of changes:
 * power management improvements
 * BT coexistence improvements/updates
 * new device support
 * VHT support
 * IBSS support (though due to a small bug it requires new firmware)
 * various other fixes/improvements."

For the Bluetooth bits, Gustavo says:

"More patches for 3.12, busy times for Bluetooth. More than a 100 commits since
the last pull. The bulk of work comes from Johan and Marcel, they are doing
fixes and improvements all over the Bluetooth subsystem, as the diffstat can
show."

For the ath10k and ath6kl bits, Kalle says:

"Bartosz added support to ath10k for our 10.x AP firmware branch, which
gives us AP specific features and fixes. We still support the main
firmware branch as well just like before, ath10k detects runtime what
firmware is used. Unfortunately the firmware interface in 10.x branch is
somewhat different so there was quite a lot of changes in ath10k for
this.

Michal and Sujith did some performance improvements in ath10k. Vladimir
fixed a compiler warning and Fengguang removed an extra semicolon."

For the NFC bits, Samuel says:

"It's a fairly big one, with the following highlights:

- NFC digital layer implementation: Most NFC chipsets implement the NFC
  digital layer in firmware, but others have more basic functionalities
  and expect the host to implement the digital layer. This layer sits
  below the NFC core.

- Sony's port100 support: This is "soft" NFC USB dongle that expects the
  digital layer to be implemented on the host. This is the first user of
  our NFC digital stack implementation.

- Secure element API: We now provide a netlink API for enabling,
  disabling and discovering NFC attached (embedded or UICC ones) secure
  elements. With some userspace help, this allows us to support NFC
  payments.
  Only the pn544 driver currently supports that API.

- NCI SPI fixes and improvements: In order to support NCI devices over
  SPI, we fixed and improved our NCI/SPI implementation. The currently
  most deployed NFC NCI chipset, Broadcom's bcm2079x, supports that mode
  and we're planning to use our NCI/SPI framework to implement a
  driver for it.

- pn533 fragmentation support in target mode: This was the only missing
  feature from our pn533 impementation. We now support fragmentation in
  both Tx and Rx modes, in target mode."

On top of all that, brcmfmac and rt2x00 both get the usual flurry
of updates.  A few other drivers get hit here or there as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-05 02:34:57 -05:00
Jesper Dangaard Brouer
1ba3aab303 net: codel: Avoid undefined behavior from signed overflow
As described in commit 5a581b367 (jiffies: Avoid undefined
behavior from signed overflow), according to the C standard
3.4.3p3, overflow of a signed integer results in undefined
behavior.

To fix this, do as the above commit, and do an unsigned
subtraction, and interpreting the result as a signed
two's-complement number.  This is based on the theory from
RFC 1982 and is nicely described in wikipedia here:
 https://en.wikipedia.org/wiki/Serial_number_arithmetic#General_Solution

A side-note, I have seen practical issues with the previous logic
when dealing with 16-bit, on a 64-bit machine (gcc version
4.4.5). This were 32-bit, which I have not observed issues with.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jesper Dangaard Brouer <netoptimizer@brouer.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 20:01:29 -05:00
Yuchung Cheng
9f9843a751 tcp: properly handle stretch acks in slow start
Slow start now increases cwnd by 1 if an ACK acknowledges some packets,
regardless the number of packets. Consequently slow start performance
is highly dependent on the degree of the stretch ACKs caused by
receiver or network ACK compression mechanisms (e.g., delayed-ACK,
GRO, etc).  But slow start algorithm is to send twice the amount of
packets of packets left so it should process a stretch ACK of degree
N as if N ACKs of degree 1, then exits when cwnd exceeds ssthresh. A
follow up patch will use the remainder of the N (if greater than 1)
to adjust cwnd in the congestion avoidance phase.

In addition this patch retires the experimental limited slow start
(LSS) feature. LSS has multiple drawbacks but questionable benefit. The
fractional cwnd increase in LSS requires a loop in slow start even
though it's rarely used. Configuring such an increase step via a global
sysctl on different BDPS seems hard. Finally and most importantly the
slow start overshoot concern is now better covered by the Hybrid slow
start (hystart) enabled by default.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 19:57:59 -05:00
David S. Miller
72c39a0ade Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
This is another batch containing Netfilter/IPVS updates for your net-next
tree, they are:

* Six patches to make the ipt_CLUSTERIP target support netnamespace,
  from Gao feng.

* Two cleanups for the nf_conntrack_acct infrastructure, introducing
  a new structure to encapsulate conntrack counters, from Holger
  Eitzenberger.

* Fix missing verdict in SCTP support for IPVS, from Daniel Borkmann.

* Skip checksum recalculation in SCTP support for IPVS, also from
  Daniel Borkmann.

* Fix behavioural change in xt_socket after IP early demux, from
  Florian Westphal.

* Fix bogus large memory allocation in the bitmap port set type in ipset,
  from Jozsef Kadlecsik.

* Fix possible compilation issues in the hash netnet set type in ipset,
  also from Jozsef Kadlecsik.

* Define constants to identify netlink callback data in ipset dumps,
  again from Jozsef Kadlecsik.

* Use sock_gen_put() in xt_socket to replace xt_socket_put_sk,
  from Eric Dumazet.

* Improvements for the SH scheduler in IPVS, from Alexander Frolkin.

* Remove extra delay due to unneeded rcu barrier in IPVS net namespace
  cleanup path, from Julian Anastasov.

* Save some cycles in ip6t_REJECT by skipping checksum validation in
  packets leaving from our stack, from Stanislav Fomichev.

* Fix IPVS_CMD_ATTR_MAX definition in IPVS, larger that required, from
  Julian Anastasov.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 19:46:58 -05:00
Daniel Borkmann
cea80ea8d2 net: checksum: fix warning in skb_checksum
This patch fixes a build warning in skb_checksum() by wrapping the
csum_partial() usage in skb_checksum(). The problem is that on a few
architectures, csum_partial is used with prefix asmlinkage whereas
on most architectures it's not. So fix this up generically as we did
with csum_block_add_ext() to match the signature. Introduced by
2817a336d4 ("net: skb_checksum: allow custom update/combine for
walking skb").

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 15:27:08 -05:00
John W. Linville
87bc0728d4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	drivers/net/wireless/brcm80211/brcmfmac/sdio_host.h
2013-11-04 14:51:28 -05:00
David S. Miller
394efd19d5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be.h
	drivers/net/netconsole.c
	net/bridge/br_private.h

Three mostly trivial conflicts.

The net/bridge/br_private.h conflict was a function signature (argument
addition) change overlapping with the extern removals from Joe Perches.

In drivers/net/netconsole.c we had one change adjusting a printk message
whilst another changed "printk(KERN_INFO" into "pr_info(".

Lastly, the emulex change was a new inline function addition overlapping
with Joe Perches's extern removals.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-04 13:48:30 -05:00
Daniel Borkmann
e6d8b64b34 net: sctp: fix and consolidate SCTP checksumming code
This fixes an outstanding bug found through IPVS, where SCTP packets
with skb->data_len > 0 (non-linearized) and empty frag_list, but data
accumulated in frags[] member, are forwarded with incorrect checksum
letting SCTP initial handshake fail on some systems. Linearizing each
SCTP skb in IPVS to prevent that would not be a good solution as
this leads to an additional and unnecessary performance penalty on
the load-balancer itself for no good reason (as we actually only want
to update the checksum, and can do that in a different/better way
presented here).

The actual problem is elsewhere, namely, that SCTP's checksumming
in sctp_compute_cksum() does not take frags[] into account like
skb_checksum() does. So while we are fixing this up, we better reuse
the existing code that we have anyway in __skb_checksum() and use it
for walking through the data doing checksumming. This will not only
fix this issue, but also consolidates some SCTP code with core
sk_buff code, bringing it closer together and removing respectively
avoiding reimplementation of skb_checksum() for no good reason.

As crc32c() can use hardware implementation within the crypto layer,
we leave that intact (it wraps around / falls back to e.g. slice-by-8
algorithm in __crc32c_le() otherwise); plus use the __crc32c_le_combine()
combinator for crc32c blocks.

Also, we remove all other SCTP checksumming code, so that we only
have to use sctp_compute_cksum() from now on; for doing that, we need
to transform SCTP checkumming in output path slightly, and can leave
the rest intact.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-03 23:04:57 -05:00