Commit graph

24675 commits

Author SHA1 Message Date
Jan Beulich
ce9f3f31ef netfilter: properly annotate ipv4_netfilter_{init,fini}()
Despite being just a few bytes of code, they should still have proper
annotations.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03 13:56:04 +02:00
Michael Wang
1c15b67709 netfilter: pass 'nf_hook_ops' instead of 'list_head' to nf_queue()
Since 'list_for_each_continue_rcu' has already been replaced by
'list_for_each_entry_continue_rcu', pass 'list_head' to nf_queue() as a
parameter can not benefit us any more.

This patch will replace 'list_head' with 'nf_hook_ops' as the parameter of
nf_queue() and __nf_queue() to save code.

Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03 13:52:54 +02:00
Michael Wang
2a6decfd8a netfilter: pass 'nf_hook_ops' instead of 'list_head' to nf_iterate()
Since 'list_for_each_continue_rcu' has already been replaced by
'list_for_each_entry_continue_rcu', pass 'list_head' to nf_iterate() as a
parameter can not benefit us any more.

This patch will replace 'list_head' with 'nf_hook_ops' as the parameter of
nf_iterate() to save code.

Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03 13:52:44 +02:00
Cong Wang
965505015b netfilter: remove xt_NOTRACK
It was scheduled to be removed for a long time.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netfilter@vger.kernel.org
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03 13:36:40 +02:00
Pablo Neira Ayuso
84b5ee939e netfilter: nf_conntrack: add nf_ct_timeout_lookup
This patch adds the new nf_ct_timeout_lookup function to encapsulate
the timeout policy attachment that is called in the nf_conntrack_in
path.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03 13:33:03 +02:00
Pablo Neira Ayuso
236df00561 netfilter: xt_CT: refactorize xt_ct_tg_check
This patch adds xt_ct_set_helper and xt_ct_set_timeout to reduce
the size of xt_ct_tg_check.

This aims to improve code mantainability by splitting xt_ct_tg_check
in smaller chunks.

Suggested by Eric Dumazet.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03 13:32:48 +02:00
Pablo Neira Ayuso
6703aa74ad netfilter: xt_socket: fix compilation warnings with gcc 4.7
This patch fixes compilation warnings in xt_socket with gcc-4.7.

In file included from net/netfilter/xt_socket.c:22:0:
net/netfilter/xt_socket.c: In function ‘socket_mt6_v1’:
include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘sport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:265:16: note: ‘sport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘dport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:265:9: note: ‘dport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘saddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:264:27: note: ‘saddr’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘daddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:264:19: note: ‘daddr’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
net/netfilter/xt_socket.c: In function ‘socket_match.isra.4’:
include/net/netfilter/nf_tproxy_core.h:75:2: warning: ‘protocol’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:113:5: note: ‘protocol’ was declared here
In file included from include/net/tcp.h:37:0,
                 from net/netfilter/xt_socket.c:17:
include/net/inet_hashtables.h:356:45: warning: ‘sport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:112:16: note: ‘sport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:106:23: warning: ‘dport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:112:9: note: ‘dport’ was declared here
In file included from include/net/tcp.h:37:0,
                 from net/netfilter/xt_socket.c:17:
include/net/inet_hashtables.h:356:15: warning: ‘saddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:111:16: note: ‘saddr’ was declared here
In file included from include/net/tcp.h:37:0,
                 from net/netfilter/xt_socket.c:17:
include/net/inet_hashtables.h:356:15: warning: ‘daddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:111:9: note: ‘daddr’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
net/netfilter/xt_socket.c: In function ‘socket_mt6_v1’:
include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘sport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:268:16: note: ‘sport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:23: warning: ‘dport’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:268:9: note: ‘dport’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘saddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:267:27: note: ‘saddr’ was declared here
In file included from net/netfilter/xt_socket.c:22:0:
include/net/netfilter/nf_tproxy_core.h:175:6: warning: ‘daddr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/xt_socket.c:267:19: note: ‘daddr’ was declared here

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-03 13:31:39 +02:00
Patrick McHardy
8a91bb0c30 netfilter: ip6tables: add stateless IPv6-to-IPv6 Network Prefix Translation target
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:25 +02:00
Pablo Neira Ayuso
320ff567f2 netfilter: nf_nat: support IPv6 in TFTP NAT helper
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:24 +02:00
Pablo Neira Ayuso
5901b6be88 netfilter: nf_nat: support IPv6 in IRC NAT helper
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:23 +02:00
Patrick McHardy
9a66482106 netfilter: nf_nat: support IPv6 in SIP NAT helper
Add IPv6 support to the SIP NAT helper. There are no functional differences
to IPv4 NAT, just different formats for addresses.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:22 +02:00
Patrick McHardy
ee6eb96673 netfilter: nf_nat: support IPv6 in amanda NAT helper
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:21 +02:00
Patrick McHardy
d33cbeeb1a netfilter: nf_nat: support IPv6 in FTP NAT helper
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:20 +02:00
Patrick McHardy
ed72d9e294 netfilter: ip6tables: add NETMAP target
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:19 +02:00
Patrick McHardy
115e23ac78 netfilter: ip6tables: add REDIRECT target
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:19 +02:00
Patrick McHardy
b3f644fc82 netfilter: ip6tables: add MASQUERADE target
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:18 +02:00
Patrick McHardy
58a317f106 netfilter: ipv6: add IPv6 NAT support
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:17 +02:00
Patrick McHardy
2cf545e835 net: core: add function for incremental IPv6 pseudo header checksum updates
Add inet_proto_csum_replace16 for incrementally updating IPv6 pseudo header
checksums for IPv6 NAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
2012-08-30 03:00:16 +02:00
Patrick McHardy
0ad352cb43 netfilter: ipv6: expand skb head in ip6_route_me_harder after oif change
Expand the skb headroom if the oif changed due to rerouting similar to
how IPv4 packets are handled.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:15 +02:00
Patrick McHardy
c7232c9979 netfilter: add protocol independent NAT core
Convert the IPv4 NAT implementation to a protocol independent core and
address family specific modules.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:14 +02:00
Patrick McHardy
051966c0c6 netfilter: nf_nat: add protoff argument to packet mangling functions
For mangling IPv6 packets the protocol header offset needs to be known
by the NAT packet mangling functions. Add a so far unused protoff argument
and convert the conntrack and NAT helpers to use it in preparation of
IPv6 NAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:13 +02:00
Patrick McHardy
811927ccfe netfilter: nf_conntrack: restrict NAT helper invocation to IPv4
The NAT helpers currently only handle IPv4 packets correctly. Restrict
invocation of the helpers to IPv4 in preparation of IPv6 NAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:12 +02:00
Patrick McHardy
2b60af0178 netfilter: nf_conntrack_ipv6: fix tracking of ICMPv6 error messages containing fragments
ICMPv6 error messages are tracked by extracting the conntrack tuple of
the inner packet and looking up the corresponding conntrack entry. Tuple
extraction uses the ->get_l4proto() callback, which in case of fragments
returns NEXTHDR_FRAGMENT instead of the upper protocol, even for the
first fragment when the entire next header is present, resulting in a
failure to find the correct connection tracking entry.

This patch changes ipv6_get_l4proto() to use ipv6_skip_exthdr() instead
of nf_ct_ipv6_skip_exthdr() in order to skip fragment headers when the
fragment offset is zero.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:11 +02:00
Patrick McHardy
4cdd34084d netfilter: nf_conntrack_ipv6: improve fragmentation handling
The IPv6 conntrack fragmentation currently has a couple of shortcomings.
Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
defragmented packet is then passed to conntrack, the resulting conntrack
information is attached to each original fragment and the fragments then
continue their way through the stack.

Helper invocation occurs in the POSTROUTING hook, at which point only
the original fragments are available. The result of this is that
fragmented packets are never passed to helpers.

This patch improves the situation in the following way:

- If a reassembled packet belongs to a connection that has a helper
  assigned, the reassembled packet is passed through the stack instead
  of the original fragments.

- During defragmentation, the largest received fragment size is stored.
  On output, the packet is refragmented if required. If the largest
  received fragment size exceeds the outgoing MTU, a "packet too big"
  message is generated, thus behaving as if the original fragments
  were passed through the stack from an outside point of view.

- The ipv6_helper() hook function can't receive fragments anymore for
  connections using a helper, so it is switched to use ipv6_skip_exthdr()
  instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
  reassembled packets are passed to connection tracking helpers.

The result of this is that we can properly track fragmented packets, but
still generate ICMPv6 Packet too big messages if we would have before.

This patch is also required as a precondition for IPv6 NAT, where NAT
helpers might enlarge packets up to a point that they require
fragmentation. In that case we can't generate Packet too big messages
since the proper MTU can't be calculated in all cases (f.i. when
changing textual representation of a variable amount of addresses),
so the packet is transparently fragmented iff the original packet or
fragments would have fit the outgoing MTU.

IPVS parts by Jesper Dangaard Brouer <brouer@redhat.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:10 +02:00
Jesper Dangaard Brouer
590e3f79a2 ipvs: IPv6 MTU checking cleanup and bugfix
Cleaning up the IPv6 MTU checking in the IPVS xmit code, by using
a common helper function __mtu_check_toobig_v6().

The MTU check for tunnel mode can also use this helper as
ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) is qual to
skb->len.  And the 'mtu' variable have been adjusted before
calling helper.

Notice, this also fixes a bug, as the the MTU check in ip_vs_dr_xmit_v6()
were missing a check for skb_is_gso().

This bug e.g. caused issues for KVM IPVS setups, where different
Segmentation Offloading techniques are utilized, between guests,
via the virtio driver.  This resulted in very bad performance,
due to the ICMPv6 "too big" messages didn't affect the sender.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-30 02:55:39 +02:00
Patrick McHardy
5f2d04f1f9 ipv4: fix path MTU discovery with connection tracking
IPv4 conntrack defragments incoming packet at the PRE_ROUTING hook and
(in case of forwarded packets) refragments them at POST_ROUTING
independent of the IP_DF flag. Refragmentation uses the dst_mtu() of
the local route without caring about the original fragment sizes,
thereby breaking PMTUD.

This patch fixes this by keeping track of the largest received fragment
with IP_DF set and generates an ICMP fragmentation required error during
refragmentation if that size exceeds the MTU.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
2012-08-26 19:13:55 +02:00
Pavel Emelyanov
0fa7fa98db packet: Protect packet sk list with mutex (v2)
Change since v1:

* Fixed inuse counters access spotted by Eric

In patch eea68e2f (packet: Report socket mclist info via diag module) I've
introduced a "scheduling in atomic" problem in packet diag module -- the
socket list is traversed under rcu_read_lock() while performed under it sk
mclist access requires rtnl lock (i.e. -- mutex) to be taken.

[152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
[152363.820573] 4 locks held by crtools/12517:
[152363.820581]  #0:  (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e
[152363.820613]  #1:  (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a
[152363.820644]  #2:  (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab
[152363.820693]  #3:  (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af

Similar thing was then re-introduced by further packet diag patches (fanount
mutex and pgvec mutex for rings) :(

Apart from being terribly sorry for the above, I propose to change the packet
sk list protection from spinlock to mutex. This lock currently protects two
modifications:

* sklist
* prot inuse counters

The sklist modifications can be just reprotected with mutex since they already
occur in a sleeping context. The inuse counters modifications are trickier -- the
__this_cpu_-s are used inside, thus requiring the caller to handle the potential
issues with contexts himself. Since packet sockets' counters are modified in two
places only (packet_create and packet_release) we only need to protect the context
from being preempted. BH disabling is not required in this case.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:58:27 -07:00
danborkmann@iogearbox.net
9e67030af3 af_packet: use define instead of constant
Instead of using a hard-coded value for the status variable, it would make
the code more readable to use its destined define from linux/if_packet.h.

Signed-off-by: daniel.borkmann@tik.ee.ethz.ch
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:58:27 -07:00
Ying Xue
bfdc587c5a rds: Don't disable BH on BH context
Since we have already in BH context when *_write_space(),
*_data_ready() as well as *_state_change() are called, it's
unnecessary to disable BH.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:52:04 -07:00
Eric Dumazet
b87fb39e39 ipv6: gre: fix ip6gre_err()
ip6gre_err() miscomputes grehlen (sizeof(ipv6h) is 4 or 8,
not 40 as expected), and should take into account 'offset' parameter.

Also uses pskb_may_pull() to cope with some fragged skbs

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:48:32 -07:00
Eric Dumazet
ef8531b64c xfrm: fix RCU bugs
This patch reverts commit 56892261ed (xfrm: Use rcu_dereference_bh to
deference pointer protected by rcu_read_lock_bh), and fixes bugs
introduced in commit 418a99ac6a ( Replace rwlock on xfrm_policy_afinfo
with rcu )

1) We properly use RCU variant in this file, not a mix of RCU/RCU_BH

2) We must defer some writes after the synchronize_rcu() call or a reader
 can crash dereferencing NULL pointer.

3) Now we use the xfrm_policy_afinfo_lock spinlock only from process
context, we no longer need to block BH in xfrm_policy_register_afinfo()
and xfrm_policy_unregister_afinfo()

4) Can use RCU_INIT_POINTER() instead of rcu_assign_pointer() in
xfrm_policy_unregister_afinfo()

5) Remove a forward inline declaration (xfrm_policy_put_afinfo()),
  and also move xfrm_policy_get_afinfo() declaration.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Fan Du <fan.du@windriver.com>
Cc: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:39:46 -07:00
Eric Dumazet
0115e8e30d net: remove delay at device dismantle
I noticed extra one second delay in device dismantle, tracked down to
a call to dst_dev_event() while some call_rcu() are still in RCU queues.

These call_rcu() were posted by rt_free(struct rtable *rt) calls.

We then wait a little (but one second) in netdev_wait_allrefs() before
kicking again NETDEV_UNREGISTER.

As the call_rcu() are now completed, dst_dev_event() can do the needed
device swap on busy dst.

To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called
after a rcu_barrier(), but outside of RTNL lock.

Use NETDEV_UNREGISTER_FINAL with care !

Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL

Also remove NETDEV_UNREGISTER_BATCH, as its not used anymore after
IP cache removal.

With help from Gao feng

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 21:50:36 -07:00
David S. Miller
bf277b0cce Merge git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
This is the first batch of Netfilter and IPVS updates for your
net-next tree. Mostly cleanups for the Netfilter side. They are:

* Remove unnecessary RTNL locking now that we have support
  for namespace in nf_conntrack, from Patrick McHardy.

* Cleanup to eliminate unnecessary goto in the initialization
  path of several Netfilter tables, from Jean Sacren.

* Another cleanup from Wu Fengguang, this time to PTR_RET instead
  of if IS_ERR then return PTR_ERR.

* Use list_for_each_entry_continue_rcu in nf_iterate, from
  Michael Wang.

* Add pmtu_disc sysctl option to disable PMTU in their tunneling
  transmitter, from Julian Anastasov.

* Generalize application protocol registration in IPVS and modify
  IPVS FTP helper to use it, from Julian Anastasov.

* update Kconfig. The IPVS FTP helper depends on the Netfilter FTP
  helper for NAT support, from Julian Anastasov.

* Add logic to update PMTU for IPIP packets in IPVS, again
  from Julian Anastasov.

* A couple of sparse warning fixes for IPVS and Netfilter from
  Claudiu Ghioc and Patrick McHardy respectively.

Patrick's IPv6 NAT changes will follow after this batch, I need
to flush this batch first before refreshing my tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 18:48:52 -07:00
David S. Miller
1304a7343b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-08-22 14:21:38 -07:00
Jean Sacren
90efbed18a netfilter: remove unnecessary goto statement for error recovery
Usually it's a good practice to use goto statement for error recovery
when initializing the module. This approach could be an overkill if:

 1) there is only one fail case;
 2) success and failure use the same return statement.

For a cleaner approach, remove the unnecessary goto statement and
directly implement error recovery.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-22 19:17:38 +02:00
Michael Wang
6705e86724 netfilter: replace list_for_each_continue_rcu with new interface
This patch replaces list_for_each_continue_rcu() with
list_for_each_entry_continue_rcu() to allow removing
list_for_each_continue_rcu().

Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-22 19:17:20 +02:00
Eric Dumazet
e0e3cea46d af_netlink: force credentials passing [CVE-2012-3520]
Pablo Neira Ayuso discovered that avahi and
potentially NetworkManager accept spoofed Netlink messages because of a
kernel bug.  The kernel passes all-zero SCM_CREDENTIALS ancillary data
to the receiver if the sender did not provide such data, instead of not
including any such data at all or including the correct data from the
peer (as it is the case with AF_UNIX).

This bug was introduced in commit 16e5726269
(af_unix: dont send SCM_CREDENTIALS by default)

This patch forces passing credentials for netlink, as
before the regression.

Another fix would be to not add SCM_CREDENTIALS in
netlink messages if not provided by the sender, but it
might break some programs.

With help from Florian Weimer & Petr Matousek

This issue is designated as CVE-2012-3520

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-21 14:53:01 -07:00
Eric Dumazet
a9915a1b52 ipv4: fix ip header ident selection in __ip_make_skb()
Christian Casteyde reported a kmemcheck 32-bit read from uninitialized
memory in __ip_select_ident().

It turns out that __ip_make_skb() called ip_select_ident() before
properly initializing iph->daddr.

This is a bug uncovered by commit 1d861aa4b3 (inet: Minimize use of
cached route inetpeer.)

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=46131

Reported-by: Christian Casteyde <casteyde.christian@free.fr>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-21 14:51:06 -07:00
Christoph Paasch
1a7b27c97c ipv4: Use newinet->inet_opt in inet_csk_route_child_sock()
Since 0e73441992 ("ipv4: Use inet_csk_route_child_sock() in DCCP and
TCP."), inet_csk_route_child_sock() is called instead of
inet_csk_route_req().

However, after creating the child-sock in tcp/dccp_v4_syn_recv_sock(),
ireq->opt is set to NULL, before calling inet_csk_route_child_sock().
Thus, inside inet_csk_route_child_sock() opt is always NULL and the
SRR-options are not respected anymore.
Packets sent by the server won't have the correct destination-IP.

This patch fixes it by accessing newinet->inet_opt instead of ireq->opt
inside inet_csk_route_child_sock().

Reported-by: Luca Boccassi <luca.boccassi@gmail.com>
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-21 14:49:11 -07:00
Eric Dumazet
144d56e910 tcp: fix possible socket refcount problem
Commit 6f458dfb40 (tcp: improve latencies of timer triggered events)
added bug leading to following trace :

[ 2866.131281] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
[ 2866.131726]
[ 2866.132188] =========================
[ 2866.132281] [ BUG: held lock freed! ]
[ 2866.132281] 3.6.0-rc1+ #622 Not tainted
[ 2866.132281] -------------------------
[ 2866.132281] kworker/0:1/652 is freeing memory ffff880019ec0000-ffff880019ec0a1f, with a lock still held there!
[ 2866.132281]  (sk_lock-AF_INET-RPC){+.+...}, at: [<ffffffff81903619>] tcp_sendmsg+0x29/0xcc6
[ 2866.132281] 4 locks held by kworker/0:1/652:
[ 2866.132281]  #0:  (rpciod){.+.+.+}, at: [<ffffffff81083567>] process_one_work+0x1de/0x47f
[ 2866.132281]  #1:  ((&task->u.tk_work)){+.+.+.}, at: [<ffffffff81083567>] process_one_work+0x1de/0x47f
[ 2866.132281]  #2:  (sk_lock-AF_INET-RPC){+.+...}, at: [<ffffffff81903619>] tcp_sendmsg+0x29/0xcc6
[ 2866.132281]  #3:  (&icsk->icsk_retransmit_timer){+.-...}, at: [<ffffffff81078017>] run_timer_softirq+0x1ad/0x35f
[ 2866.132281]
[ 2866.132281] stack backtrace:
[ 2866.132281] Pid: 652, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #622
[ 2866.132281] Call Trace:
[ 2866.132281]  <IRQ>  [<ffffffff810bc527>] debug_check_no_locks_freed+0x112/0x159
[ 2866.132281]  [<ffffffff818a0839>] ? __sk_free+0xfd/0x114
[ 2866.132281]  [<ffffffff811549fa>] kmem_cache_free+0x6b/0x13a
[ 2866.132281]  [<ffffffff818a0839>] __sk_free+0xfd/0x114
[ 2866.132281]  [<ffffffff818a08c0>] sk_free+0x1c/0x1e
[ 2866.132281]  [<ffffffff81911e1c>] tcp_write_timer+0x51/0x56
[ 2866.132281]  [<ffffffff81078082>] run_timer_softirq+0x218/0x35f
[ 2866.132281]  [<ffffffff81078017>] ? run_timer_softirq+0x1ad/0x35f
[ 2866.132281]  [<ffffffff810f5831>] ? rb_commit+0x58/0x85
[ 2866.132281]  [<ffffffff81911dcb>] ? tcp_write_timer_handler+0x148/0x148
[ 2866.132281]  [<ffffffff81070bd6>] __do_softirq+0xcb/0x1f9
[ 2866.132281]  [<ffffffff81a0a00c>] ? _raw_spin_unlock+0x29/0x2e
[ 2866.132281]  [<ffffffff81a1227c>] call_softirq+0x1c/0x30
[ 2866.132281]  [<ffffffff81039f38>] do_softirq+0x4a/0xa6
[ 2866.132281]  [<ffffffff81070f2b>] irq_exit+0x51/0xad
[ 2866.132281]  [<ffffffff81a129cd>] do_IRQ+0x9d/0xb4
[ 2866.132281]  [<ffffffff81a0a3ef>] common_interrupt+0x6f/0x6f
[ 2866.132281]  <EOI>  [<ffffffff8109d006>] ? sched_clock_cpu+0x58/0xd1
[ 2866.132281]  [<ffffffff81a0a172>] ? _raw_spin_unlock_irqrestore+0x4c/0x56
[ 2866.132281]  [<ffffffff81078692>] mod_timer+0x178/0x1a9
[ 2866.132281]  [<ffffffff818a00aa>] sk_reset_timer+0x19/0x26
[ 2866.132281]  [<ffffffff8190b2cc>] tcp_rearm_rto+0x99/0xa4
[ 2866.132281]  [<ffffffff8190dfba>] tcp_event_new_data_sent+0x6e/0x70
[ 2866.132281]  [<ffffffff8190f7ea>] tcp_write_xmit+0x7de/0x8e4
[ 2866.132281]  [<ffffffff818a565d>] ? __alloc_skb+0xa0/0x1a1
[ 2866.132281]  [<ffffffff8190f952>] __tcp_push_pending_frames+0x2e/0x8a
[ 2866.132281]  [<ffffffff81904122>] tcp_sendmsg+0xb32/0xcc6
[ 2866.132281]  [<ffffffff819229c2>] inet_sendmsg+0xaa/0xd5
[ 2866.132281]  [<ffffffff81922918>] ? inet_autobind+0x5f/0x5f
[ 2866.132281]  [<ffffffff810ee7f1>] ? trace_clock_local+0x9/0xb
[ 2866.132281]  [<ffffffff8189adab>] sock_sendmsg+0xa3/0xc4
[ 2866.132281]  [<ffffffff810f5de6>] ? rb_reserve_next_event+0x26f/0x2d5
[ 2866.132281]  [<ffffffff8103e6a9>] ? native_sched_clock+0x29/0x6f
[ 2866.132281]  [<ffffffff8103e6f8>] ? sched_clock+0x9/0xd
[ 2866.132281]  [<ffffffff810ee7f1>] ? trace_clock_local+0x9/0xb
[ 2866.132281]  [<ffffffff8189ae03>] kernel_sendmsg+0x37/0x43
[ 2866.132281]  [<ffffffff8199ce49>] xs_send_kvec+0x77/0x80
[ 2866.132281]  [<ffffffff8199cec1>] xs_sendpages+0x6f/0x1a0
[ 2866.132281]  [<ffffffff8107826d>] ? try_to_del_timer_sync+0x55/0x61
[ 2866.132281]  [<ffffffff8199d0d2>] xs_tcp_send_request+0x55/0xf1
[ 2866.132281]  [<ffffffff8199bb90>] xprt_transmit+0x89/0x1db
[ 2866.132281]  [<ffffffff81999bcd>] ? call_connect+0x3c/0x3c
[ 2866.132281]  [<ffffffff81999d92>] call_transmit+0x1c5/0x20e
[ 2866.132281]  [<ffffffff819a0d55>] __rpc_execute+0x6f/0x225
[ 2866.132281]  [<ffffffff81999bcd>] ? call_connect+0x3c/0x3c
[ 2866.132281]  [<ffffffff819a0f33>] rpc_async_schedule+0x28/0x34
[ 2866.132281]  [<ffffffff810835d6>] process_one_work+0x24d/0x47f
[ 2866.132281]  [<ffffffff81083567>] ? process_one_work+0x1de/0x47f
[ 2866.132281]  [<ffffffff819a0f0b>] ? __rpc_execute+0x225/0x225
[ 2866.132281]  [<ffffffff81083a6d>] worker_thread+0x236/0x317
[ 2866.132281]  [<ffffffff81083837>] ? process_scheduled_works+0x2f/0x2f
[ 2866.132281]  [<ffffffff8108b7b8>] kthread+0x9a/0xa2
[ 2866.132281]  [<ffffffff81a12184>] kernel_thread_helper+0x4/0x10
[ 2866.132281]  [<ffffffff81a0a4b0>] ? retint_restore_args+0x13/0x13
[ 2866.132281]  [<ffffffff8108b71e>] ? __init_kthread_worker+0x5a/0x5a
[ 2866.132281]  [<ffffffff81a12180>] ? gs_change+0x13/0x13
[ 2866.308506] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
[ 2866.309689] =============================================================================
[ 2866.310254] BUG TCP (Not tainted): Object already free
[ 2866.310254] -----------------------------------------------------------------------------
[ 2866.310254]

The bug comes from the fact that timer set in sk_reset_timer() can run
before we actually do the sock_hold(). socket refcount reaches zero and
we free the socket too soon.

timer handler is not allowed to reduce socket refcnt if socket is owned
by the user, or we need to change sk_reset_timer() implementation.

We should take a reference on the socket in case TCP_DELACK_TIMER_DEFERRED
or TCP_DELACK_TIMER_DEFERRED bit are set in tsq_flags

Also fix a typo in tcp_delack_timer(), where TCP_WRITE_TIMER_DEFERRED
was used instead of TCP_DELACK_TIMER_DEFERRED.

For consistency, use same socket refcount change for TCP_MTU_REDUCED_DEFERRED,
even if not fired from a timer.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-21 14:42:23 -07:00
Patrick McHardy
2834a6386b netfilter: nf_conntrack: remove unnecessary RTNL locking
Locking the rtnl was added to nf_conntrack_l{3,4}_proto_unregister()
for walking the network namespace list. This is not done anymore since
we have proper namespace support in the protocols now, so we don't
need to take the RTNL anymore.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-20 12:46:29 +02:00
Patrick McHardy
fe31d1a860 netfilter: sparse endian fixes
Fix a couple of endian annotation in net/netfilter:

net/netfilter/nfnetlink_acct.c:82:30: warning: cast to restricted __be64
net/netfilter/nfnetlink_acct.c:86:30: warning: cast to restricted __be64
net/netfilter/nfnetlink_cthelper.c:77:28: warning: cast to restricted __be16
net/netfilter/xt_NFQUEUE.c:46:16: warning: restricted __be32 degrades to integer
net/netfilter/xt_NFQUEUE.c:60:34: warning: restricted __be32 degrades to integer
net/netfilter/xt_NFQUEUE.c:68:34: warning: restricted __be32 degrades to integer
net/netfilter/xt_osf.c:272:55: warning: cast to restricted __be16

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-20 12:45:57 +02:00
Neal Cardwell
fae6ef87fa net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child()
This commit removes the sk_rx_dst_set calls from
tcp_create_openreq_child(), because at that point the icsk_af_ops
field of ipv6_mapped TCP sockets has not been set to its proper final
value.

Instead, to make sure we get the right sk_rx_dst_set variant
appropriate for the address family of the new connection, we have
tcp_v{4,6}_syn_recv_sock() directly call the appropriate function
shortly after the call to tcp_create_openreq_child() returns.

This also moves inet6_sk_rx_dst_set() to avoid a forward declaration
with the new approach.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reported-by: Artem Savkov <artem.savkov@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 03:03:33 -07:00
Randy Dunlap
3de7a37b02 net/core/dev.c: fix kernel-doc warning
Fix kernel-doc warning:

Warning(net/core/dev.c:5745): No description found for parameter 'dev'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc:	"David S. Miller" <davem@davemloft.net>
Cc:	netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 03:00:55 -07:00
Patrick McHardy
9d7b0fc1ef net: ipv6: fix oops in inet_putpeer()
Commit 97bab73f (inet: Hide route peer accesses behind helpers.) introduced
a bug in xfrm6_policy_destroy(). The xfrm_dst's _rt6i_peer member is not
initialized, causing a false positive result from inetpeer_ptr_is_peer(),
which in turn causes a NULL pointer dereference in inet_putpeer().

Pid: 314, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #17 To Be Filled By O.E.M. To Be Filled By O.E.M./P4S800D-X
EIP: 0060:[<c03abf93>] EFLAGS: 00010246 CPU: 0
EIP is at inet_putpeer+0xe/0x16
EAX: 00000000 EBX: f3481700 ECX: 00000000 EDX: 000dd641
ESI: f3481700 EDI: c05e949c EBP: f551def4 ESP: f551def4
 DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 00000070 CR3: 3243d000 CR4: 00000750
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 f551df04 c0423de1 00000000 f3481700 f551df18 c038d5f7 f254b9f8 f551df28
 f34f85d8 f551df20 c03ef48d f551df3c c0396870 f30697e8 f24e1738 c05e98f4
 f5509540 c05cd2b4 f551df7c c0142d2b c043feb5 f5509540 00000000 c05cd2e8
 [<c0423de1>] xfrm6_dst_destroy+0x42/0xdb
 [<c038d5f7>] dst_destroy+0x1d/0xa4
 [<c03ef48d>] xfrm_bundle_flo_delete+0x2b/0x36
 [<c0396870>] flow_cache_gc_task+0x85/0x9f
 [<c0142d2b>] process_one_work+0x122/0x441
 [<c043feb5>] ? apic_timer_interrupt+0x31/0x38
 [<c03967eb>] ? flow_cache_new_hashrnd+0x2b/0x2b
 [<c0143e2d>] worker_thread+0x113/0x3cc

Fix by adding a init_dst() callback to struct xfrm_policy_afinfo to
properly initialize the dst's peer pointer.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:56:56 -07:00
Jesper Juhl
d92c7f8aab caif: Do not dereference NULL in chnl_recv_cb()
In net/caif/chnl_net.c::chnl_recv_cb() we call skb_header_pointer()
which may return NULL, but we do not check for a NULL pointer before
dereferencing it.
This patch adds such a NULL check and properly free's allocated memory
and return an error (-EINVAL) on failure - much better than crashing..

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:47:49 -07:00
David S. Miller
6c71bec66a Merge git://1984.lsi.us.es/nf
Pable Neira Ayuso says:

====================
The following five patches contain fixes for 3.6-rc, they are:

* Two fixes for message parsing in the SIP conntrack helper, from
  Patrick McHardy.

* One fix for the SIP helper introduced in the user-space cthelper
  infrastructure, from Patrick McHardy.

* fix missing appropriate locking while modifying one conntrack entry
  from the nfqueue integration code, from myself.

* fix possible access to uninitiliazed timer in the nf_conntrack
  expectation infrastructure, from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:44:29 -07:00
Eric Leblond
c0de08d042 af_packet: don't emit packet on orig fanout group
If a packet is emitted on one socket in one group of fanout sockets,
it is transmitted again. It is thus read again on one of the sockets
of the fanout group. This result in a loop for software which
generate packets when receiving one.
This retransmission is not the intended behavior: a fanout group
must behave like a single socket. The packet should not be
transmitted on a socket if it originates from a socket belonging
to the same fanout group.

This patch fixes the issue by changing the transmission check to
take fanout group info account.

Reported-by: Aleksandr Kotov <a1k@mail.ru>
Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:37:29 -07:00
Ying Xue
e6a04b1d3f tipc: eliminate configuration for maximum number of name publications
Gets rid of the need for users to specify the maximum number of
name publications supported by TIPC. TIPC now automatically provides
support for the maximum number of name publications to 65535.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:31 -07:00
Ying Xue
34f256cc79 tipc: eliminate configuration for maximum number of name subscriptions
Gets rid of the need for users to specify the maximum number of
name subscriptions supported by TIPC. TIPC now automatically provides
support for the maximum number of name subscriptions to 65535.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:31 -07:00