Commit graph

14125 commits

Author SHA1 Message Date
Vlad Yasevich
4814326b59 sctp: prevent too-fast association id reuse
We use the idr subsystem and always ask for an id
at or above 1.  This results in a id reuse when one
association is terminated while another is created.

To prevent re-use, we keep track of the last id returned
and ask for that id + 1 as a base for each query.  We let
the idr spin lock protect this base id as well.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:54:01 -05:00
Andrei Pelinescu-Onciul
da85b7396f sctp: fix integer overflow when setting the autoclose timer
When setting the autoclose timeout in jiffies there is a possible
integer overflow if the value in seconds is very large
(e.g. for 2^22 s with HZ=1024). The problem appears even on
64-bit due to the integer promotion rules. The fix is just a cast
 to unsigned long.

Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:54:01 -05:00
Andrei Pelinescu-Onciul
f6778aab6c sctp: limit maximum autoclose setsockopt value
To avoid overflowing the maximum timer interval when transforming
the  autoclose interval from seconds to jiffies, limit the maximum
autoclose value to MAX_SCHEDULE_TIMEOUT/HZ.

Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:54:01 -05:00
Neil Horman
d8dd15781d sctp: Fix mis-ordering of user space data when multihoming in use
Recently had a bug reported to me, in which the user was sending
packets with a payload containing a sequence number.  The packets
were getting delivered in order according the chunk TSN values, but
the sequence values in the payload were arriving out of order.  At
first I thought it must be an application error, but we eventually
found it to be a problem on the transmit side in the sctp stack.

The conditions for the error are that multihoming must be in use,
and it helps if each transport has a different pmtu.  The problem
occurs in sctp_outq_flush.  Basically we dequeue packets from the
data queue, and attempt to append them to the orrered packet for a
given transport.  After we append a data chunk we add the trasport
to the end of a list of transports to have their packets sent at
the end of sctp_outq_flush.  The problem occurs when a data chunks
fills up a offered packet on a transport.  The function that does
the appending (sctp_packet_transmit_chunk), will try to call
sctp_packet_transmit on the full packet, and then append the chunk
to a new packet.  This call to sctp_packet_transmit, sends that
packet ahead of the others that may be queued in the transport_list
in sctp_outq_flush.  The result is that frames that were sent in one
order from the user space sending application get re-ordered prior
to tsn assignment in sctp_packet_transmit, resulting in mis-sequencing
of data payloads, even though tsn ordering is correct.

The fix is to change where we assign a tsn.  By doing this earlier,
we are then free to place chunks in packets, whatever way we
see fit and the protocol will make sure to do all the appropriate
re-ordering on receive as is needed.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: William Reich <reich@ulticom.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:54:00 -05:00
Vlad Yasevich
46d5a80855 sctp: Update max.burst implementation
Current implementation of max.burst ends up limiting new
data during cwnd decay period.  The decay is happening becuase
the connection is idle and we are allowed to fill the congestion
window.  The point of max.burst is to limit micro-bursts in response
to large acks.  This still happens, as max.burst is still applied
to each transmit opportunity.  It will also apply if a very large
send is made (greater then allowed by burst).

Tested-by: Florian Niederbacher <florian.niederbacher@student.uibk.ac.at>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:54:00 -05:00
Vlad Yasevich
245cba7e55 sctp: Remove useless last_time_used variable
The transport last_time_used variable is rather useless.
It was only used when determining if CWND needs to be updated
due to idle transport.  However, idle transport detection was
based on a Heartbeat timer and last_time_used was not incremented
when sending Heartbeats.  As a result the check for cwnd reduction
was always true.  We can get rid of the variable and just base
our cwnd manipulation on the HB timer (like the code comment sais).
We also have to call into the cwnd manipulation function regardless
of whether HBs are enabled or not.  That way we will detect idle
transports if the user has disabled Heartbeats.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:53:58 -05:00
Amerigo Wang
a242b41ded sctp: remove deprecated SCTP_GET_*_OLD stuffs
SCTP_GET_*_OLD stuffs are schedlued to be removed.

Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:53:58 -05:00
Andrei Pelinescu-Onciul
37051f7386 sctp: allow setting path_maxrxt independent of SPP_PMTUD_ENABLE
Since draft-ietf-tsvwg-sctpsocket-15.txt, setting the
SPP_MTUD_ENABLE flag when changing pathmaxrxt via the
SCTP_PEER_ADDR_PARAMS setsockopt is not required any
longer.

Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:53:57 -05:00
Vlad Yasevich
90f2f5318b sctp: Update SWS avaoidance receiver side algorithm
We currently send window update SACKs every time we free up 1 PMTU
worth of data.  That a lot more SACKs then necessary.  Instead, we'll
now send back the actuall window every time we send a sack, and do
window-update SACKs when a fraction of the receive buffer has been
opened.  The fraction is controlled with a sysctl.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:53:57 -05:00
Vlad Yasevich
e0e9db178a sctp: Select a working primary during sctp_connectx()
When sctp_connectx() is used, we pick the first address as
primary, even though it may not have worked.  This results
in excessive retransmits and poor performance.  We should
select the address that the association was established with.

Reported-by: Thomas Dreibholz <dreibh@iem.uni-due.de>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:53:57 -05:00
Vlad Yasevich
6383cfb3ed sctp: Fix malformed "Invalid Stream Identifier" error
The "Invalid Stream Identifier" error has a 16 bit reserved
field at the end, thus making the parameter length be 8 bytes.
We've never supplied that reserved field making wireshark
tag the packet as malformed.

Reported-by: Chris Dischino <cdischino@sonusnet.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:53:56 -05:00
Wei Yongjun
b93d647174 sctp: implement the sender side for SACK-IMMEDIATELY extension
This patch implement the sender side for SACK-IMMEDIATELY
extension.

  Section 4.1.  Sender Side Considerations

  Whenever the sender of a DATA chunk can benefit from the
  corresponding SACK chunk being sent back without delay, the sender
  MAY set the I-bit in the DATA chunk header.

  Reasons for setting the I-bit include

  o  The sender is in the SHUTDOWN-PENDING state.

  o  The application requests to set the I-bit of the last DATA chunk
     of a user message when providing the user message to the SCTP
     implementation.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:53:56 -05:00
Wei Yongjun
6dc7694f9d sctp: implement the receiver side for SACK-IMMEDIATELY extension
This patch implement the receiver side for SACK-IMMEDIATELY
extension:

  Section 4.2.  Receiver Side Considerations

  On reception of an SCTP packet containing a DATA chunk with the I-bit
  set, the receiver SHOULD NOT delay the sending of the corresponding
  SACK chunk and SHOULD send it back immediately.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2009-11-23 15:53:53 -05:00
Joe Perches
9d4fb27db9 net/ipv4: Move && and || to end of previous line
On Sun, 2009-11-22 at 16:31 -0800, David Miller wrote:
> It should be of the form:
> 	if (x &&
> 	    y)
> 
> or:
> 	if (x && y)
> 
> Fix patches, rather than complaints, for existing cases where things
> do not follow this pattern are certainly welcome.

Also collapsed some multiple tabs to single space.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-23 10:41:23 -08:00
Eric Dumazet
593f63b0be pktgen: Fix device name compares
Commit e6fce5b916 (pktgen: multiqueue etc.) tried to relax
the pktgen restriction of one device per kernel thread, adding a '@'
tag to device names.

Problem is we dont perform check on full pktgen device name.
This allows adding many time same 'device' to pktgen thread

 pgset "add_device eth0@0"

one session later :

 pgset "add_device eth0@0"

(This doesnt find previous device)

This consumes ~1.5 MBytes of vmalloc memory per round and also triggers
this warning :

[  673.186380] proc_dir_entry 'pktgen/eth0@0' already registered
[  673.186383] Modules linked in: pktgen ixgbe ehci_hcd psmouse mdio mousedev evdev [last unloaded: pktgen]
[  673.186406] Pid: 6219, comm: bash Tainted: G        W  2.6.32-rc7-03302-g41cec6f-dirty #16
[  673.186410] Call Trace:
[  673.186417]  [<ffffffff8104a29b>] warn_slowpath_common+0x7b/0xc0
[  673.186422]  [<ffffffff8104a341>] warn_slowpath_fmt+0x41/0x50
[  673.186426]  [<ffffffff8114e789>] proc_register+0x109/0x210
[  673.186433]  [<ffffffff8100bf2e>] ? apic_timer_interrupt+0xe/0x20
[  673.186438]  [<ffffffff8114e905>] proc_create_data+0x75/0xd0
[  673.186444]  [<ffffffffa006ad38>] pktgen_thread_write+0x568/0x640 [pktgen]
[  673.186449]  [<ffffffffa006a7d0>] ? pktgen_thread_write+0x0/0x640 [pktgen]
[  673.186453]  [<ffffffff81149144>] proc_reg_write+0x84/0xc0
[  673.186458]  [<ffffffff810f5a58>] vfs_write+0xb8/0x180
[  673.186463]  [<ffffffff810f5c11>] sys_write+0x51/0x90
[  673.186468]  [<ffffffff8100b51b>] system_call_fastpath+0x16/0x1b
[  673.186470] ---[ end trace ccbb991b0a8d994d ]---

Solution to this problem is to use a odevname field (includes @ tag and suffix),
instead of using netdevice name.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-23 10:39:35 -08:00
David S. Miller
73570314e4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2009-11-23 09:52:51 -08:00
Patrick McHardy
8fa539bd91 netfilter: xt_limit: fix invalid return code in limit_mt_check()
Commit acc738fe (netfilter: xtables: avoid pointer to self) introduced
an invalid return value in limit_mt_check().

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-11-23 13:37:23 +01:00
Florian Westphal
3a0429292d netfilter: xtables: fix conntrack match v1 ipt-save output
commit d6d3f08b0f
(netfilter: xtables: conntrack match revision 2) does break the
v1 conntrack match iptables-save output in a subtle way.

Problem is as follows:

    up = kmalloc(sizeof(*up), GFP_KERNEL);
[..]
   /*
    * The strategy here is to minimize the overhead of v1 matching,
    * by prebuilding a v2 struct and putting the pointer into the
    * v1 dataspace.
    */
    memcpy(up, info, offsetof(typeof(*info), state_mask));
[..]
    *(void **)info  = up;

As the v2 struct pointer is saved in the match data space,
it clobbers the first structure member (->origsrc_addr).

Because the _v1 match function grabs this pointer and does not actually
look at the v1 origsrc, run time functionality does not break.
But iptables -nvL (or iptables-save) cannot know that v1 origsrc_addr
has been overloaded in this way:

$ iptables -p tcp -A OUTPUT -m conntrack --ctorigsrc 10.0.0.1 -j ACCEPT
$ iptables-save
-A OUTPUT -p tcp -m conntrack --ctorigsrc 128.173.134.206 -j ACCEPT

(128.173... is the address to the v2 match structure).

To fix this, we take advantage of the fact that the v1 and v2 structures
are identical with exception of the last two structure members (u8 in v1,
u16 in v2).

We extract them as early as possible and prevent the v2 matching function
from looking at those two members directly.

Previously reported by Michel Messerschmidt via Ben Hutchings, also
see Debian Bug tracker #556587.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-11-23 10:43:57 +01:00
Pablo Neira Ayuso
c4832c7bbc netfilter: nf_ct_tcp: improve out-of-sync situation in TCP tracking
Without this patch, if we receive a SYN packet from the client while
the firewall is out-of-sync, we let it go through. Then, if we see
the SYN/ACK reply coming from the server, we destroy the conntrack
entry and drop the packet to trigger a new retransmission. Then,
the retransmision from the client is used to start a new clean
session.

This patch improves the current handling. Basically, if we see an
unexpected SYN packet, we annotate the TCP options. Then, if we
see the reply SYN/ACK, this means that the firewall was indeed
out-of-sync. Therefore, we set a clean new session from the existing
entry based on the annotated values.

This patch adds two new 8-bits fields that fit in a 16-bits gap of
the ip_ct_tcp structure.

This patch is particularly useful for conntrackd since the
asynchronous nature of the state-synchronization allows to have
backup nodes that are not perfect copies of the master. This helps
to improve the recovery under some worst-case scenarios.

I have tested this by creating lots of conntrack entries in wrong
state:

for ((i=1024;i<65535;i++)); do conntrack -I -p tcp -s 192.168.2.101 -d 192.168.2.2 --sport $i --dport 80 -t 800 --state ESTABLISHED -u ASSURED,SEEN_REPLY; done

Then, I make some TCP connections:

$ echo GET / | nc 192.168.2.2 80

The events show the result:

 [UPDATE] tcp      6 60 SYN_RECV src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED]
 [UPDATE] tcp      6 432000 ESTABLISHED src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED]
 [UPDATE] tcp      6 120 FIN_WAIT src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED]
 [UPDATE] tcp      6 30 LAST_ACK src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED]
 [UPDATE] tcp      6 120 TIME_WAIT src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED]

and tcpdump shows no retransmissions:

20:47:57.271951 IP 192.168.2.101.33221 > 192.168.2.2.www: S 435402517:435402517(0) win 5840 <mss 1460,sackOK,timestamp 4294961827 0,nop,wscale 6>
20:47:57.273538 IP 192.168.2.2.www > 192.168.2.101.33221: S 3509927945:3509927945(0) ack 435402518 win 5792 <mss 1460,sackOK,timestamp 235681024 4294961827,nop,wscale 4>
20:47:57.273608 IP 192.168.2.101.33221 > 192.168.2.2.www: . ack 3509927946 win 92 <nop,nop,timestamp 4294961827 235681024>
20:47:57.273693 IP 192.168.2.101.33221 > 192.168.2.2.www: P 435402518:435402524(6) ack 3509927946 win 92 <nop,nop,timestamp 4294961827 235681024>
20:47:57.275492 IP 192.168.2.2.www > 192.168.2.101.33221: . ack 435402524 win 362 <nop,nop,timestamp 235681024 4294961827>
20:47:57.276492 IP 192.168.2.2.www > 192.168.2.101.33221: P 3509927946:3509928082(136) ack 435402524 win 362 <nop,nop,timestamp 235681025 4294961827>
20:47:57.276515 IP 192.168.2.101.33221 > 192.168.2.2.www: . ack 3509928082 win 108 <nop,nop,timestamp 4294961828 235681025>
20:47:57.276521 IP 192.168.2.2.www > 192.168.2.101.33221: F 3509928082:3509928082(0) ack 435402524 win 362 <nop,nop,timestamp 235681025 4294961827>
20:47:57.277369 IP 192.168.2.101.33221 > 192.168.2.2.www: F 435402524:435402524(0) ack 3509928083 win 108 <nop,nop,timestamp 4294961828 235681025>
20:47:57.279491 IP 192.168.2.2.www > 192.168.2.101.33221: . ack 435402525 win 362 <nop,nop,timestamp 235681025 4294961828>

I also added a rule to log invalid packets, with no occurrences  :-) .

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-11-23 10:37:34 +01:00
Jaswinder Singh Rajput
6ebfbc0656 net: Fix missing kernel-doc notation
Fix the following htmldocs warning:

  Warning(net/core/dev.c:5378): bad line:

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-22 20:43:13 -08:00
David S. Miller
e994b7c901 tcp: Don't make syn cookies initial setting depend on CONFIG_SYSCTL
That's extremely non-intuitive, noticed by William Allen Simpson.

And let's make the default be on, it's been suggested by a lot of
people so we'll give it a try.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-21 11:22:25 -08:00
Eric Dumazet
8964be4a9a net: rename skb->iif to skb->skb_iif
To help grep games, rename iif to skb_iif

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-20 15:35:04 -08:00
Linus Torvalds
e6236f781c Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  SUNRPC: Address buffer overrun in rpc_uaddr2sockaddr()
  NFSv4: Fix a cache validation bug which causes getcwd() to return ENOENT
2009-11-19 13:43:19 -08:00
Patrick McHardy
6440fe059e netfilter: nf_log: fix sleeping function called from invalid context in seq_show()
[  171.925285] BUG: sleeping function called from invalid context at kernel/mutex.c:280
[  171.925296] in_atomic(): 1, irqs_disabled(): 0, pid: 671, name: grep
[  171.925306] 2 locks held by grep/671:
[  171.925312]  #0:  (&p->lock){+.+.+.}, at: [<c10b8acd>] seq_read+0x25/0x36c
[  171.925340]  #1:  (rcu_read_lock){.+.+..}, at: [<c1391dac>] seq_start+0x0/0x44
[  171.925372] Pid: 671, comm: grep Not tainted 2.6.31.6-4-netbook #3
[  171.925380] Call Trace:
[  171.925398]  [<c105104e>] ? __debug_show_held_locks+0x1e/0x20
[  171.925414]  [<c10264ac>] __might_sleep+0xfb/0x102
[  171.925430]  [<c1461521>] mutex_lock_nested+0x1c/0x2ad
[  171.925444]  [<c1391c9e>] seq_show+0x74/0x127
[  171.925456]  [<c10b8c5c>] seq_read+0x1b4/0x36c
[  171.925469]  [<c10b8aa8>] ? seq_read+0x0/0x36c
[  171.925483]  [<c10d5c8e>] proc_reg_read+0x60/0x74
[  171.925496]  [<c10d5c2e>] ? proc_reg_read+0x0/0x74
[  171.925510]  [<c10a4468>] vfs_read+0x87/0x110
[  171.925523]  [<c10a458a>] sys_read+0x3b/0x60
[  171.925538]  [<c1002a49>] syscall_call+0x7/0xb

Fix it by replacing RCU with nf_log_mutex.

Reported-by: "Yin, Kangkai" <kangkai.yin@intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-19 13:16:31 -08:00
Patrick McHardy
d667b9cfd0 netfilter: xt_osf: fix xt_osf_remove_callback() return value
Return a negative error value.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-19 13:16:26 -08:00
Rui Paulo
9f13084d52 mac80211: fix endianess on mesh_path_error_tx() calls
Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-19 11:38:24 -05:00
Johannes Berg
64491f0ec8 mac80211: add per-station HT capability file
This is sometimes useful to debug HT issues
as it shows what exactly the stack thinks
the peer supports.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-19 11:09:08 -05:00
Johannes Berg
a58ce43f2f mac80211: avoid spurious deauth frames/messages
With WEXT, it happens frequently that the SME
requests an authentication but then deauthenticates
right away because some new parameters came along.
Every time this happens we print a deauth message
and send a deauth frame, but both of that is rather
confusing. Avoid it by aborting the authentication
process silently, and telling cfg80211 about that.

The patch looks larger than it really is:
__cfg80211_auth_remove() is split out from
cfg80211_send_auth_timeout(), there's no new code
except __cfg80211_auth_canceled() (a one-liner) and
the mac80211 bits (7 new lines of code).

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-19 11:09:02 -05:00
Johannes Berg
7351c6bd48 mac80211: request TX status where needed
Right now all frames mac80211 hands to the driver
have the IEEE80211_TX_CTL_REQ_TX_STATUS flag set to
request TX status. This isn't really necessary, only
the injected frames need TX status (the latter for
hostapd) so move setting this flag.

The rate control algorithms also need TX status, but
they don't require it.

Also, rt2x00 uses that bit for its own purposes and
seems to require it being set for all frames, but
that can be fixed in rt2x00.

This doesn't really change anything for any drivers
but in the future drivers using hw-rate control may
opt to not report TX status for frames that don't
have the IEEE80211_TX_CTL_REQ_TX_STATUS flag set.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com> [rt2x00 bits]
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-19 11:08:56 -05:00
Johannes Berg
ad4bb6f888 cfg80211: disallow bridging managed/adhoc interfaces
A number of people have tried to add a wireless interface
(in managed mode) to a bridge and then complained that it
doesn't work. It cannot work, however, because in 802.11
networks all packets need to be acknowledged and as such
need to be sent to the right address. Promiscuous doesn't
help here. The wireless address format used for these
links has only space for three addresses, the
 * transmitter, which must be equal to the sender (origin)
 * receiver (on the wireless medium), which is the AP in
   the case of managed mode
 * the recipient (destination), which is on the APs local
   network segment

In an IBSS, it is similar, but the receiver and recipient
must match and the third address is used as the BSSID.

To avoid such mistakes in the future, disallow adding a
wireless interface to a bridge.

Felix has recently added a four-address mode to the AP
and client side that can be used (after negotiating that
it is possible, which must happen out-of-band by setting
up both sides) for bridging, so allow that case.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-19 11:08:54 -05:00
Johannes Berg
9bc383de37 cfg80211: introduce capability for 4addr mode
It's very likely that not many devices will support
four-address mode in station or AP mode so introduce
capability bits for both modes, set them in mac80211
and check them when userspace tries to use the mode.
Also, keep track of 4addr in cfg80211 (wireless_dev)
and not in mac80211 any more. mac80211 can also be
improved for the VLAN case by not looking at the
4addr flag but maintaining the station pointer for
it correctly. However, keep track of use_4addr for
station mode in mac80211 to avoid all the derefs.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-19 11:08:53 -05:00
Johannes Berg
5be83de54c cfg80211: convert bools into flags
We've accumulated a number of options for wiphys
which make more sense as flags as we keep adding
more. Convert the existing ones.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-19 11:08:50 -05:00
Johannes Berg
ceb99fe071 mac80211: fix resume
When mac80211 resumes, it currently first sets suspended
to false so the driver can start doing things and we can
receive frames.

However, if we actually receive frames then it can end
up starting some work which adds timers and then later
runs into a BUG_ON in the timer code because it tries
add_timer() on a pending timer.

Fix this by keeping track of the resuming process by
introducing a new variable 'resuming' which gets set to
true early on instead of setting 'suspended' to false,
and allow queueing work but not receiving frames while
resuming.

Reported-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-19 11:08:39 -05:00
Andrew Hendry
386e50cc7d X25: Enable setting of cause and diagnostic fields
Adds SIOCX25SCAUSEDIAG, allowing X.25 programs to set the cause and
diagnostic fields.

Normally used to indicate status upon closing connections.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-18 23:30:41 -08:00
Eric Dumazet
2939e27599 netsched: Allow var_sk_bound_if meta to work on all namespaces
This fix can probably wait 2.6.33, or should use another patch
if needed in 2.6.32 (no get_dev_by_index_rcu() before 2.6.33)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-18 23:24:41 -08:00
David S. Miller
3505d1a9fd Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/sfc/sfe4001.c
	drivers/net/wireless/libertas/cmd.c
	drivers/staging/Kconfig
	drivers/staging/Makefile
	drivers/staging/rtl8187se/Kconfig
	drivers/staging/rtl8192e/Kconfig
2009-11-18 22:19:03 -08:00
Linus Torvalds
486bfe5c7c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
  cxgb3: fix premature page unmap
  ibm_newemac: Fix EMACx_TRTR[TRT] bit shifts
  vlan: Fix register_vlan_dev() error path
  gro: Fix illegal merging of trailer trash
  sungem: Fix Serdes detection.
  net: fix mdio section mismatch warning
  ppp: fix BUG on non-linear SKB (multilink receive)
  ixgbe: Fixing EEH handler to handle more than one error
  net: Fix the rollback test in dev_change_name()
  Revert "isdn: isdn_ppp: Use SKB list facilities instead of home-grown implementation."
  TI Davinci EMAC : Fix Console Hang when bringing the interface down
  smsc911x: Fix Console Hang when bringing the interface down.
  mISDN: fix error return in HFCmulti_init()
  forcedeth: mac address fix
  r6040: fix version printing
  Bluetooth: Fix regression with L2CAP configuration in Basic Mode
  Bluetooth: Select Basic Mode as default for SOCK_SEQPACKET
  Bluetooth: Set general bonding security for ACL by default
  r8169: Fix receive buffer length when MTU is between 1515 and 1536
  can: add the missing netlink get_xstats_size callback
  ...
2009-11-18 14:54:45 -08:00
Rui Paulo
76aa5e704c mac80211: update cfg80211 scan result code for the updated mesh conf IE
Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:28 -05:00
Rui Paulo
136cfa2861 mac80211: use a structure to hold the mesh config information element
Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:27 -05:00
Johannes Berg
fe7a5d5c1a mac80211: move TX status handling
It's enough code to have its own file, I think.
Especially since I'm going to add to it.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:27 -05:00
Johannes Berg
62ae67be31 mac80211: remove encrypt parameter from ieee80211_tx_skb
Since the flags moved into skb->cb, there's no
longer a need to have the encrypt bool passed
into the function, anyone who requires it set
to 0 (false) can just set the flag directly.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:27 -05:00
Marcel Holtmann
875405a779 rfkill: Add constant for RFKILL_TYPE_FM radio devices
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Janakiram Sistla <janakiram.sistla@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:26 -05:00
Johannes Berg
98d3a7ca92 cfg80211: re-join IBSS when privacy changes
When going from/to a WEP protected IBSS, we need to
leave this one and join a new one to take care of
the changed capability.

Cc: Hong Zhang <henryzhang62@yahoo.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:25 -05:00
Sujith
0bc6b1871c mac80211: Fix panic in aggregation handling
Not assigning the vif pointer causes an oops.
This patch fixes it.

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:25 -05:00
Jouni Malinen
24b6b15f7d cfg80211: Allow reassociation in associated state
cfg80211 rejects all association requests when in associated state. This
prevents clean roaming within an ESS since one would first need to
disassociate before being able to request reassociation.

Accept the reassociation request and let the old association to be
dropped when the new one is completed. This fixes nl80211-based
roaming with the current snapshot version of wpa_supplicant (that has
code for requesting reassociation explicitly withthe previous BSSID
attribute).

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:24 -05:00
Johannes Berg
af65cd96dd mac80211: make software rate control optional
Some devices implement the entire rate control in
firmware in some way, like wl1271 or like iwlwifi
which does some things in software but not a lot.
Therefore generic software rate control is rather
useless for them and just adds avoidable overhead
to the transmit path.

It's fairly simple to let drivers indicate that
they do not need rate control, but they need to
fulfil a number of conditions that we encode in
WARN_ONs.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:24 -05:00
Johannes Berg
15ff63653e mac80211: use fixed broadcast address
The netdev broadcast address cannot change from
all-ones so there's no need to use it; we can
instead hard-code it. Since we already have an
instance in tkip.c, which will be shared if it
is marked static const, doing this reduces text
size at no data/bss cost.

The real motivation for this is, of course, the
desire to get rid of almost all uses of netdevs
in mac80211 so that auditing their use becomes
easier.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:18 -05:00
Johannes Berg
d84f323477 mac80211: remove dev_hold/put calls
If we move the rcu sections a little, there's
no need to touch the device refcount.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:18 -05:00
Johannes Berg
5f0b7de59f mac80211: improve rate handling
Some code currently assumes that there's a valid
rate pointer even in the HT case, but there can't
be. To reduce reliance on that, remove the rate
pointer from the RX data struct and pass it where
it's needed.

Also, for now, in radiotap announce HT frames as
having a DYN channel type, and remove their rate
from cooked monitor radiotap completely (it isn't
present in the regular monitor radiotap either.)

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:17 -05:00
Johannes Berg
eb9fb5b888 mac80211: trim RX data
The RX data contains the netdev, which is
duplicated since we have the sdata, and the
RX status pointer, which is duplicate since
we have the skb. Remove those two fields to
have fewer fields that depend on each other
and simply load them as necessary.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:17 -05:00
Johannes Berg
a02ae758e8 mac80211: cleanup reorder buffer handling
The reorder buffer handling is written in a quite
peculiar style (especially comments) and also has
a quirk where it invokes the entire reorder code
in ieee80211_sta_manage_reorder_buf() for just a
handful of lines in it with a special argument.

Split out ieee80211_release_reorder_frames which
can then be invoked from BAR handling and other
reordering code, clean up code and comments and
remove function arguments that are now unused from
ieee80211_sta_manage_reorder_buf().

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:17 -05:00
Johannes Berg
af2ced6a32 mac80211: push michael MIC report after DA check
When we receive a michael MIC failure report from the
hardware we currently do not check whether it is actually
reported on a frame that is destined to us. It shouldn't
be possible to get a michael MIC failure report on other
frames, but it also doesn't hurt to verify.

Also, since we then don't need the station struct that
early, move looking it up a bit later in the RX path.

Finally, while at it, a few code cleanups in the area.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:16 -05:00
Johannes Berg
c951ad3550 mac80211: convert aggregation to operate on vifs/stas
The entire aggregation code currently operates on the
hw pointer and station addresses, but that needs to
change to make stations purely per-vif; As one step
preparing for that make the aggregation code callable
with the station, or by the combination of virtual
interface and station address.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:15 -05:00
Johannes Berg
3b53fde8ac mac80211: let sta_info_get_by_idx get sta by sdata
Instead of filtering by device, directly look up by sdata.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:14 -05:00
Felix Fietkau
3e5b1101f5 mac80211: reduce the amount of unnecessary traffic on cooked monitor interfaces
In order to handle association and authentication in AP mode,
hostapd needs access to the tx status info of its own frames
through a cooked monitor interface. Without this patch the
cooked monitor interfaces also passed on tx status information
for packets from other virtual interfaces. This creates a
significant performance issue on embedded system. Hostapd
tries to work around this by installing a Linux Socket Filter
that only captures the frames it's interested in, however
data duplication and socket filter matching still uses up
enough CPU cycles to be very noticeable on small systems.
This patch ensures that tx status information of non-injected
frames does not make it to cooked monitor interfaces.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:09:10 -05:00
Johannes Berg
8ade008246 mac80211: fix addba timer (again...)
commit 2171abc586
  Author: Johannes Berg <johannes@sipsolutions.net>
  Date:   Thu Oct 29 08:34:00 2009 +0100

      mac80211: fix addba timer

left a problem in there, even if the timer was
never started it could be deleted and then added.

Linus pointed out that del_timer_sync() isn't
actually needed if we make the timer able to
deal with no longer being needed when it gets
queued _while_ we're in the locked section that
also deletes it. For that the timer function only
needs to check the HT_ADDBA_RECEIVED_MSK bit as
well as the HT_ADDBA_REQUESTED_MSK bit, only if
the former is clear should it do anything.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-18 17:01:47 -05:00
David S. Miller
dfef948ed2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-11-18 10:55:32 -08:00
Rémi Denis-Courmont
eeb74a9d45 Phonet: convert devices list to RCU
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-18 10:08:26 -08:00
Eric W. Biederman
6d4561110a sysctl: Drop & in front of every proc_handler.
For consistency drop & in front of every proc_handler.  Explicity
taking the address is unnecessary and it prevents optimizations
like stubbing the proc_handlers to NULL.

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2009-11-18 08:37:40 -08:00
Octavian Purdila
d90310243f net: device name allocation cleanups
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-18 05:03:35 -08:00
Eric Dumazet
f99189b186 netns: net_identifiers should be read_mostly
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-18 05:03:25 -08:00
Eric Dumazet
e014debecd linkwatch: linkwatch_forget_dev() to speedup device dismantle
Herbert Xu a écrit :
> On Tue, Nov 17, 2009 at 04:26:04AM -0800, David Miller wrote:
>> Really, the link watch stuff is just due for a redesign.  I don't
>> think a simple hack is going to cut it this time, sorry Eric :-)
>
> I have no objections against any redesigns, but since the only
> caller of linkwatch_forget_dev runs in process context with the
> RTNL, it could also legally emit those events.

Thanks guys, here an updated version then, before linkwatch surgery ?

In this version, I force the event to be sent synchronously.

[PATCH net-next-2.6] linkwatch: linkwatch_forget_dev() to speedup device dismantle

time ip link del eth3.103 ; time ip link del eth3.104 ; time ip link del eth3.105

real	0m0.266s
user	0m0.000s
sys	0m0.001s

real	0m0.770s
user	0m0.000s
sys	0m0.000s

real	0m1.022s
user	0m0.000s
sys	0m0.000s

One problem of current schem in vlan dismantle phase is the
holding of device done by following chain :

vlan_dev_stop() ->
	netif_carrier_off(dev) ->
		linkwatch_fire_event(dev) ->
			dev_hold() ...

And __linkwatch_run_queue() runs up to one second later...

A generic fix to this problem is to add a linkwatch_forget_dev() method
to unlink the device from the list of watched devices.

dev->link_watch_next becomes dev->link_watch_list (and use a bit more memory),
to be able to unlink device in O(1).

After patch :
time ip link del eth3.103 ; time ip link del eth3.104 ; time ip link del eth3.105

real    0m0.024s
user    0m0.000s
sys     0m0.000s

real    0m0.032s
user    0m0.000s
sys     0m0.001s

real    0m0.033s
user    0m0.000s
sys     0m0.000s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-18 05:03:11 -08:00
Octavian Purdila
e2ce146848 ipv4: factorize cache clearing for batched unregister operations
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-18 05:03:07 -08:00
Octavian Purdila
395264d509 net: introduce NETDEV_UNREGISTER_PERNET
This new event is called once for each unique net namespace in batched
unregister operations (with the argument set to a random device from
that namespace) and once per device in non-batched unregister
operations.

It allows us to factorize some device unregister work such as clearing the
routing cache.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-18 05:03:03 -08:00
Eric Dumazet
9793241fe9 vlan: Precise RX stats accounting
With multi queue devices, its possible that several cpus call
vlan RX routines simultaneously for the same vlan device.

We update RX stats counter without any locking, so we can
get slightly wrong counters.

One possible fix is to use percpu counters, to get precise
accounting and also get guarantee of no cache line ping pongs
between cpus.

Note: this adds 16 bytes (32 bytes on 64bit arches) of percpu
data per vlan device.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-17 23:51:55 -08:00
Eric Dumazet
d83345adf9 net: add dev_txq_stats_fold() helper
Some drivers ndo_get_stats() method need to perform txqueue stats folding.

Move folding from dev_get_stats() to a new dev_txq_stats_fold() function

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-17 23:51:52 -08:00
Eric Dumazet
6b863d1d32 vlan: Fix register_vlan_dev() error path
In case register_netdevice() returns an error, and a new vlan_group
was allocated and inserted in vlan_group_hash[] we call
vlan_group_free() without deleting group from hash table. Future
lookups can give infinite loops or crashes.

We must delete the vlan_group using RCU safe procedure.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-17 06:45:04 -08:00
Herbert Xu
69c0cab120 gro: Fix illegal merging of trailer trash
When we've merged skb's with page frags, and subsequently receive
a trailer skb (< MSS) that is not completely non-linear (this can
occur on Intel NICs if the packet size falls below the threshold),
GRO ends up producing an illegal GSO skb with a frag_list.

This is harmless unless the skb is then forwarded through an
interface that requires software GSO, whereupon the GSO code
will BUG.

This patch detects this case in GRO and avoids merging the
trailer skb.

Reported-by: Mark Wagner <mwagner@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-17 05:18:18 -08:00
Changli Gao
b76965e02b act_mirred: optimization.
move checking if eaction is valid in tcf_mirred_init()

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-17 04:15:38 -08:00
Changli Gao
feed1f1724 act_mirred: cleanup
1. don't let go back using goto.
2. don't call skb_act_clone() until it is necessary.
3. one exit of the critical context.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-17 04:15:37 -08:00
Rémi Denis-Courmont
b2a5decddb Phonet: missing rcu_dereference()
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-17 04:08:50 -08:00
Johannes Berg
649300b927 netlink: remove subscriptions check on notifier
The netlink URELEASE notifier doesn't notify for
sockets that have been used to receive multicast
but it should be called for such sockets as well
since they might _also_ be used for sending and
not solely for receiving multicast. We will need
that for nl80211 (generic netlink sockets) in the
future.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-17 04:08:49 -08:00
Eric W. Biederman
bb9074ff58 Merge commit 'v2.6.32-rc7'
Resolve the conflict between v2.6.32-rc7 where dn_def_dev_handler
gets a small bug fix and the sysctl tree where I am removing all
sysctl strategy routines.
2009-11-17 01:01:34 -08:00
David S. Miller
a2bfbc072e Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/can/Kconfig
2009-11-17 00:05:02 -08:00
Jouni Malinen
b23709248f mac80211: Do not queue Probe Request frames for station MLME
Cooked monitor interfaces cannot currently receive Probe Request
frames when the interface is in station mode. However, we do not
process Probe Request frames internally in the station MLME, so there
is no point in queueing the frame here. Remove Probe Request frames
from the queued frame list to allow cooked monitor interfaces to
receive these frames.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-16 14:17:14 -05:00
Eric Dumazet
91e9c07bd6 net: Fix the rollback test in dev_change_name()
net: Fix the rollback test in dev_change_name()

In dev_change_name() an err variable is used for storing the original
call_netdevice_notifiers() errno (negative) and testing for a rollback
error later, but the test for non-zero is wrong, because the err might
have positive value as well - from dev_alloc_name(). It means the
rollback for a netdevice with a number > 0 will never happen. (The err
test is reordered btw. to make it more readable.)

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-16 03:30:35 -08:00
Marin Mitov
b9f5d52670 remove deprecated and not used: print_mac()
The function print_mac in net/ethernet/eth.c is marked __deprecated
and not used. Remove it.

Signed-off-by: Marin Mitov <mitov@issp.bas.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-15 22:21:34 -08:00
Eric Dumazet
b93ab837a2 vlan: Use __vlan_hwaccel_put_tag() in rx
Commit 05423b2413 (vlan: allow null VLAN ID to be used)
forgot to update __vlan_hwaccel_rx() & vlan_gro_common()

We need to set VLAN_TAG_PRESENT flag in skb->vlan_tci

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-15 22:21:33 -08:00
Jarek Poplawski
9a1654ba0b net: Optimize hard_start_xmit() return checking
Recent changes in the TX error propagation require additional checking
and masking of values returned from hard_start_xmit(), mainly to
separate cases where skb was consumed. This aim can be simplified by
changing the order of NETDEV_TX and NET_XMIT codes, because the latter
are treated similarly to negative (ERRNO) values.

After this change much simpler dev_xmit_complete() is also used in
sch_direct_xmit(), so it is moved to netdevice.h.

Additionally NET_RX definitions in netdevice.h are moved up from
between TX codes to avoid confusion while reading the TX comment.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-15 22:08:33 -08:00
Eric Dumazet
ed04642f75 net: check the return value of ndo_select_queue()
Check the return value of ndo_select_queue(). If the value isn't smaller
than the real_num_tx_queues, print a warning message, and reset it to zero.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
----
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-15 22:08:05 -08:00
David S. Miller
eaa04dc353 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6 2009-11-15 20:59:34 -08:00
Gustavo F. Padovan
68ae6639b6 Bluetooth: Fix regression with L2CAP configuration in Basic Mode
Basic Mode is the default mode of operation of a L2CAP entity. In
this case the RFC (Retransmission and Flow Control) configuration
option should not be used at all.

Normally remote L2CAP implementation should just ignore this option,
but it can cause various side effects with other Bluetooth stacks
that are not capable of handling unknown options.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-11-16 01:31:41 +01:00
Gustavo F. Padovan
a0e55a32af Bluetooth: Select Basic Mode as default for SOCK_SEQPACKET
The default mode for SOCK_SEQPACKET is Basic Mode. So when no
mode has been specified, Basic Mode shall be used.

This is important for current application to keep working as
expected and not cause a regression.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-11-16 01:31:16 +01:00
Andrei Emeltchenko
93f19c9fc8 Bluetooth: Set general bonding security for ACL by default
This patch fixes double pairing issues with Secure Simple
Paring support. It was observed that when pairing with SSP
enabled, that the confirmation will be asked twice.

http://www.spinics.net/lists/linux-bluetooth/msg02473.html

This also causes bug when initiating SSP connection from
Windows Vista.

The reason is because bluetoothd does not store link keys
since HCIGETAUTHINFO returns 0. Setting default to general
bonding fixes these issues.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2009-11-16 01:30:28 +01:00
David S. Miller
958fc41e32 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/lowpan/lowpan 2009-11-14 20:24:30 -08:00
Rémi Denis-Courmont
888801357f Phonet: convert routing table to RCU
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 20:47:02 -08:00
Rémi Denis-Courmont
7ed0132f23 Phonet: put protocols array under RCU
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 20:47:01 -08:00
Ursula Braun
b7c2aecc07 iucv: add work_queue cleanup for suspend
If iucv_work_queue is not empty during kernel freeze, a kernel panic
occurs. This suspend-patch adds flushing of the work queue for
pending connection requests and severing of remaining pending
connections.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 20:46:58 -08:00
Eric Dumazet
2c1409a0a2 inetpeer: Optimize inet_getid()
While investigating for network latencies, I found inet_getid() was a
contention point for some workloads, as inet_peer_idlock is shared
by all inet_getid() users regardless of peers.

One way to fix this is to make ip_id_count an atomic_t instead
of __u16, and use atomic_add_return().

In order to keep sizeof(struct inet_peer) = 64 on 64bit arches
tcp_ts_stamp is also converted to __u32 instead of "unsigned long".

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 20:46:58 -08:00
Eric Dumazet
234b27c3fd ipv6: speedup inet6_dump_addr()
When handling large number of netdevices, inet6_dump_addr()
is very slow because it has O(N^2) complexity.

Instead of scanning one single list, we can use the NETDEV_HASHENTRIES
sub lists of the dev_index hash table, and RCU lookups.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 20:46:57 -08:00
Eric Dumazet
eec4df9885 ipv4: speedup inet_dump_ifaddr()
Stephen Hemminger a écrit :
> On Thu, 12 Nov 2009 15:11:36 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> When handling large number of netdevices, inet_dump_ifaddr()
>> is very slow because it has O(N^2) complexity.
>>
>> Instead of scanning one single list, we can use the NETDEV_HASHENTRIES
>> sub lists of the dev_index hash table, and RCU lookups.
>>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> You might be able to make RCU critical section smaller by moving
> it into loop.
>

Indeed. But we dump at most one skb (<= 8192 bytes ?), so rcu_read_lock
holding time is small, unless we meet many netdevices without
addresses. I wonder if its really common...

Thanks

[PATCH net-next-2.6] ipv4: speedup inet_dump_ifaddr()

When handling large number of netdevices, inet_dump_ifaddr()
is very slow because it has O(N2) complexity.

Instead of scanning one single list, we can use the NETDEV_HASHENTRIES
sub lists of the dev_index hash table, and RCU lookups.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 20:46:55 -08:00
Eric Dumazet
6baff15037 igmp: Use next_net_device_rcu()
We need to use next_det_device_rcu() in RCU protected section.

We also can avoid in_dev_get()/in_dev_put() overhead (code size mainly)
in rcu_read_lock() sections.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 20:38:49 -08:00
Eric Dumazet
ce81b76a39 ipv6: use RCU to walk list of network devices
No longer need read_lock(&dev_base_lock), use RCU instead.
We also can avoid taking references on inet6_dev structs.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 20:38:49 -08:00
William Allen Simpson
bee7ca9ec0 net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED
Define two symbols needed in both kernel and user space.

Remove old (somewhat incorrect) kernel variant that wasn't used in
most cases.  Default should apply to both RMSS and SMSS (RFC2581).

Replace numeric constants with defined symbols.

Stand-alone patch, originally developed for TCPCT.

Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 20:38:48 -08:00
Dan Carpenter
d0490cfdf4 ipmr: missing dev_put() on error path in vif_add()
The other error paths in front of this one have a dev_put() but this one
got missed.

Found by smatch static checker.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Wang Chen <ellre923@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 19:56:54 -08:00
Vlad Yasevich
a78102e74e sctp: Set socket source address when additing first transport
Recent commits
	sctp: Get rid of an extra routing lookup when adding a transport
and
	sctp: Set source addresses on the association before adding transports

changed when routes are added to the sctp transports.  As such,
we didn't set the socket source address correctly when adding the first
transport.  The first transport is always the primary/active one, so
when adding it, set the socket source address.  This was causing
regression failures in SCTP tests.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 19:56:52 -08:00
Vlad Yasevich
f9c67811eb sctp: Fix regression introduced by new sctp_connectx api
A new (unrealeased to the user) sctp_connectx api

c6ba68a266
    sctp: support non-blocking version of the new sctp_connectx() API

introduced a regression cought by the user regression test
suite.  In particular, the API requires the user library to
re-allocate the buffer and could potentially trigger a SIGFAULT.

This change corrects that regression by passing the original
address buffer to the kernel unmodified, but still allows for
a returned association id.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 19:56:51 -08:00
Vlad Yasevich
409b95aff3 sctp: Set source addresses on the association before adding transports
Recent commit 8da645e101
	sctp: Get rid of an extra routing lookup when adding a transport
introduced a regression in the connection setup.  The behavior was

different between IPv4 and IPv6.  IPv4 case ended up working because the
route lookup routing returned a NULL route, which triggered another
route lookup later in the output patch that succeeded.  In the IPv6 case,
a valid route was returned for first call, but we could not find a valid
source address at the time since the source addresses were not set on the
association yet.  Thus resulted in a hung connection.

The solution is to set the source addresses on the association prior to
adding peers.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 19:56:50 -08:00
Chuck Lever
1e360a60b2 SUNRPC: Address buffer overrun in rpc_uaddr2sockaddr()
The size of buf[] must account for the string termination needed for
the first strict_strtoul() call.  Introduced in commit a02d6926.

Fábio Olivé Leite points out that strict_strtoul() requires _either_
'\n\0' _or_ '\0' termination, so use the simpler '\0' here instead.

See http://bugzilla.kernel.org/show_bug.cgi?id=14546 .

Reported-by: argp@census-labs.com
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Fábio Olivé Leite <fleite@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-11-14 08:17:04 +09:00
Felix Fietkau
c258d2de97 nl80211: only allow adding stations to running vlan interfaces
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-13 17:43:59 -05:00
Felix Fietkau
f501dba4c4 mac80211: fix broadcast frame handling for 4-addr AP VLANs
Without this patch, broadcast frames from the station behind a
4-addr AP VLAN would be reflected back to the source.
Fix this by checking the 4-addr flag before bridging multicast
frames in the cell.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-13 17:43:59 -05:00
Holger Schurig
61fa713c75 cfg80211: return channel noise via survey API
This patch implements the NL80211_CMD_GET_SURVEY command and an get_survey()
ops that a driver can implement. The goal of this command is to allow a
drivers to report channel survey data (e.g. channel noise, channel
occupation).

For now, only the mechanism to report back channel noise has been
implemented.

In future, there will either be a survey-trigger command --- or the existing
scan-trigger command will be enhanced. This will allow user-space to
request survey for arbitrary channels.

Note: any driver that cannot report channel noise should not report
any value at all, e.g. made-up -92 dBm.

Signed-off-by: Holger Schurig <holgerschurig@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-13 17:43:58 -05:00
Holger Schurig
a043897a31 cfg80211: introduce nl80211_get_ifidx()
... which get's rid of three indentical cut-n-paste sections.

Signed-off-by: Holger Schurig <holgerschurig@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-13 17:43:58 -05:00
Rui Paulo
264d9b7d8a mac80211: update copyrights to 2009
Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-13 17:43:57 -05:00
Rui Paulo
63c5723bc3 mac80211: add nl80211/cfg80211 handling of the new mesh root mode option.
Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-13 17:43:57 -05:00
Rui Paulo
e304bfd30f mac80211: implement a timer to send RANN action frames
RANN (Root Annoucement) frame TX. Send an action frame every second
trying to build a path to all nodes on the mesh.

Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-13 17:43:56 -05:00
Rui Paulo
d19b3bf638 mac80211: replace "destination" with "target" to follow the spec
Resulting object files have the same MD5 as before.

Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-13 17:43:56 -05:00
Rui Paulo
be125c60e4 mac80211: add the DS params to the beacon
Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-13 17:43:56 -05:00
Rui Paulo
36f0d5f537 mac80211: fix BSSID setup for beacon frames
BSSID is now set to the TA.

Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-13 17:43:55 -05:00
Rui Paulo
77fa76bb7f mac80211: set the AID field correctly for mesh peer frames
This sets the AID field correctly for mesh peer confirm frames.

Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-13 17:43:55 -05:00
Rui Paulo
a6a58b4f14 mac80211: properly forward the RANN IE
Increase hopcount and convert metric to LE before forwarding the RANN
action frame.

Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-13 17:43:55 -05:00
Rui Paulo
d611f062f4 mac80211: update PERR frame format
Update the PERR IE frame format according to latest draft (3.03).

Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-13 17:43:54 -05:00
Rui Paulo
90a5e16992 mac80211: implement RANN processing and forwarding
Process the RANN (Root Annoucement) Frame and try to find the HWMP
root station by sending a PREQ.

Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-13 17:43:54 -05:00
Patrick McHardy
cbbef5e183 vlan/macvlan: propagate transmission state to upper layers
Both vlan and macvlan devices usually don't use a qdisc and immediately
queue packets to the underlying device. Propagate transmission state of
the underlying device to the upper layers so they can react on congestion
and/or inform the sending process.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 14:07:33 -08:00
Patrick McHardy
572a9d7b6f net: allow to propagate errors through ->ndo_hard_start_xmit()
Currently the ->ndo_hard_start_xmit() callbacks are only permitted to return
one of the NETDEV_TX codes. This prevents any kind of error propagation for
virtual devices, like queue congestion of the underlying device in case of
layered devices, or unreachability in case of tunnels.

This patches changes the NET_XMIT codes to avoid clashes with the NETDEV_TX
codes and changes the two callers of dev_hard_start_xmit() to expect either
errno codes, NET_XMIT codes or NETDEV_TX codes as return value.

In case of qdisc_restart(), all non NETDEV_TX codes are mapped to NETDEV_TX_OK
since no error propagation is possible when using qdiscs. In case of
dev_queue_xmit(), the error is propagated upwards.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 14:07:32 -08:00
Ilpo Järvinen
d792c1006f tcp: provide more information on the tcp receive_queue bugs
The addition of rcv_nxt allows to discern whether the skb
was out of place or tp->copied. Also catch fancy combination
of flags if necessary (sadly we might miss the actual causer
flags as it might have already returned).

Btw, we perhaps would want to forward copied_seq in
somewhere or otherwise we might have some nice loop with
WARN stuff within but where to do that safely I don't
know at this stage until more is known (but it is not
made significantly worse by this patch).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-13 13:56:33 -08:00
Wu Fengguang
7378396cd1 netfilter: nf_log: fix sleeping function called from invalid context in seq_show()
[  171.925285] BUG: sleeping function called from invalid context at kernel/mutex.c:280
[  171.925296] in_atomic(): 1, irqs_disabled(): 0, pid: 671, name: grep
[  171.925306] 2 locks held by grep/671:
[  171.925312]  #0:  (&p->lock){+.+.+.}, at: [<c10b8acd>] seq_read+0x25/0x36c
[  171.925340]  #1:  (rcu_read_lock){.+.+..}, at: [<c1391dac>] seq_start+0x0/0x44
[  171.925372] Pid: 671, comm: grep Not tainted 2.6.31.6-4-netbook #3
[  171.925380] Call Trace:
[  171.925398]  [<c105104e>] ? __debug_show_held_locks+0x1e/0x20
[  171.925414]  [<c10264ac>] __might_sleep+0xfb/0x102
[  171.925430]  [<c1461521>] mutex_lock_nested+0x1c/0x2ad
[  171.925444]  [<c1391c9e>] seq_show+0x74/0x127
[  171.925456]  [<c10b8c5c>] seq_read+0x1b4/0x36c
[  171.925469]  [<c10b8aa8>] ? seq_read+0x0/0x36c
[  171.925483]  [<c10d5c8e>] proc_reg_read+0x60/0x74
[  171.925496]  [<c10d5c2e>] ? proc_reg_read+0x0/0x74
[  171.925510]  [<c10a4468>] vfs_read+0x87/0x110
[  171.925523]  [<c10a458a>] sys_read+0x3b/0x60
[  171.925538]  [<c1002a49>] syscall_call+0x7/0xb

Fix it by replacing RCU with nf_log_mutex.

Reported-by: "Yin, Kangkai" <kangkai.yin@intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-11-13 09:34:44 +01:00
Roel Kluin
1c622ae67b netfilter: xt_osf: fix xt_osf_remove_callback() return value
Return a negative error value.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-11-13 09:31:35 +01:00
Dmitry Eremin-Solenikov
282a39546f ieee802154: make wpan-phy class registration to subsys_initcall
Move ieee802154 initialisation to subsys_initcall call, so that
wpan-phy class is initialised before all devices (thus saving us from
oops during bootup).

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-13 00:07:15 +03:00
Eric W. Biederman
f8572d8f2a sysctl net: Remove unused binary sysctl code
Now that sys_sysctl is a compatiblity wrapper around /proc/sys
all sysctl strategy routines, and all ctl_name and strategy
entries in the sysctl tables are unused, and can be
revmoed.

In addition neigh_sysctl_register has been modified to no longer
take a strategy argument and it's callers have been modified not
to pass one.

Cc: "David Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2009-11-12 02:05:06 -08:00
Arnd Bergmann
805003a41c net/atm: move all compat_ioctl handling to atm/ioctl.c
We have two implementations of the compat_ioctl handling for ATM, the
one that we have had for ages in fs/compat_ioctl.c and the one added to
net/atm/ioctl.c by David Woodhouse. Unfortunately, both versions are
incomplete, and in practice we use a very confusing combination of the
two.

For ioctl numbers that have the same identifier on 32 and 64 bit systems,
we go directly through the compat_ioctl socket operation, for those that

differ, we do a conversion in fs/compat_ioctl.c.

This patch moves both variants into the vcc_compat_ioctl() function,
while preserving the current behaviour. It also kills off the COMPATIBLE_IOCTL
definitions that we never use here.
Doing it this way is clearly not a good solution, but I hope it is a
step into the right direction, so that someone is able to clean up this
mess for real.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-11 19:22:23 -08:00
Arnd Bergmann
a2116ed223 net/compat: fix dev_ifsioc emulation corner cases
Handling for SIOCSHWTSTAMP is broken on architectures
with a split user/kernel address space like s390,
because it passes a real user pointer while using
set_fs(KERNEL_DS).
A similar problem might arise the next time somebody
adds code to dev_ifsioc.

Split up dev_ifsioc into three separate functions for
SIOCSHWTSTAMP, SIOC*IFMAP and all other numbers so
we can get rid of set_fs in all potentially affected
cases.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Patrick Ohly <patrick.ohly@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-11 19:22:22 -08:00
stephen hemminger
e5c140a340 decnet: convert dndev_lock to spinlock
There is no reason for this lock to be reader/writer since
the reader only has lock held for a very brief period.
The overhead of read_lock is more expensive than spinlock.

Compile tested only, I am not a decnet user.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-11 19:22:18 -08:00
stephen hemminger
41bdecf17e decnet: add RTNL lock when reading address list
Add missing locking in the case of auto binding to the
default device. The address list might change while this code is looking
at the list.

Compile tested only, I am not a decnet user.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-11 19:22:15 -08:00
stephen hemminger
08e9897d51 netdev: fold name hash properly (v3)
The full_name_hash function does not produce well distributed values in
the lower bits, so most code uses hash_32() to fold it.  This is really
a bug introduced when name hashing was added, back in 2.5 when I added
name hashing.

hash_32 is all that is needed since full_name_hash returns unsigned int
which is only 32 bits on 64 bit platforms.

Also, there is no point in using hash_32 on ifindex, because the is naturally
sequential and usually well distributed.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-11 19:22:12 -08:00
Anton Vorontsov
e84af6ddef skbuff: Do not allow skb recycling with disabled IRQs
NAPI drivers try to recycle SKBs in their polling routine, but we
generally don't know the context in which the polling will be called,
and the skb recycling itself may require IRQs to be enabled.

This patch adds irqs_disabled() test to the skb_recycle_check()
routine, so that we'll not let the drivers hit the skb recycling
path with IRQs disabled.

As a side effect, this patch actually disables skb recycling for some
[broken] drivers. E.g. gianfar driver grabs an irqsave spinlock during
TX ring processing, and then tries to recycle an skb, and that caused
the following badness:

nf_conntrack version 0.5.0 (1008 buckets, 4032 max)
------------[ cut here ]------------
Badness at kernel/softirq.c:143
NIP: c003e3c4 LR: c423a528 CTR: c003e344
...
NIP [c003e3c4] local_bh_enable+0x80/0xc4
LR [c423a528] destroy_conntrack+0xd4/0x13c [nf_conntrack]
Call Trace:
[c15d1b60] [c003e32c] local_bh_disable+0x1c/0x34 (unreliable)
[c15d1b70] [c423a528] destroy_conntrack+0xd4/0x13c [nf_conntrack]
[c15d1b80] [c02c6370] nf_conntrack_destroy+0x3c/0x70

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-11 19:03:28 -08:00
David S. Miller
434a8a58d7 ipv6: Remove unused var in inet6_dump_ifinfo()
Reported by Stephen Rothwell:

--------------------
Today's linux-next build (x86_64 allmodconfig) produced this warning:

net/ipv6/addrconf.c: In function 'inet6_dump_ifinfo':
net/ipv6/addrconf.c:3833: warning: unused variable 'err'

Introduced by commit 84d2697d96 ("ipv6:
speedup inet6_dump_ifinfo()").
--------------------

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-11 18:53:00 -08:00
Luis R. Rodriguez
e5d6eb8305 mac80211: fix max HT rate processing on mac80211
The max MCS index is 76, fix the higher check to allow through
frames received at MCS 76. This is a non-issue for current drivers
as MCS 76 is only possible with a device supporting 4 spatial
streams.

While at it change the WARN_ON() on invalid HT rates to a WARN()
to provide more useful information. This will help debug issues
when the driver is passing up a bogus HT rate value.

The rate must map to a valid MCS index which can be any of the
values in the set [0 - 76] (inclusive).

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-11 17:09:18 -05:00
Felix Fietkau
f14543ee4d mac80211: implement support for 4-address frames for AP and client mode
In some situations it might be useful to run a network with an
Access Point and multiple clients, but with each client bridged
to a network behind it. For this to work, both the client and the
AP need to transmit 4-address frames, containing both source and
destination MAC addresses.
With this patch, you can configure a client to communicate using
only 4-address frames for data traffic.
On the AP side you can enable 4-address frames for individual
clients by isolating them in separate AP VLANs which are configured
in 4-address mode.
Such an AP VLAN will be limited to one client only, and this client
will be used as the destination for all traffic on its interface,
regardless of the destination MAC address in the packet headers.
The advantage of this mode compared to regular WDS mode is that it's
easier to configure and does not require a static list of peer MAC
addresses on any side.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-11 17:02:10 -05:00
Felix Fietkau
8b787643ca nl80211: add a parameter for using 4-address frames on virtual interfaces
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-11 17:02:07 -05:00
Rui Paulo
1460dd158a mac80211: improve peer link management debugging
Print the FSM state strings instead of just the numbers.

Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-11 15:23:59 -05:00
Rui Paulo
f3c0d88a7f mac80211: improve HWMP debugging
Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-11 15:23:59 -05:00
Rui Paulo
dbb81c428b mac80211: allow processing of more than one HWMP IE
Since the HWMP IEs are now all optional and the action code is fixed,
allow the HWMP code to find and process each IE on the path
selection action frames.

Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <rpaulo@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-11 15:23:58 -05:00
Rui Paulo
27db2e423f mac80211: add MAC80211_VERBOSE_MHWMP_DEBUG
Add MAC80211_VERBOSE_MHWMP_DEBUG, a debugging option for HWMP
frame processing.

Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-11 15:23:58 -05:00
Rui Paulo
095de01325 mac80211: update the format of path selection frames
Update the format of path selection frames according to latest
draft (3.03).

Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-11 15:23:58 -05:00
Rui Paulo
0938393f02 mac80211: update peer link management IE and action frames
Update the length and format of the peer link management action frames.

Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-11 15:23:57 -05:00
Rui Paulo
23c7a29cd0 mac80211: fix typo in a comment
Signed-off-by: Javier Cardona <javier@cozybit.com>
Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-11 15:23:57 -05:00
Rui Paulo
8f2fda9594 mac80211: implement the meshconf formation info field
The Mesh Configuration Formation Info field contains the number of
neighbors.  This means that the beacon must be updated every time a
peer joins or leaves.

Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <rpaulo@gmail.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-11 15:23:57 -05:00
Rui Paulo
a1935218da mac80211: set MESH_TTL to 31
Update the mesh time to live field to 31 according to draft 3.03.

Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-11 15:23:56 -05:00
Rui Paulo
3491707a07 mac80211: update meshconf IE
This updates the Mesh Configuration IE according to the latest
draft (3.03).
Notable changes include the simplified protocol IDs.

Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-11 15:23:56 -05:00
stephen hemminger
ff879eb611 CAN: use dev_get_by_index_rcu
Use new function to avoid doing read_lock().

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10 22:27:13 -08:00
stephen hemminger
61fbab77a8 IPV4: use rcu to walk list of devices in IGMP
This also needs to be optimized for large number of devices.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10 22:27:12 -08:00
stephen hemminger
fa918602b6 decnet: use RCU to find network devices
When showing device statistics use RCU rather than read_lock(&dev_base_lock)
Compile tested only.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10 22:26:31 -08:00
stephen hemminger
f1e9016da6 net: use rcu for network scheduler API
Use RCU to walk list of network devices in qdisc dump.
This could be optimized for large number of devices.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10 22:26:30 -08:00
stephen hemminger
9e067597ee vlan: eliminate use of dev_base_lock
Do not need to use read_lock(&dev_base_lock), use RCU instead.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10 22:26:30 -08:00
Brian Haley
856540ee31 IPv6: use ipv6_addr_v4mapped()
Change udp6_portaddr_hash() to use ipv6_addr_v4mapped()
inline instead of ipv6_addr_type().

Signed-off-by: Brian Haley <brian.haley@hp.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10 20:54:44 -08:00
Herbert Xu
292f4f3ce4 sit: Clean up DF code by copying from IPIP
This patch rearranges the SIT DF bit handling using the new IPIP DF
code.  The only externally visible effect should be the case where
PMTU is enabled and the MTU is exactly 1280 bytes.  In this case the
previous code would send packets out with DF off while the new code
would set the DF bit.  This is inline with RFC 4213.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Thanks,
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10 20:54:43 -08:00
Eric Dumazet
bcd323262a ipv6: Allow inet6_dump_addr() to handle more than 64 addresses
Apparently, inet6_dump_addr() is not able to handle more than
64 ipv6 addresses per device. We must break from inner loops
in case skb is full, or else cursor is put at the end of list.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10 20:54:42 -08:00
Eric Dumazet
84d2697d96 ipv6: speedup inet6_dump_ifinfo()
When handling large number of netdevice, inet6_dump_ifinfo()
is very slow because it has O(N^2) complexity.

Instead of scanning one single list, we can use the 256 sub lists
of the dev_index hash table, and RCU lookups.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10 20:54:41 -08:00
Cyrill Gorcunov
13cfa97bef net: netlink_getname, packet_getname -- use DECLARE_SOCKADDR guard
Use guard DECLARE_SOCKADDR in a few more places which allow
us to catch if the structure copied back is too big.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10 20:54:41 -08:00
Eric Dumazet
30fff9231f udp: bind() optimisation
UDP bind() can be O(N^2) in some pathological cases.

Thanks to secondary hash tables, we can make it O(N)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10 20:54:38 -08:00
Rémi Denis-Courmont
b1704374fd Phonet: allocate and copy for pipe TX without sock lock
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10 20:54:34 -08:00
Rémi Denis-Courmont
6b0d07ba15 Phonet: put sockets in a hash table
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-10 20:54:33 -08:00
David S. Miller
f6d773cd4f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-11-09 11:17:24 -08:00
Linus Torvalds
1ce55238e2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
  net/fsl_pq_mdio: add module license GPL
  can: fix WARN_ON dump in net/core/rtnetlink.c:rtmsg_ifinfo()
  can: should not use __dev_get_by_index() without locks
  hisax: remove bad udelay call to fix build error on ARM
  ipip: Fix handling of DF packets when pmtudisc is OFF
  qlge: Set PCIe reset type for EEH to fundamental.
  qlge: Fix early exit from mbox cmd complete wait.
  ixgbe: fix traffic hangs on Tx with ioatdma loaded
  ixgbe: Fix checking TFCS register for TXOFF status when DCB is enabled
  ixgbe: Fix gso_max_size for 82599 when DCB is enabled
  macsonic: fix crash on PowerBook 520
  NET: cassini, fix lock imbalance
  ems_usb: Fix byte order issues on big endian machines
  be2net: Bug fix to send config commands to hardware after netdev_register
  be2net: fix to set proper flow control on resume
  netfilter: xt_connlimit: fix regression caused by zero family value
  rt2x00: Don't queue ieee80211 work after USB removal
  Revert "ipw2200: fix oops on missing firmware"
  decnet: netdevice refcount leak
  netfilter: nf_nat: fix NAT issue in 2.6.30.4+
  ...
2009-11-09 09:51:42 -08:00
Dirk Hohndel
06fe9fb418 tree-wide: fix a very frequent spelling mistake
something-bility is spelled as something-blity
so a grep for 'blit' would find these lines

this is so trivial that I didn't split it by subsystem / copy
additional maintainers - all changes are to comments
The only purpose is to get fewer false positives when grepping
around the kernel sources.

Signed-off-by: Dirk Hohndel <hohndel@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-11-09 09:40:54 +01:00
David S. Miller
d0e1e88d6e Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/can/usb/ems_usb.c
2009-11-08 23:00:54 -08:00
Yury Polyanskiy
9e0d57fd6d xfrm: SAD entries do not expire correctly after suspend-resume
This fixes the following bug in the current implementation of
net/xfrm: SAD entries timeouts do not count the time spent by the machine 
in the suspended state. This leads to the connectivity problems because 
after resuming local machine thinks that the SAD entry is still valid, while 
it has already been expired on the remote server.

  The cause of this is very simple: the timeouts in the net/xfrm are bound to 
the old mod_timer() timers. This patch reassigns them to the
CLOCK_REALTIME hrtimer.

  I have been using this version of the patch for a few months on my
machines without any problems. Also run a few stress tests w/o any
issues.

  This version of the patch uses tasklet_hrtimer by Peter Zijlstra
(commit 9ba5f0).

  This patch is against 2.6.31.4. Please CC me.

Signed-off-by: Yury Polyanskiy <polyanskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 20:58:41 -08:00
Arnd Bergmann
7a50a240c4 net/compat_ioctl: support SIOCWANDEV
This adds compat_ioctl support for SIOCWANDEV, which has
always been missing.

The definition of struct compat_ifreq was missing an
ifru_settings fields that is needed to support SIOCWANDEV,
so add that and clean up the whitespace damage in the
struct definition.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 20:57:03 -08:00
Arnd Bergmann
fab2532ba5 net, compat_ioctl: fix SIOCGMII ioctls
SIOCGMIIPHY and SIOCGMIIREG return data through ifreq,
so it needs to be converted on the way out as well.

SIOCGIFPFLAGS is unused, but has the same problem in theory.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 20:56:21 -08:00
Eric Dumazet
f6b8f32ca7 udp: multicast RX should increment SNMP/sk_drops counter in allocation failures
When skb_clone() fails, we should increment sk_drops and SNMP counters.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 20:53:10 -08:00
Eric Dumazet
a1ab77f97e ipv6: udp: Optimise multicast reception
IPV6 UDP multicast rx path is a bit complex and can hold a spinlock
for a long time.

Using a small (32 or 64 entries) stack of socket pointers can help
to perform expensive operations (skb_clone(), udp_queue_rcv_skb())
outside of the lock, in most cases.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 20:53:09 -08:00
Eric Dumazet
1240d1373c ipv4: udp: Optimise multicast reception
UDP multicast rx path is a bit complex and can hold a spinlock
for a long time.

Using a small (32 or 64 entries) stack of socket pointers can help
to perform expensive operations (skb_clone(), udp_queue_rcv_skb())
outside of the lock, in most cases.

It's also a base for a future RCU conversion of multicast recption.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Lucian Adrian Grijincu <lgrijincu@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 20:53:08 -08:00
Eric Dumazet
fddc17defa ipv6: udp: optimize unicast RX path
We first locate the (local port) hash chain head
If few sockets are in this chain, we proceed with previous lookup algo.

If too many sockets are listed, we take a look at the secondary
(port, address) hash chain.

We choose the shortest chain and proceed with a RCU lookup on the elected chain.

But, if we chose (port, address) chain, and fail to find a socket on given address,
 we must try another lookup on (port, in6addr_any) chain to find sockets not bound
to a particular IP.

-> No extra cost for typical setups, where the first lookup will probabbly
be performed.

RCU lookups everywhere, we dont acquire spinlock.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 20:53:07 -08:00
Eric Dumazet
5051ebd275 ipv4: udp: optimize unicast RX path
We first locate the (local port) hash chain head
If few sockets are in this chain, we proceed with previous lookup algo.

If too many sockets are listed, we take a look at the secondary
(port, address) hash chain we added in previous patch.

We choose the shortest chain and proceed with a RCU lookup on the elected chain.

But, if we chose (port, address) chain, and fail to find a socket on given address,
 we must try another lookup on (port, INADDR_ANY) chain to find socket not bound
to a particular IP.

-> No extra cost for typical setups, where the first lookup will probabbly
be performed.

RCU lookups everywhere, we dont acquire spinlock.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 20:53:07 -08:00
Eric Dumazet
512615b6b8 udp: secondary hash on (local port, local address)
Extends udp_table to contain a secondary hash table.

socket anchor for this second hash is free, because UDP
doesnt use skc_bind_node : We define an union to hold
both skc_bind_node & a new hlist_nulls_node udp_portaddr_node

udp_lib_get_port() inserts sockets into second hash chain
(additional cost of one atomic op)

udp_lib_unhash() deletes socket from second hash chain
(additional cost of one atomic op)

Note : No spinlock lockdep annotation is needed, because
lock for the secondary hash chain is always get after
lock for primary hash chain.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 20:53:06 -08:00
Eric Dumazet
d4cada4ae1 udp: split sk_hash into two u16 hashes
Union sk_hash with two u16 hashes for udp (no extra memory taken)

One 16 bits hash on (local port) value (the previous udp 'hash')

One 16 bits hash on (local address, local port) values, initialized
but not yet used. This second hash is using jenkin hash for better
distribution.

Because the 'port' is xored later, a partial hash is performed
on local address + net_hash_mix(net)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 20:53:05 -08:00
Eric Dumazet
fdcc8aa953 udp: add a counter into udp_hslot
Adds a counter in udp_hslot to keep an accurate count
of sockets present in chain.

This will permit to upcoming UDP lookup algo to chose
the shortest chain when secondary hash is added.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 20:53:04 -08:00
Stephen Rothwell
415ce61aef net/appletalk: using compat_ptr needs inclusion of linux/compat.h
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 20:41:03 -08:00
Eric W. Biederman
81adee47df net: Support specifying the network namespace upon device creation.
There is no good reason to not support userspace specifying the
network namespace during device creation, and it makes it easier
to create a network device and pass it to a child network namespace
with a well known name.

We have to be careful to ensure that the target network namespace
for the new device exists through the life of the call.  To keep
that logic clear I have factored out the network namespace grabbing
logic into rtnl_link_get_net.

In addtion we need to continue to pass the source network namespace
to the rtnl_link_ops.newlink method so that we can find the base
device source network namespace.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2009-11-08 00:53:51 -08:00
Joe Perches
f7a3a1d8af appletalk/ddp.c: Neaten checksum function
atalk_sum_partial can now use the rol16 function in bitops.h

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 00:43:19 -08:00
Eric Dumazet
fd5c002761 ipv6: avoid dev_hold()/dev_put() in rawv6_bind()
Using RCU helps not touching device refcount in rawv6_bind()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 00:43:18 -08:00
Eric Dumazet
6755aebaaf can: should not use __dev_get_by_index() without locks
bcm_proc_getifname() is called with RTNL and dev_base_lock
not held. It calls __dev_get_by_index() without locks, and
this is illegal (might crash)

Close the race by holding dev_base_lock and copying dev->name
in the protected section.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-08 00:33:43 -08:00
Eric Dumazet
e0d087af72 rtnetlink: Cleanups
Pure cleanups patch

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-07 01:26:17 -08:00
Arnd Bergmann
91774904fb net/x25: push BKL usage into x25_proto
The x25 driver uses lock_kernel() implicitly through
its proto_ops wrapper. The makes the usage explicit
in order to get rid of that wrapper and to better document
the usage of the BKL.

The next step should be to get rid of the usage of the BKL
in x25 entirely, which requires understanding what data
structures need serialized accesses.

Cc: Henner Eisen <eis@baty.hanse.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-x25@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-07 00:46:40 -08:00
Arnd Bergmann
58a9d73202 net/irda: push BKL into proto_ops
The irda driver uses the BKL implicitly in its protocol
operations. Replace the wrapped proto_ops with explicit
lock_kernel() calls makes the usage more obvious and
shrinks the size of the object code.

The calls t lock_kernel() should eventually all be replaced
by other serialization methods, which requires finding out

The calls t lock_kernel() should eventually all be replaced
by other serialization methods, which requires finding out
which data actually needs protection.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-07 00:46:39 -08:00
Arnd Bergmann
83927ba069 net/ipx: push down BKL into a ipx_dgram_ops
Making the BKL usage explicit in ipx makes it more
obvious where it is used, reduces code size and helps
getting rid of the BKL in common code.

I did not analyse how to kill lock_kernel from ipx
entirely, this will involve either proving that it's not
needed, or replacing with a proper mutex or spinlock,
after finding out which data structures are protected
by the lock.

Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-07 00:46:39 -08:00
Arnd Bergmann
ecced8ba87 net/appletalk: push down BKL into a atalk_dgram_ops
Making the BKL usage explicit in appletalk makes it more
obvious where it is used, reduces code size and helps
getting rid of the BKL in common code.

I did not analyse how to kill lock_kernel from appletalk
entirely, this will involve either proving that it's not
needed, or replacing with a proper mutex or spinlock,
after finding out which data structures are protected
by the lock.

Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-07 00:46:37 -08:00
Thomas Gleixner
d3bcfefaca net: Replace old style lock initializer
SPIN_LOCK_UNLOCKED is deprecated. Use DEFINE_SPINLOCK instead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-07 00:46:34 -08:00
Arnd Bergmann
9177efd399 net, compat_ioctl: handle more ioctls correctly
The MII ioctls and SIOCSIFNAME need to go through ifsioc conversion,
which they never did so far. Some others are not implemented in the
native path, so we can just return -EINVAL directly.

Add IFSLAVE ioctls to the EINVAL list and move it to the end to
optimize the code path for the common case.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06 23:11:15 -08:00
Arnd Bergmann
6b96018b28 compat: move sockios handling to net/socket.c
This removes the original socket compat_ioctl code
from fs/compat_ioctl.c and converts the code from the copy
in net/socket.c into a single function. We add a few cycles
of runtime to compat_sock_ioctl() with the long switch()
statement, but gain some cycles in return by simplifying
the call chain to get there.

Due to better inlining, save 1.5kb of object size in the
process, and enable further savings:

before:
   text    data     bss     dec     hex filename
  13540   18008    2080   33628    835c obj/fs/compat_ioctl.o
  14565     636      40   15241    3b89 obj/net/socket.o

after:
   text    data     bss     dec     hex filename
   8916   15176    2080   26172    663c obj/fs/compat_ioctl.o
  20725     636      40   21401    5399 obj/net/socket.o

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06 23:10:54 -08:00
Arnd Bergmann
2066022177 appletalk: handle SIOCATALKDIFADDR compat ioctl
We must not have a compat ioctl handler for SIOCATALKDIFADDR
in common code, because the same number is used in other protocols
with different data structures.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06 23:01:14 -08:00
Arnd Bergmann
7a229387d3 net: copy socket ioctl code to net/socket.h
This makes an identical copy of the socket compat_ioctl code
from fs/compat_ioctl.c to net/socket.c, as a preparation
for moving the functionality in a way that can be easily
reviewed.

The code is hidden inside of #if 0 and gets activated in the
patch that will make it work.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06 23:00:29 -08:00
Herbert Xu
23ca0c989e ipip: Fix handling of DF packets when pmtudisc is OFF
RFC 2003 requires the outer header to have DF set if DF is set
on the inner header, even when PMTU discovery is off for the
tunnel.  Our implementation does exactly that.

For this to work properly the IPIP gateway also needs to engate
in PMTU when the inner DF bit is set.  As otherwise the original
host would not be able to carry out its PMTU successfully since
part of the path is only visible to the gateway.

Unfortunately when the tunnel PMTU discovery setting is off, we
do not collect the necessary soft state, resulting in blackholes
when the original host tries to perform PMTU discovery.

This problem is not reproducible on the IPIP gateway itself as
the inner packet usually has skb->local_df set.  This is not
correctly cleared (an unrelated bug) when the packet passes
through the tunnel, which allows fragmentation to occur.  For
hosts behind the IPIP gateway it is readily visible with a simple
ping.

This patch fixes the problem by performing PMTU discovery for
all packets with the inner DF bit set, regardless of the PMTU
discovery setting on the tunnel itself.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06 20:33:40 -08:00
Jan Engelhardt
539054a8fa netfilter: xt_connlimit: fix regression caused by zero family value
Commit v2.6.28-rc1~717^2~109^2~2 was slightly incomplete; not all
instances of par->match->family were changed to par->family.

References: http://bugzilla.netfilter.org/show_bug.cgi?id=610
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-11-06 18:08:32 -08:00
David S. Miller
10d626f4f4 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/lowpan/lowpan 2009-11-06 17:57:51 -08:00
Johannes Berg
af81858172 mac80211: async station powersave handling
Some devices require that all frames to a station
are flushed when that station goes into powersave
mode before being able to send frames to that
station again when it wakes up or polls -- all in
order to avoid reordering and too many or too few
frames being sent to the station when it polls.

Normally, this is the case unless the station
goes to sleep and wakes up very quickly again.
But in that case, frames for it may be pending
on the hardware queues, and thus races could
happen in the case of multiple hardware queues
used for QoS/WMM. Normally this isn't a problem,
but with the iwlwifi mechanism we need to make
sure the race doesn't happen.

This makes mac80211 able to cope with the race
with driver help by a new WLAN_STA_PS_DRIVER
per-station flag that can be controlled by the
driver and tells mac80211 whether it can transmit
frames or not. This flag must be set according to
very specific rules outlined in the documentation
for the function that controls it.

When we buffer new frames for the station, we
normally set the TIM bit right away, but while
the driver has blocked transmission to that sta
we need to avoid that as well since we cannot
respond to the station if it wakes up due to the
TIM bit. Once the driver unblocks, we can set
the TIM bit.

Similarly, when the station just wakes up, we
need to wait until all other frames are flushed
before we can transmit frames to that station,
so the same applies here, we need to wait for
the driver to give the OK.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-11-06 16:49:10 -05:00
Patrick McHardy
dee5817e88 netfilter: remove unneccessary checks from netlink notifiers
The NETLINK_URELEASE notifier is only invoked for bound sockets, so
there is no need to check ->pid again.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-11-06 17:04:00 +01:00
David S. Miller
62d83681e5 Merge branch 'linux-2.6.33.y' of git://git.kernel.org/pub/scm/linux/kernel/git/inaky/wimax 2009-11-06 05:01:54 -08:00
Dmitry Eremin-Solenikov
bb1cafb8fc ieee802154: add support for creation/removal of logic interfaces
Add support for two more NL802154 commands: ADD_IFACE and DEL_IFACE,
thus allowing creation and removal of logic WPAN interfaces on the top
of wpan-phy.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06 14:32:24 +03:00
Dmitry Eremin-Solenikov
0a868b26c8 ieee802154: add PHY_NAME to LIST_IFACE command results
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06 14:32:22 +03:00
Dmitry Eremin-Solenikov
339b4ca5f6 ieee802154: add two nl802154 helpers
Add two nl802154 helpers: one for starting a reply message, one for sending.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06 14:32:21 +03:00
Dmitry Eremin-Solenikov
1eaa9d03d3 ieee802154: add LIST_PHY command support
Add nl802154 command to get information about PHY's present in
the system.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06 14:31:22 +03:00
Dmitry Eremin-Solenikov
78fe738d1a ieee802154: split away MAC commands implementation
Move all mac-related stuff to separate file so that ieee802154/netlink.c
contains only generic code.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06 14:31:20 +03:00
Dmitry Eremin-Solenikov
cb6b376357 ieee802154: merge nl802154 and wpan-class in single module
There is no real need to have ieee802154 interfaces separate
into several small modules, as neither of them has it's own use.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06 14:31:18 +03:00
Dmitry Eremin-Solenikov
e9cf356c0c wpan-phy: follow usual patter of devices registration
Follow the usual pattern of devices registration by adding new function
(wpan_phy_set_dev) that sets child->parent relationship and removing
parent argument from wpan_phy_register call.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06 14:29:50 +03:00
Dmitry Eremin-Solenikov
a0b4a738e0 wpan-phy: allow specifying a per-page channel mask
IEEE 802.15.4-2006 defines channel pages that hold channels (max 32 pages,
27 channels per page). Allow the driver to specify supported channels
on pages, other than the first one.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06 14:23:41 +03:00
Dmitry Eremin-Solenikov
375bb0e04b wpan-phy: use snprintf to limit the amount of chars written
Use snprintf to limit the amount of chars put in the buffer for attr -> show.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06 14:13:45 +03:00
Dmitry Eremin-Solenikov
37eb0edc84 wpan-phy: init channel/page fields
Set page to zero (for compatibility w/ devices supporting only first page).
Also init channel by default to -1 to disallow transfers for non-initialised
devices.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06 14:13:22 +03:00
Dmitry Eremin-Solenikov
1c889f4db6 wpan-phy: add wpan-phy iteration functions
Add API to iterate over the wpan-phy instances.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
2009-11-06 14:12:24 +03:00