Commit graph

1079 commits

Author SHA1 Message Date
Jamal Hadi Salim
5a6d34162f [XFRM] SPD info TLV aggregation
Aggregate the SPD info TLVs.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-04 12:55:39 -07:00
Jamal Hadi Salim
af11e31609 [XFRM] SAD info TLV aggregationx
Aggregate the SAD info TLVs.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-04 12:55:13 -07:00
Jennifer Hunt
561e036006 [AF_IUCV]: Implementation of a skb backlog queue
With the inital implementation we missed to implement a skb backlog
queue . The result is that socket receive processing tossed packets.
Since AF_IUCV connections are working synchronously it leads to
connection hangs. Problems with read, close and select also occured.

Using a skb backlog queue is fixing all of these problems .

Signed-off-by: Jennifer Hunt <jenhunt@us.ibm.com>
Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-04 12:22:07 -07:00
Eric Dumazet
db3459d1a7 [IPV6]: Some cleanups in include/net/ipv6.h
1) struct ip6_flowlabel : moves 'users' field to avoid two 32bits
   holes for 64bit arches. Shrinks by 8 bytes sizeof(struct
   ip6_flowlabel)

2) ipv6_addr_cmp() and ipv6_addr_copy() dont need (void *) casts :
   Compiler might take into account natural alignement of in6_addr
   structs to emit better code for memcpy()/memcmp() Casts to (void *)
   force byte accesses.

3) ipv6_addr_prefix() optimization :

Better to clear whole struct, as compiler can emit better code for
memset(addr, 0, 16) (2 stores on x86_64), and avoid some conditional
branches.

# size vmlinux.after vmlinux.before
   text    data     bss     dec     hex filename
5262262  647612  557432 6467306  62aeea vmlinux.after
5262550  647612  557432 6467594  62b00a vmlinux.before

thats 288 bytes saved.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 17:39:04 -07:00
Ilpo Järvinen
0ec96822d5 [TCP]: Use S+L catcher only with SACK for now
TCP has a transitional state when SACK is not in use during
which this invariant is temporarily broken. Without SACK,
tcp_clean_rtx_queue does not decrement sacked_out. Therefore
calls to tcp_sync_left_out before sacked_out is again
corrected by tcp_fastretrans_alert can trigger this trap as
sacked_out still has couple of segments that are already out
of window.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:30:34 -07:00
Eric Dumazet
709525fad8 [IPV6]: Get rid of __HAVE_ARCH_ADDR_SET.
__HAVE_ARCH_ADDR_SET seems unused these days, just get rid of it.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:08:43 -07:00
Linus Torvalds
152a6a9da1 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
  [IPV4] SNMP: Support OutMcastPkts and OutBcastPkts
  [IPV4] SNMP: Support InMcastPkts and InBcastPkts
  [IPV4] SNMP: Support InTruncatedPkts
  [IPV4] SNMP: Support InNoRoutes
  [SNMP]: Add definitions for {In,Out}BcastPkts
  [TCP] FRTO: RFC4138 allows Nagle override when new data must be sent
  [TCP] FRTO: Delay skb available check until it's mandatory
  [XFRM]: Restrict upper layer information by bundle.
  [TCP]: Catch skb with S+L bugs earlier
  [PATCH] INET : IPV4 UDP lookups converted to a 2 pass algo
  [L2TP]: Add the ability to autoload a pppox protocol module.
  [SKB]: Introduce skb_queue_walk_safe()
  [AF_IUCV/IUCV]: smp_call_function deadlock
  [IPV6]: Fix slab corruption running ip6sic
  [TCP]: Update references in two old comments
  [XFRM]: Export SPD info
  [IPV6]: Track device renames in snmp6.
  [SCTP]: Fix sctp_getsockopt_local_addrs_old() to use local storage.
  [NET]: Remove NETIF_F_INTERNAL_STATS, default to internal stats.
  [NETPOLL]: Remove CONFIG_NETPOLL_RX
  ...
2007-04-30 08:14:42 -07:00
Ilpo Järvinen
d551e4541d [TCP] FRTO: RFC4138 allows Nagle override when new data must be sent
This is a corner case where less than MSS sized new data thingie
is awaiting in the send queue. For F-RTO to work correctly, a
new data segment must be sent at certain point or F-RTO cannot
be used at all. RFC4138 allows overriding of Nagle at that
point.

Implementation uses frto_counter states 2 and 3 to distinguish
when Nagle override is needed.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 00:58:16 -07:00
Masahide NAKAMURA
157bfc2502 [XFRM]: Restrict upper layer information by bundle.
On MIPv6 usage, XFRM sub policy is enabled.
When main (IPsec) and sub (MIPv6) policy selectors have the same
address set but different upper layer information (i.e. protocol
number and its ports or type/code), multiple bundle should be created.
However, currently we have issue to use the same bundle created for
the first time with all flows covered by the case.

It is useful for the bundle to have the upper layer information
to be restructured correctly if it does not match with the flow.

1. Bundle was created by two policies
Selector from another policy is added to xfrm_dst.
If the flow does not match the selector, it goes to slow path to
restructure new bundle by single policy.

2. Bundle was created by one policy
Flow cache is added to xfrm_dst as originated one. If the flow does
not match the cache, it goes to slow path to try searching another
policy.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 00:58:09 -07:00
Ilpo Järvinen
34588b4c04 [TCP]: Catch skb with S+L bugs earlier
SACKED_ACKED and LOST are mutually exclusive with SACK, thus
having their sum larger than packets_out is bug with SACK.
Eventually these bugs trigger traps in the tcp_clean_rtx_queue
with SACK but it's much more informative to do this here.

Non-SACK TCP, however, could get more than packets_out duplicate
ACKs which each increment sacked_out, so it makes sense to do
this kind of limitting for non-SACK TCP but not for SACK enabled
one. Perhaps the author had the opposite in mind but did the
logic accidently wrong way around? Anyway, the sacked_out
incrementer code for non-SACK already deals this issue before
calling sync_left_out so this trapping can be done
unconditionally.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 00:57:33 -07:00
Martin Schwidefsky
04b090d50c [AF_IUCV/IUCV]: smp_call_function deadlock
Calling smp_call_function can lead to a deadlock if it is called
from tasklet context. 
Fixing this deadlock requires to move the smp_call_function from the
tasklet context to a work queue. To do that queue the path pending
interrupts to a separate list and move the path cleanup out of
iucv_path_sever to iucv_path_connect and iucv_path_pending.
This creates a new requirement for iucv_path_connect: it may not be
called from tasklet context anymore. 
Also fixed compile problem for CONFIG_HOTPLUG_CPU=n and
another one when walking the cpu_online mask. When doing this, 
we must disable cpu hotplug.

Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-28 23:03:59 -07:00
Jamal Hadi Salim
ecfd6b1837 [XFRM]: Export SPD info
With this patch you can use iproute2 in user space to efficiently see
how many policies exist in different directions.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-28 21:20:32 -07:00
Pavel Roskin
6693228da9 [PATCH] Remove comment about IEEE80211_RADIOTAP_FCS
IEEE80211_RADIOTAP_FCS is obsolete and should not be used.  It's no
longer defined.  Remove it from the comment too.

Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-04-28 11:01:03 -04:00
Jouni Malinen
85d32e7b0e [PATCH] Update my email address from jkmaline@cc.hut.fi to j@w1.fi
After 13 years of use, it looks like my email address is finally going
to disappear. While this is likely to drop the amount of incoming spam
greatly ;-), it may also affect more appropriate messages, so let's
update my email address in various places. In addition, Host AP mailing
list is subscribers-only and linux-wireless can also be used for
discussing issues related to this driver which is now shown in
MAINTAINERS.

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-04-28 11:01:01 -04:00
Pavel Roskin
a0d69f229f [PATCH] sparse-annotate radiotap header
Document that all fields must be little endian.  Use annotated types
even in the comments.  Consistently use shorter type names (u8, s8).
Realign the comments.

Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-04-28 11:01:00 -04:00
Marcelo Tosatti
876c9d3aeb [PATCH] Marvell Libertas 8388 802.11b/g USB driver
Add the Marvell Libertas 8388 802.11 USB driver.

Signed-off-by: Marcelo Tosatti <marcelo@kvack.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-04-28 11:00:54 -04:00
David Howells
b8b8fd2dc2 [NET]: Fix networking compilation errors
Fix miscellaneous networking compilation errors.

 (*) Export ktime_add_ns() for modules.

 (*) wext_proc_init() should have an ANSI declaration.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-27 15:31:24 -07:00
Johannes Berg
295f4a1fa3 [WEXT]: Clean up how wext is called.
This patch cleans up the call paths from the core code into wext.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 20:43:56 -07:00
David Howells
651350d10f [AF_RXRPC]: Add an interface to the AF_RXRPC module for the AFS filesystem to use
Add an interface to the AF_RXRPC module so that the AFS filesystem module can
more easily make use of the services available.  AFS still opens a socket but
then uses the action functions in lieu of sendmsg() and registers an intercept
functions to grab messages before they're queued on the socket Rx queue.

This permits AFS (or whatever) to:

 (1) Avoid the overhead of using the recvmsg() call.

 (2) Use different keys directly on individual client calls on one socket
     rather than having to open a whole slew of sockets, one for each key it
     might want to use.

 (3) Avoid calling request_key() at the point of issue of a call or opening of
     a socket.  This is done instead by AFS at the point of open(), unlink() or
     other VFS operation and the key handed through.

 (4) Request the use of something other than GFP_KERNEL to allocate memory.

Furthermore:

 (*) The socket buffer markings used by RxRPC are made available for AFS so
     that it can interpret the cooked RxRPC messages itself.

 (*) rxgen (un)marshalling abort codes are made available.


The following documentation for the kernel interface is added to
Documentation/networking/rxrpc.txt:

=========================
AF_RXRPC KERNEL INTERFACE
=========================

The AF_RXRPC module also provides an interface for use by in-kernel utilities
such as the AFS filesystem.  This permits such a utility to:

 (1) Use different keys directly on individual client calls on one socket
     rather than having to open a whole slew of sockets, one for each key it
     might want to use.

 (2) Avoid having RxRPC call request_key() at the point of issue of a call or
     opening of a socket.  Instead the utility is responsible for requesting a
     key at the appropriate point.  AFS, for instance, would do this during VFS
     operations such as open() or unlink().  The key is then handed through
     when the call is initiated.

 (3) Request the use of something other than GFP_KERNEL to allocate memory.

 (4) Avoid the overhead of using the recvmsg() call.  RxRPC messages can be
     intercepted before they get put into the socket Rx queue and the socket
     buffers manipulated directly.

To use the RxRPC facility, a kernel utility must still open an AF_RXRPC socket,
bind an addess as appropriate and listen if it's to be a server socket, but
then it passes this to the kernel interface functions.

The kernel interface functions are as follows:

 (*) Begin a new client call.

	struct rxrpc_call *
	rxrpc_kernel_begin_call(struct socket *sock,
				struct sockaddr_rxrpc *srx,
				struct key *key,
				unsigned long user_call_ID,
				gfp_t gfp);

     This allocates the infrastructure to make a new RxRPC call and assigns
     call and connection numbers.  The call will be made on the UDP port that
     the socket is bound to.  The call will go to the destination address of a
     connected client socket unless an alternative is supplied (srx is
     non-NULL).

     If a key is supplied then this will be used to secure the call instead of
     the key bound to the socket with the RXRPC_SECURITY_KEY sockopt.  Calls
     secured in this way will still share connections if at all possible.

     The user_call_ID is equivalent to that supplied to sendmsg() in the
     control data buffer.  It is entirely feasible to use this to point to a
     kernel data structure.

     If this function is successful, an opaque reference to the RxRPC call is
     returned.  The caller now holds a reference on this and it must be
     properly ended.

 (*) End a client call.

	void rxrpc_kernel_end_call(struct rxrpc_call *call);

     This is used to end a previously begun call.  The user_call_ID is expunged
     from AF_RXRPC's knowledge and will not be seen again in association with
     the specified call.

 (*) Send data through a call.

	int rxrpc_kernel_send_data(struct rxrpc_call *call, struct msghdr *msg,
				   size_t len);

     This is used to supply either the request part of a client call or the
     reply part of a server call.  msg.msg_iovlen and msg.msg_iov specify the
     data buffers to be used.  msg_iov may not be NULL and must point
     exclusively to in-kernel virtual addresses.  msg.msg_flags may be given
     MSG_MORE if there will be subsequent data sends for this call.

     The msg must not specify a destination address, control data or any flags
     other than MSG_MORE.  len is the total amount of data to transmit.

 (*) Abort a call.

	void rxrpc_kernel_abort_call(struct rxrpc_call *call, u32 abort_code);

     This is used to abort a call if it's still in an abortable state.  The
     abort code specified will be placed in the ABORT message sent.

 (*) Intercept received RxRPC messages.

	typedef void (*rxrpc_interceptor_t)(struct sock *sk,
					    unsigned long user_call_ID,
					    struct sk_buff *skb);

	void
	rxrpc_kernel_intercept_rx_messages(struct socket *sock,
					   rxrpc_interceptor_t interceptor);

     This installs an interceptor function on the specified AF_RXRPC socket.
     All messages that would otherwise wind up in the socket's Rx queue are
     then diverted to this function.  Note that care must be taken to process
     the messages in the right order to maintain DATA message sequentiality.

     The interceptor function itself is provided with the address of the socket
     and handling the incoming message, the ID assigned by the kernel utility
     to the call and the socket buffer containing the message.

     The skb->mark field indicates the type of message:

	MARK				MEANING
	===============================	=======================================
	RXRPC_SKB_MARK_DATA		Data message
	RXRPC_SKB_MARK_FINAL_ACK	Final ACK received for an incoming call
	RXRPC_SKB_MARK_BUSY		Client call rejected as server busy
	RXRPC_SKB_MARK_REMOTE_ABORT	Call aborted by peer
	RXRPC_SKB_MARK_NET_ERROR	Network error detected
	RXRPC_SKB_MARK_LOCAL_ERROR	Local error encountered
	RXRPC_SKB_MARK_NEW_CALL		New incoming call awaiting acceptance

     The remote abort message can be probed with rxrpc_kernel_get_abort_code().
     The two error messages can be probed with rxrpc_kernel_get_error_number().
     A new call can be accepted with rxrpc_kernel_accept_call().

     Data messages can have their contents extracted with the usual bunch of
     socket buffer manipulation functions.  A data message can be determined to
     be the last one in a sequence with rxrpc_kernel_is_data_last().  When a
     data message has been used up, rxrpc_kernel_data_delivered() should be
     called on it..

     Non-data messages should be handled to rxrpc_kernel_free_skb() to dispose
     of.  It is possible to get extra refs on all types of message for later
     freeing, but this may pin the state of a call until the message is finally
     freed.

 (*) Accept an incoming call.

	struct rxrpc_call *
	rxrpc_kernel_accept_call(struct socket *sock,
				 unsigned long user_call_ID);

     This is used to accept an incoming call and to assign it a call ID.  This
     function is similar to rxrpc_kernel_begin_call() and calls accepted must
     be ended in the same way.

     If this function is successful, an opaque reference to the RxRPC call is
     returned.  The caller now holds a reference on this and it must be
     properly ended.

 (*) Reject an incoming call.

	int rxrpc_kernel_reject_call(struct socket *sock);

     This is used to reject the first incoming call on the socket's queue with
     a BUSY message.  -ENODATA is returned if there were no incoming calls.
     Other errors may be returned if the call had been aborted (-ECONNABORTED)
     or had timed out (-ETIME).

 (*) Record the delivery of a data message and free it.

	void rxrpc_kernel_data_delivered(struct sk_buff *skb);

     This is used to record a data message as having been delivered and to
     update the ACK state for the call.  The socket buffer will be freed.

 (*) Free a message.

	void rxrpc_kernel_free_skb(struct sk_buff *skb);

     This is used to free a non-DATA socket buffer intercepted from an AF_RXRPC
     socket.

 (*) Determine if a data message is the last one on a call.

	bool rxrpc_kernel_is_data_last(struct sk_buff *skb);

     This is used to determine if a socket buffer holds the last data message
     to be received for a call (true will be returned if it does, false
     if not).

     The data message will be part of the reply on a client call and the
     request on an incoming call.  In the latter case there will be more
     messages, but in the former case there will not.

 (*) Get the abort code from an abort message.

	u32 rxrpc_kernel_get_abort_code(struct sk_buff *skb);

     This is used to extract the abort code from a remote abort message.

 (*) Get the error number from a local or network error message.

	int rxrpc_kernel_get_error_number(struct sk_buff *skb);

     This is used to extract the error number from a message indicating either
     a local error occurred or a network error occurred.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 15:50:17 -07:00
David Howells
17926a7932 [AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both
Provide AF_RXRPC sockets that can be used to talk to AFS servers, or serve
answers to AFS clients.  KerberosIV security is fully supported.  The patches
and some example test programs can be found in:

	http://people.redhat.com/~dhowells/rxrpc/

This will eventually replace the old implementation of kernel-only RxRPC
currently resident in net/rxrpc/.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 15:48:28 -07:00
Adrian Bunk
42bad1da50 [NETLINK]: Possible cleanups.
- make the following needlessly global variables static:
  - core/rtnetlink.c: struct rtnl_msg_handlers[]
  - netfilter/nf_conntrack_proto.c: struct nf_ct_protos[]
- make the following needlessly global functions static:
  - core/rtnetlink.c: rtnl_dump_all()
  - netlink/af_netlink.c: netlink_queue_skip()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 00:57:41 -07:00
Jamal Hadi Salim
28d8909bc7 [XFRM]: Export SAD info.
On a system with a lot of SAs, counting SAD entries chews useful
CPU time since you need to dump the whole SAD to user space;
i.e something like ip xfrm state ls | grep -i src | wc -l
I have seen taking literally minutes on a 40K SAs when the system
is swapping.
With this patch, some of the SAD info (that was already being tracked)
is exposed to user space. i.e you do:
ip xfrm state count
And you get the count; you can also pass -s to the command line and
get the hash info.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-26 00:10:29 -07:00
Herbert Xu
7f7d9a6b96 [IPV6]: Consolidate common SNMP code
This patch moves the non-proc SNMP code into addrconf.c and reuses
IPv4 SNMP code where applicable.

As a result we can skip proc.o if /proc is disabled.

Note that I've made a number of functions static since they're only
used by addrconf.c for now.  If they ever get used elsewhere we can
always remove the static.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:52 -07:00
Herbert Xu
5e0f04351d [IPV4]: Consolidate common SNMP code
This patch moves the SNMP code shared between IPv4/IPv6 from proc.c
into net/ipv4/af_inet.c.  This makes sense because these functions
aren't specific to /proc.

As a result we can again skip proc.o if /proc is disabled.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:51 -07:00
Johannes Berg
43fb45cb79 [WIRELESS] cfg80211: Update comment for locking.
This patch adds a comment that was part of my rtnl locking patch for
cfg80211 but which I forgot for the merge.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:48 -07:00
Stephen Hemminger
164891aadf [TCP]: Congestion control API update.
Do some simple changes to make congestion control API faster/cleaner.
* use ktime_t rather than timeval
* merge rtt sampling into existing ack callback
  this means one indirect call versus two per ack.
* use flags bits to store options/settings

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:45 -07:00
Johannes Berg
9e101eab15 [WIRELESS]: Remove wext over netlink.
As scheduled, this patch removes the pointless wext over netlink code.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:42 -07:00
Johannes Berg
704232c271 [WIRELESS] cfg80211: New wireless config infrastructure.
This patch creates the core cfg80211 code along with some sysfs bits.
This is a stripped down version to allow mac80211 to function, but
doesn't include any configuration yet except for creating and removing
virtual interfaces.

This patch includes the nl80211 header file but it only contains the
interface types which the cfg80211 interface for creating virtual
interfaces relies on.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:41 -07:00
YOSHIFUJI Hideaki
97fc8d0bc5 [IPV6] SNMP: Use put_unaligned() instead of memcpy().
Hint from David Miller <davem@davemloft.net>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:37 -07:00
YOSHIFUJI Hideaki
2334e97355 [IPV6] SNMP: Avoid unaligned accesses.
Because stats pointer may not be aligned for u64, use memcpy
to fill u64 values.
Issue reported by David Miller <davem@davemloft.net>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2007-04-25 22:29:35 -07:00
Ilpo Järvinen
9e412ba763 [TCP]: Sed magic converts func(sk, tp, ...) -> func(sk, ...)
This is (mostly) automated change using magic:

sed -e '/struct sock \*sk/ N' -e '/struct sock \*sk/ N'
    -e '/struct sock \*sk/ N' -e '/struct sock \*sk/ N'
    -e 's|struct sock \*sk,[\n\t ]*struct tcp_sock \*tp\([^{]*\n{\n\)|
	  struct sock \*sk\1\tstruct tcp_sock *tp = tcp_sk(sk);\n|g'
    -e 's|struct sock \*sk, struct tcp_sock \*tp|
	  struct sock \*sk|g' -e 's|sk, tp\([^-]\)|sk\1|g'

Fixed four unused variable (tp) warnings that were introduced.

In addition, manually added newlines after local variables and
tweaked function arguments positioning.

$ gcc --version
gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1)
...
$ codiff -fV built-in.o.old built-in.o.new
net/ipv4/route.c:
  rt_cache_flush |  +14
 1 function changed, 14 bytes added

net/ipv4/tcp.c:
  tcp_setsockopt |   -5
  tcp_sendpage   |  -25
  tcp_sendmsg    |  -16
 3 functions changed, 46 bytes removed

net/ipv4/tcp_input.c:
  tcp_try_undo_recovery |   +3
  tcp_try_undo_dsack    |   +2
  tcp_mark_head_lost    |  -12
  tcp_ack               |  -15
  tcp_event_data_recv   |  -32
  tcp_rcv_state_process |  -10
  tcp_rcv_established   |   +1
 7 functions changed, 6 bytes added, 69 bytes removed, diff: -63

net/ipv4/tcp_output.c:
  update_send_head          |   -9
  tcp_transmit_skb          |  +19
  tcp_cwnd_validate         |   +1
  tcp_write_wakeup          |  -17
  __tcp_push_pending_frames |  -25
  tcp_push_one              |   -8
  tcp_send_fin              |   -4
 7 functions changed, 20 bytes added, 63 bytes removed, diff: -43

built-in.o.new:
 18 functions changed, 40 bytes added, 178 bytes removed, diff: -138

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:34 -07:00
Andi Kleen
9958089a43 [NET]: Move sk_setup_caps() out of line.
It is far too large to be an inline and not in any hot paths.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:26 -07:00
Andi Kleen
4ac02bab77 [TCP]: Uninline tcp_done().
The function is quite big and has several call sites and nothing
to collapse by compiler optimization on inlining.

Besides it's nicer to read in a in .c file.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:25 -07:00
YOSHIFUJI Hideaki
334901700f [IPV4] SNMP: Move some statistic bits to net/ipv4/proc.c.
This also fixes memory leak in error path.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:12 -07:00
YOSHIFUJI Hideaki
bf99f1bde3 [IPV6] SNMP: Netlink interface.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:10 -07:00
Patrick McHardy
0463d4ae25 [NET_SCHED]: Eliminate qdisc_tree_lock
Since we're now holding the rtnl during the entire dump operation, we
can remove qdisc_tree_lock, whose only purpose is to protect dump
callbacks from concurrent changes to the qdisc tree.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:29:07 -07:00
Herbert Xu
604763722c [NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY
When a transmitted packet is looped back directly, CHECKSUM_PARTIAL
maps to the semantics of CHECKSUM_UNNECESSARY.  Therefore we should
treat it as such in the stack.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:43 -07:00
Patrick McHardy
c5c2523893 [XFRM]: Optimize MTU calculation
Replace the probing based MTU estimation, which usually takes 2-3 iterations
to find a fitting value and may underestimate the MTU, by an exact calculation.

Also fix underestimation of the XFRM trailer_len, which causes unnecessary
reallocations.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:38 -07:00
David Howells
716ea3a7aa [NET]: Move generic skbuff stuff from XFRM code to generic code
Move generic skbuff stuff from XFRM code to generic code so that
AF_RXRPC can use it too.

The kdoc comments I've attached to the functions needs to be checked
by whoever wrote them as I had to make some guesses about the workings
of these functions.

Signed-off-By: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:33 -07:00
Arnaldo Carvalho de Melo
2a123b86e2 [BLUETOOTH]: Introduce skb->data accessor methods for hci_{acl,event,sco}_hdr
For consistency with other skb data accessors, reducing the number of direct
accesses to skb->data.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2007-04-25 22:28:21 -07:00
Thomas Graf
73417f617a [NET] fib_rules: Flush route cache after rule modifications
The results of FIB rules lookups are cached in the routing cache
except for IPv6 as no such cache exists. So far, it was the
responsibility of the user to flush the cache after modifying any
rules. This lead to many false bug reports due to misunderstanding
of this concept.

This patch automatically flushes the route cache after inserting
or deleting a rule.

Thanks to Muli Ben-Yehuda <muli@il.ibm.com> for catching a bug
in the previous patch.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:18 -07:00
Thomas Graf
0947c9fe56 [NET] fib_rules: goto rule action
This patch adds a new rule action FR_ACT_GOTO which allows
to skip a set of rules by jumping to another rule. The rule
to jump to is specified via the FRA_GOTO attribute which
carries a rule preference.

Referring to a rule which doesn't exists is explicitely allowed.
Such goto rules are marked with the flag FIB_RULE_UNRESOLVED
and will act like a rule with a non-matching selector. The rule
will become functional as soon as its target is present.

The goto action enables performance optimizations by reducing
the average number of rules that have to be passed per lookup.

Example:
0:      from all lookup local
40:     not from all to 192.168.23.128 goto 32766
41:     from all fwmark 0xa blackhole
42:     from all fwmark 0xff blackhole
32766:  from all lookup main

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:12 -07:00
David S. Miller
b3da2cf37c [INET]: Use jhash + random secret for ehash.
The days are gone when this was not an issue, there are folks out
there with huge bot networks that can be used to attack the
established hash tables on remote systems.

So just like the routing cache and connection tracking
hash, use Jenkins hash with random secret input.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:06 -07:00
Johannes Berg
d30045a0bc [NETLINK]: introduce NLA_BINARY type
This patch introduces a new NLA_BINARY attribute policy type with the
verification of simply checking the maximum length of the payload.

It also fixes a small typo in the example.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:05 -07:00
Vlad Yasevich
703315712c [SCTP]: Implement SCTP_MAX_BURST socket option.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:04 -07:00
Vlad Yasevich
a5a35e7675 [SCTP]: Implement sac_info field in SCTP_ASSOC_CHANGE notification.
As stated in the sctp socket api draft:

   sac_info: variable

   If the sac_state is SCTP_COMM_LOST and an ABORT chunk was received
   for this association, sac_info[] contains the complete ABORT chunk as
   defined in the SCTP specification RFC2960 [RFC2960] section 3.3.7.

We now save received ABORT chunks into the sac_info field and pass that
to the user.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:03 -07:00
Vlad Yasevich
bdf3092af6 [SCTP]: Honor flags when setting peer address parameters
Parameters only take effect when a corresponding flag bit is set
and a value is specified. This means we need to check the flags
in addition to checking for non-zero value.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:02 -07:00
Vlad Yasevich
1ae4114dce [SCTP]: Implement SCTP_ADDR_CONFIRMED state for ADDR_CHNAGE event
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:01 -07:00
Vlad Yasevich
d49d91d79a [SCTP]: Implement SCTP_PARTIAL_DELIVERY_POINT option.
This option induces partial delivery to run as soon
as the specified amount of data has been accumulated on
the association.  However, we give preference to fully
reassembled messages over PD messages.  In any case,
window and buffer is freed up.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@.hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:28:00 -07:00
Vlad Yasevich
b6e1331f3c [SCTP]: Implement SCTP_FRAGMENT_INTERLEAVE socket option
This option was introduced in draft-ietf-tsvwg-sctpsocket-13.  It
prevents head-of-line blocking in the case of one-to-many endpoint.
Applications enabling this option really must enable SCTP_SNDRCV event
so that they would know where the data belongs.  Based on an
earlier patch by Ivan Skytte Jørgensen.

Additionally, this functionality now permits multiple associations
on the same endpoint to enter Partial Delivery.  Applications should
be extra careful, when using this functionality, to track EOR indicators.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:59 -07:00
Patrick McHardy
a48b5a6144 [NET_SCHED]: Unline tcf_destroy
Uninline tcf_destroy and add a helper function to destroy an entire filter
chain.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:56 -07:00
Patrick McHardy
3bebcda280 [NET_SCHED]: turn PSCHED_GET_TIME into inline function
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:55 -07:00
Patrick McHardy
03cc45c0a5 [NET_SCHED]: turn PSCHED_TDIFF_SAFE into inline function
Also rename to psched_tdiff_bounded.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:54 -07:00
Patrick McHardy
8edc0c31d6 [NET_SCHED]: kill PSCHED_TDIFF
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:53 -07:00
Patrick McHardy
a084980dcb [NET_SCHED]: kill PSCHED_SET_PASTPERFECT/PSCHED_IS_PASTPERFECT
Use direct assignment and comparison instead.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:51 -07:00
Patrick McHardy
104e087898 [NET_SCHED]: kill PSCHED_TLESS
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:50 -07:00
Patrick McHardy
7c59e25f31 [NET_SCHED]: kill PSCHED_TADD/PSCHED_TADD2
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:49 -07:00
Patrick McHardy
26e252df1e [NET_SCHED]: kill PSCHED_AUDIT_TDIFF
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:48 -07:00
Thomas Graf
1d00a4eb42 [NETLINK]: Remove error pointer from netlink message handler
The error pointer argument in netlink message handlers is used
to signal the special case where processing has to be interrupted
because a dump was started but no error happened. Instead it is
simpler and more clear to return -EINTR and have netlink_run_queue()
deal with getting the queue right.

nfnetlink passed on this error pointer to its subsystem handlers
but only uses it to signal the start of a netlink dump. Therefore
it can be removed there as well.

This patch also cleans up the error handling in the affected
message handlers to be consistent since it had to be touched anyway.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:30 -07:00
Thomas Graf
c454673da7 [NET] rules: Unified rules dumping
Implements a unified, protocol independant rules dumping function
which is capable of both, dumping a specific protocol family or
all of them. This speeds up dumping as less lookups are required.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:17 -07:00
Thomas Graf
c127ea2c45 [IPv6]: Use rtnl registration interface
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:13 -07:00
Thomas Graf
fa34ddd739 [DECNet]: Use rtnl registration interface
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:12 -07:00
Thomas Graf
be577ddc2b [PKT_SCHED] qdisc: Use rtnl registration interface
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:09 -07:00
Thomas Graf
63f3444fb9 [IPv4]: Use rtnl registration interface
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:08 -07:00
Thomas Graf
9d9e6a5819 [NET] rules: Use rtnl registration interface
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:07 -07:00
Thomas Graf
c8822a4e00 [NEIGH]: Use rtnl registration interface
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:06 -07:00
Thomas Graf
e284986385 [RTNL]: Message handler registration interface
This patch adds a new interface to register rtnetlink message
handlers replacing the exported rtnl_links[] array which
required many message handlers to be exported unnecessarly.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:27:04 -07:00
Arnaldo Carvalho de Melo
dc5fc579b9 [NETLINK]: Use nlmsg_trim() where appropriate
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:37 -07:00
Arnaldo Carvalho de Melo
27a884dc3c [SK_BUFF]: Convert skb->tail to sk_buff_data_t
So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)

Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:28 -07:00
Patrick McHardy
00c04af9df [NET_SCHED]: kill jiffie conversion macros
Now that all packet schedulers have been converted to hrtimers most users
of PSCHED_JIFFIE2US and PSCHED_US2JIFFIE are gone. The remaining users use
it to convert external time units to packet scheduler clock ticks, so use
PSCHED_TICKS_PER_SEC instead.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:14 -07:00
Patrick McHardy
4179477f63 [NET_SCHED]: Add hrtimer based qdisc watchdog
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:05 -07:00
Patrick McHardy
641b9e0e8b [NET_SCHED]: Use ktime as clocksource
Get rid of the manual clock source selection mess and use ktime. Also
use a scalar representation, which allows to clean up pkt_sched.h a bit
more and results in less ktime_to_ns() calls in most cases.

The PSCHED_US2JIFFIE/PSCHED_JIFFIE2US macros are implemented quite
inefficient by this patch, following patches will convert all qdiscs
to hrtimers and get rid of them entirely.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:26:04 -07:00
Patrick McHardy
010c7d6f86 [NETFILTER]: nf_conntrack: uninline notifier registration functions
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:46 -07:00
Patrick McHardy
a3c5029cf7 [NETFILTER]: nfnetlink: use mutex instead of semaphore
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:43 -07:00
Patrick McHardy
ac5357ebac [NETFILTER]: nf_conntrack: remove ugly hack in l4proto registration
Remove ugly special-casing of nf_conntrack_l4proto_generic, all it
wants is its sysctl tables registered, so do that explicitly in an
init function and move the remaining protocol initialization and
cleanup code to nf_conntrack_proto.c as well.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:40 -07:00
Patrick McHardy
587aa64163 [NETFILTER]: Remove IPv4 only connection tracking/NAT
Remove the obsolete IPv4 only connection tracking/NAT as scheduled in
feature-removal-schedule.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:34 -07:00
Arnaldo Carvalho de Melo
9c70220b73 [SK_BUFF]: Introduce skb_transport_header(skb)
For the places where we need a pointer to the transport header, it is
still legal to touch skb->h.raw directly if just adding to,
subtracting from or setting it to another layer header.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:31 -07:00
Arnaldo Carvalho de Melo
aa8223c7bb [SK_BUFF]: Introduce tcp_hdr(), remove skb->h.th
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:26 -07:00
Arnaldo Carvalho de Melo
4bedb45203 [SK_BUFF]: Introduce udp_hdr(), remove skb->h.uh
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:22 -07:00
Arnaldo Carvalho de Melo
ea2ae17d64 [SK_BUFF]: Introduce skb_transport_offset()
For the quite common 'skb->h.raw - skb->data' sequence.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:16 -07:00
Arnaldo Carvalho de Melo
0660e03f6b [SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6h
Now the skb->nh union has just one member, .raw, i.e. it is just like the
skb->mac union, strange, no? I'm just leaving it like that till the transport
layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or
->mac_header_offset?), ditto for ->{h,nh}.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:14 -07:00
Arnaldo Carvalho de Melo
eddc9ec53b [SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:10 -07:00
Arnaldo Carvalho de Melo
c9bdd4b525 [IP]: Introduce ip_hdrlen()
For the common sequence "skb->nh.iph->ihl * 4", removing a good number of open
coded skb->nh.iph uses, now to go after the rest...

Just out of curiosity, here are the idioms found to get the same result:

skb->nh.iph->ihl << 2
skb->nh.iph->ihl<<2
skb->nh.iph->ihl * 4
skb->nh.iph->ihl*4
(skb->nh.iph)->ihl * sizeof(u32)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:25:07 -07:00
Arnaldo Carvalho de Melo
d56f90a7c9 [SK_BUFF]: Introduce skb_network_header()
For the places where we need a pointer to the network header, it is still legal
to touch skb->nh.raw directly if just adding to, subtracting from or setting it
to another layer header.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:59 -07:00
Arnaldo Carvalho de Melo
c1d2bbe1cd [SK_BUFF]: Introduce skb_reset_network_header(skb)
For the common, open coded 'skb->nh.raw = skb->data' operation, so that we can
later turn skb->nh.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple case, next will handle the slightly more
"complex" cases.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:46 -07:00
Arnaldo Carvalho de Melo
37e6636669 [LLC]: Kill llc_set_pdu_hdr
We'll have skb_reset_network_header soon.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:42 -07:00
Arnaldo Carvalho de Melo
459a98ed88 [SK_BUFF]: Introduce skb_reset_mac_header(skb)
For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can
later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple case, next will handle the slightly more
"complex" cases.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:32 -07:00
Eric Dumazet
92f37fd2ee [NET]: Adding SO_TIMESTAMPNS / SCM_TIMESTAMPNS support
Now that network timestamps use ktime_t infrastructure, we can add a new
SOL_SOCKET sockopt  SO_TIMESTAMPNS.

This command is similar to SO_TIMESTAMP, but permits transmission of
a 'timespec struct' instead of a 'timeval struct' control message.
(nanosecond resolution instead of microsecond)

Control message is labelled SCM_TIMESTAMPNS instead of SCM_TIMESTAMP

A socket cannot mix SO_TIMESTAMP and SO_TIMESTAMPNS : the two modes are
mutually exclusive.

sock_recv_timestamp() became too big to be fully inlined so I added a
__sock_recv_timestamp() helper function.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: linux-arch@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:21 -07:00
Stephen Hemminger
a2a316fd06 [NET]: Replace CONFIG_NET_DEBUG with sysctl.
Covert network warning messages from a compile time to runtime choice.
Removes kernel config option and replaces it with new /proc/sys/net/core/warnings.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:05 -07:00
Eric Dumazet
ae40eb1ef3 [NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolution
Now network timestamps use ktime_t infrastructure, we can add a new
ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'.
User programs can thus access to nanosecond resolution.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:04 -07:00
David S. Miller
fe067e8ab5 [TCP]: Abstract out all write queue operations.
This allows the write queue implementation to be changed,
for example, to one which allows fast interval searching.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:24:02 -07:00
Herbert Xu
759e5d0064 [UDP]: Clean up UDP-Lite receive checksum
This patch eliminates some duplicate code for the verification of
receive checksums between UDP-Lite and UDP.  It does this by
introducing __skb_checksum_complete_head which is identical to
__skb_checksum_complete_head apart from the fact that it takes
a length parameter rather than computing the first skb->len bytes.

As a result UDP-Lite will be able to use hardware checksum offload
for packets which do not use partial coverage checksums.  It also
means that UDP-Lite loopback no longer does unnecessary checksum
verification.

If any NICs start support UDP-Lite this would also start working
automatically.

This patch removes the assumption that msg_flags has MSG_TRUNC clear
upon entry in recvmsg.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:51 -07:00
Neil Horman
95c385b4d5 [IPV6] ADDRCONF: Optimistic Duplicate Address Detection (RFC 4429) Support.
Nominally an autoconfigured IPv6 address is added to an interface in the
Tentative state (as per RFC 2462).  Addresses in this state remain in this
state while the Duplicate Address Detection process operates on them to
determine their uniqueness on the network.  During this period, these
tentative addresses may not be used for communication, increasing the time
before a node may be able to communicate on a network.  Using Optimistic
Duplicate Address Detection, autoconfigured addresses may be used
immediately for communication on the network, as long as certain rules are
followed to avoid conflicts with other nodes during the Duplicate Address
Detection process.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:43 -07:00
Eric Dumazet
b7aa0bf70c [NET]: convert network timestamps to ktime_t
We currently use a special structure (struct skb_timeval) and plain
'struct timeval' to store packet timestamps in sk_buffs and struct
sock.

This has some drawbacks :
- Fixed resolution of micro second.
- Waste of space on 64bit platforms where sizeof(struct timeval)=16

I suggest using ktime_t that is a nice abstraction of high resolution
time services, currently capable of nanosecond resolution.

As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits
a 8 byte shrink of this structure on 64bit architectures. Some other
structures also benefit from this size reduction (struct ipq in
ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...)

Once this ktime infrastructure adopted, we can more easily provide
nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or
SO_TIMESTAMPNS/SCM_TIMESTAMPNS)

Note : this patch includes a bug correction in
compat_sock_get_timestamp() where a "err = 0;" was missing (so this
syscall returned -ENOENT instead of 0)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
CC: John find <linux.kernel@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:34 -07:00
James Morris
9d729f72dc [NET]: Convert xtime.tv_sec to get_seconds()
Where appropriate, convert references to xtime.tv_sec to the
get_seconds() helper function.

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:32 -07:00
Eric Dumazet
fa438ccfdf [NET]: Keep sk_backlog near sk_lock
sk_backlog is a critical field of struct sock. (known famous words)

It is (ab)used in hot paths, in particular in release_sock(), tcp_recvmsg(),
tcp_v4_rcv(), sk_receive_skb().

It really makes sense to place it next to sk_lock, because sk_backlog is only
used after sk_lock locked (and thus memory cache line in L1 cache). This
should reduce cache misses and sk_lock acquisition time.

(In theory, we could only move the head pointer near sk_lock, and leaving tail
far away, because 'tail' is normally not so hot, but keep it simple :) )

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:27 -07:00
Ilpo Järvinen
3cfe3baaf0 [TCP]: Add two new spurious RTO responses to FRTO
New sysctl tcp_frto_response is added to select amongst these
responses:
	- Rate halving based; reuses CA_CWR state (default)
	- Very conservative; used to be the only one available (=1)
	- Undo cwr; undoes ssthresh and cwnd reductions (=2)

The response with rate halving requires a new parameter to
tcp_enter_cwr because FRTO has already reduced ssthresh and
doing a second reduction there has to be prevented. In addition,
to keep things nice on 80 cols screen, a local variable was
added.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:23 -07:00
John Heffner
886236c124 [TCP]: Add RFC3742 Limited Slow-Start, controlled by variable sysctl_tcp_max_ssthresh.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:19 -07:00
Ilpo Järvinen
46d0de4ed9 [TCP] FRTO: Entry is allowed only during (New)Reno like recovery
This interpretation comes from RFC4138:
    "If the sender implements some loss recovery algorithm other
     than Reno or NewReno [FHG04], the F-RTO algorithm SHOULD
     NOT be entered when earlier fast recovery is underway."

I think the RFC means to say (especially in the light of
Appendix B) that ...recovery is underway (not just fast recovery)
or was underway when it was interrupted by an earlier (F-)RTO
that hasn't yet been resolved (snd_una has not advanced enough).
Thus, my interpretation is that whenever TCP has ever
retransmitted other than head, basic version cannot be used
because then the order assumptions which are used as FRTO basis
do not hold.

NewReno has only the head segment retransmitted at a time.
Therefore, walk up to the segment that has not been SACKed, if
that segment is not retransmitted nor anything before it, we know
for sure, that nothing after the non-SACKed segment should be
either. This assumption is valid because TCPCB_EVER_RETRANS does
not leave holes but each non-SACKed segment is rexmitted
in-order.

Check for retrans_out > 1 avoids more expensive walk through the
skb list, as we can know the result beforehand: F-RTO will not be
allowed.

SACKed skb can turn into non-SACked only in the extremely rare
case of SACK reneging, in this case we might fail to detect
retransmissions if there were them for any other than head. To
get rid of that feature, whole rexmit queue would have to be
walked (always) or FRTO should be prevented when SACK reneging
happens. Of course RTO should still trigger after reneging which
makes this issue even less likely to show up. And as long as the
response is as conservative as it's now, nothing bad happens even
then.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:12 -07:00
Ilpo Järvinen
bdaae17da8 [TCP] FRTO: Moved tcp_use_frto from tcp.h to tcp_input.c
In addition, removed inline.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25 22:23:02 -07:00
Patrick McHardy
c01003c205 [IFB]: Fix crash on input device removal
The input_device pointer is not refcounted, which means the device may
disappear while packets are queued, causing a crash when ifb passes packets
with a stale skb->dev pointer to netif_rx().

Fix by storing the interface index instead and do a lookup where neccessary.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-29 11:46:52 -07:00
Jeff Garzik
a9c87a10db Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream-fixes 2007-03-28 02:21:18 -04:00
Jean Tourrilhes
c2805fbb86 [PATCH] WE-22 : prevent information leak on 64 bit
Johannes Berg discovered that kernel space was leaking to
userspace on 64 bit platform. He made a first patch to fix that. This
is an improved version of his patch.

Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-03-27 14:10:26 -04:00
David S. Miller
f11e6659ce [IPV6]: Fix routing round-robin locking.
As per RFC2461, section 6.3.6, item #2, when no routers on the
matching list are known to be reachable or probably reachable we
do round robin on those available routes so that we make sure
to probe as many of them as possible to detect when one becomes
reachable faster.

Each routing table has a rwlock protecting the tree and the linked
list of routes at each leaf.  The round robin code executes during
lookup and thus with the rwlock taken as a reader.  A small local
spinlock tries to provide protection but this does not work at all
for two reasons:

1) The round-robin list manipulation, as coded, goes like this (with
   read lock held):

	walk routes finding head and tail

	spin_lock();
	rotate list using head and tail
	spin_unlock();

   While one thread is rotating the list, another thread can
   end up with stale values of head and tail and then proceed
   to corrupt the list when it gets the lock.  This ends up causing
   the OOPS in fib6_add() later onthat many people have been hitting.

2) All the other code paths that run with the rwlock held as
   a reader do not expect the list to change on them, they
   expect it to remain completely fixed while they hold the
   lock in that way.

So, simply stated, it is impossible to implement this correctly using
a manipulation of the list without violating the rwlock locking
semantics.

Reimplement using a per-fib6_node round-robin pointer.  This way we
don't need to manipulate the list at all, and since the round-robin
pointer can only ever point to real existing entries we don't need
to perform any locking on the changing of the round-robin pointer
itself.  We only need to reset the round-robin pointer to NULL when
the entry it is pointing to is removed.

The idea is from Thomas Graf and it is very similar to how this
was implemented before the advanced router selection code when in.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-25 18:48:05 -07:00
Alexey Kuznetsov
ecbb416939 [NET]: Fix neighbour destructor handling.
->neigh_destructor() is killed (not used), replaced with
->neigh_cleanup(), which is called when neighbor entry goes to dead
state. At this point everything is still valid: neigh->dev,
neigh->parms etc.

The device should guarantee that dead neighbor entries (neigh->dead !=
0) do not get private part initialized, otherwise nobody will cleanup
it.

I think this is enough for ipoib which is the only user of this thing.
Initialization private part of neighbor entries happens in ipib
start_xmit routine, which is not reached when device is down.  But it
would be better to add explicit test for neigh->dead in any case.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-25 18:48:01 -07:00
Thomas Graf
e1701c68c1 [NET]: Fix fib_rules compatibility breakage
Based upon a patch from Patrick McHardy.

The fib_rules netlink attribute policy introduced in 2.6.19 broke
userspace compatibilty. When specifying a rule with "from all"
or "to all", iproute adds a zero byte long netlink attribute,
but the policy requires all addresses to have a size equal to
sizeof(struct in_addr)/sizeof(struct in6_addr), resulting in a
validation error.

Check attribute length of FRA_SRC/FRA_DST in the generic framework
by letting the family specific rules implementation provide the
length of an address. Report an error if address length is non
zero but no address attribute is provided. Fix actual bug by
checking address length for non-zero instead of relying on
availability of attribute.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-25 18:48:00 -07:00
Vlad Yasevich
749bf9215e [SCTP]: Reset some transport and association variables on restart
If the association has been restarted, we need to reset the
transport congestion variables as well as accumulated error
counts and CACC variables.  If we do not, the association
will use the wrong values and may terminate prematurely.

This was found with a scenario where the peer restarted
the association when lksctp was in the last HB timeout for
its association.  The restart happened, but the error counts
have not been reset and when the timeout occurred, a newly
restarted association was terminated due to excessive
retransmits.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-20 00:09:45 -07:00
Vlad Yasevich
0b58a81146 [SCTP]: Clean up stale data during association restart
During association restart we may have stale data sitting
on the ULP queue waiting for ordering or reassembly.  This
data may cause severe problems if not cleaned up.  In particular
stale data pending ordering may cause problems with receive
window exhaustion if our peer has decided to restart the
association.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-20 00:09:43 -07:00
Eric Paris
ef41aaa0b7 [IPSEC]: xfrm_policy delete security check misplaced
The security hooks to check permissions to remove an xfrm_policy were
actually done after the policy was removed.  Since the unlinking and
deletion are done in xfrm_policy_by* functions this moves the hooks
inside those 2 functions.  There we have all the information needed to
do the security check and it can be done before the deletion.  Since
auditing requires the result of that security check err has to be passed
back and forth from the xfrm_policy_by* functions.

This patch also fixes a bug where a deletion that failed the security
check could cause improper accounting on the xfrm_policy
(xfrm_get_policy didn't have a put on the exit path for the hold taken
by xfrm_policy_by*)

It also fixes the return code when no policy is found in
xfrm_add_pol_expire.  In old code (at least back in the 2.6.18 days) err
wasn't used before the return when no policy is found and so the
initialization would cause err to be ENOENT.  But since err has since
been used above when we don't get a policy back from the xfrm_policy_by*
function we would always return 0 instead of the intended ENOENT.  Also
fixed some white space damage in the same area.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Venkat Yekkirala <vyekkirala@trustedcs.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-07 16:08:09 -08:00
David S. Miller
64a146513f [NET]: Revert incorrect accept queue backlog changes.
This reverts two changes:

8488df894d
248f06726e

A backlog value of N really does mean allow "N + 1" connections
to queue to a listening socket.  This allows one to specify
"0" as the backlog and still get 1 connection.

Noticed by Gerrit Renker and Rick Jones.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-06 11:21:05 -08:00
Eric Dumazet
187f5f84ef [INET]: twcal_jiffie should be unsigned long, not int
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-05 13:32:48 -08:00
Patrick McHardy
ec68e97ded [NETFILTER]: conntrack: fix {nf,ip}_ct_iterate_cleanup endless loops
Fix {nf,ip}_ct_iterate_cleanup unconfirmed list handling:

- unconfirmed entries can not be killed manually, they are removed on
  confirmation or final destruction of the conntrack entry, which means
  we might iterate forever without making forward progress.

  This can happen in combination with the conntrack event cache, which
  holds a reference to the conntrack entry, which is only released when
  the packet makes it all the way through the stack or a different
  packet is handled.

- taking references to an unconfirmed entry and using it outside the
  locked section doesn't work, the list entries are not refcounted and
  another CPU might already be waiting to destroy the entry

What the code really wants to do is make sure the references of the hash
table to the selected conntrack entries are released, so they will be
destroyed once all references from skbs and the event cache are dropped.

Since unconfirmed entries haven't even entered the hash yet, simply mark
them as dying and skip confirmation based on that.

Reported and tested by Chuck Ebbert <cebbert@redhat.com>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-05 13:25:18 -08:00
Wei Dong
8488df894d [NET]: Fix bugs in "Whether sock accept queue is full" checking
when I use linux TCP socket, and find there is a bug in function
sk_acceptq_is_full().

	When a new SYN comes, TCP module first checks its validation. If valid,
send SYN,ACK to the client and add the sock to the syn hash table. Next
time if received the valid ACK for SYN,ACK from the client. server will
accept this connection and increase the sk->sk_ack_backlog -- which is
done in function tcp_check_req().We check wether acceptq is full in
function tcp_v4_syn_recv_sock().

Consider an example:

 After listen(sockfd, 1) system call, sk->sk_max_ack_backlog is set to
1. As we know, sk->sk_ack_backlog is initialized to 0. Assuming accept()
system call is not invoked now.

1. 1st connection comes. invoke sk_acceptq_is_full(). sk-
>sk_ack_backlog=0 sk->sk_max_ack_backlog=1, function return 0 accept
this connection. Increase the sk->sk_ack_backlog
2. 2nd connection comes. invoke sk_acceptq_is_full(). sk-
>sk_ack_backlog=1 sk->sk_max_ack_backlog=1, function return 0 accept
this connection. Increase the sk->sk_ack_backlog
3. 3rd connection comes. invoke sk_acceptq_is_full(). sk-
>sk_ack_backlog=2 sk->sk_max_ack_backlog=1, function return 1. Refuse
this connection.

I think it has bugs. after listen system call. sk->sk_max_ack_backlog=1
but now it can accept 2 connections.

Signed-off-by: Wei Dong <weid@np.css.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-02 20:37:33 -08:00
Patrick McHardy
4498121ca3 [NET]: Handle disabled preemption in gfp_any()
ctnetlink uses netlink_unicast from an atomic_notifier_chain
(which is called within a RCU read side critical section)
without holding further locks. netlink_unicast calls netlink_trim
with the result of gfp_any() for the gfp flags, which are passed
down to pskb_expand_header. gfp_any() only checks for softirq
context and returns GFP_KERNEL, resulting in this warning:

BUG: sleeping function called from invalid context at mm/slab.c:3032
in_atomic():1, irqs_disabled():0
no locks held by rmmod/7010.

Call Trace:
 [<ffffffff8109467f>] debug_show_held_locks+0x9/0xb
 [<ffffffff8100b0b4>] __might_sleep+0xd9/0xdb
 [<ffffffff810b5082>] __kmalloc+0x68/0x110
 [<ffffffff811ba8f2>] pskb_expand_head+0x4d/0x13b
 [<ffffffff81053147>] netlink_broadcast+0xa5/0x2e0
 [<ffffffff881cd1d7>] :nfnetlink:nfnetlink_send+0x83/0x8a
 [<ffffffff8834f6a6>] :nf_conntrack_netlink:ctnetlink_conntrack_event+0x94c/0x96a
 [<ffffffff810624d6>] notifier_call_chain+0x29/0x3e
 [<ffffffff8106251d>] atomic_notifier_call_chain+0x32/0x60
 [<ffffffff881d266d>] :nf_conntrack:destroy_conntrack+0xa5/0x1d3
 [<ffffffff881d194e>] :nf_conntrack:nf_ct_cleanup+0x8c/0x12c
 [<ffffffff881d4614>] :nf_conntrack:kill_l3proto+0x0/0x13
 [<ffffffff881d482a>] :nf_conntrack:nf_conntrack_l3proto_unregister+0x90/0x94
 [<ffffffff883551b3>] :nf_conntrack_ipv4:nf_conntrack_l3proto_ipv4_fini+0x2b/0x5d
 [<ffffffff8109d44f>] sys_delete_module+0x1b5/0x1e6
 [<ffffffff8105f245>] trace_hardirqs_on_thunk+0x35/0x37
 [<ffffffff8105911e>] system_call+0x7e/0x83

Since netlink_unicast is supposed to be callable from within RCU
read side critical sections, make gfp_any() check for in_atomic()
instead of in_softirq().

Additionally nfnetlink_send needs to use gfp_any() as well for the
call to netlink_broadcast).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-28 09:42:13 -08:00
Adrian Bunk
a39a21982c [IRDA] net/irda/: proper prototypes
This patch adds proper prototypes for some functions in
include/net/irda/irda.h

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-26 11:42:43 -08:00
Kazunori MIYAZAWA
73d605d1ab [IPSEC]: changing API of xfrm6_tunnel_register
This patch changes xfrm6_tunnel register and deregister
interface to prepare for solving the conflict of device
tunnels with inter address family IPsec tunnel.
There is no device which conflicts with IPv4 over IPv6
IPsec tunnel.

Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-13 12:55:55 -08:00
Kazunori MIYAZAWA
c0d56408e3 [IPSEC]: Changing API of xfrm4_tunnel_register.
This patch changes xfrm4_tunnel register and deregister
interface to prepare for solving the conflict of device
tunnels with inter address family IPsec tunnel.

Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-13 12:54:47 -08:00
Patrick McHardy
fe3eb20c1a [NETFILTER]: nf_conntrack: change nf_conntrack_l[34]proto_unregister to void
No caller checks the return value, and since its usually called within the
module unload path there's nothing a module could do about errors anyway,
so BUG on invalid conditions and return void.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:14:28 -08:00
Patrick McHardy
c0e912d7ed [NETFILTER]: nf_conntrack: fix invalid conntrack statistics RCU assumption
NF_CT_STAT_INC assumes rcu_read_lock in nf_hook_slow disables
preemption as well, making it legal to use __get_cpu_var without
disabling preemption manually. The assumption is not correct anymore
with preemptable RCU, additionally we need to protect against softirqs
when not holding nf_conntrack_lock.

Add NF_CT_STAT_INC_ATOMIC macro, which disables local softirqs,
and use where necessary.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:13:43 -08:00
Patrick McHardy
923f4902fe [NETFILTER]: nf_conntrack: properly use RCU API for nf_ct_protos/nf_ct_l3protos arrays
Replace preempt_{enable,disable} based RCU by proper use of the
RCU API and add missing rcu_read_lock/rcu_read_unlock calls in
all paths not obviously only used within packet process context
(nfnetlink_conntrack).
  
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:12:57 -08:00
Arjan van de Ven
540473208f [PATCH] mark struct file_operations const 1
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:44 -08:00
Eric Dumazet
1e19e02ca0 [NET]: Reorder fields of struct dst_entry
This last patch (but not least :) ) finally moves the next pointer at
the end of struct dst_entry. This permits to perform route cache
lookups with a minimal cost of one cache line per entry, instead of
two.

Both 32bits and 64bits platforms benefit from this new layout.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:20:45 -08:00
Eric Dumazet
0c195c3fc4 [DECNET]: Convert decnet route to use the new dst_entry 'next' pointer
This patch removes the next pointer from 'struct dn_route.u' union,
and renames u.rt_next to u.dst.dn_next.

It also moves 'struct flowi' right after 'struct dst_entry' to prepare
speedup lookups.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:20:43 -08:00
Eric Dumazet
7cc482634f [IPV6]: Convert ipv6 route to use the new dst_entry 'next' pointer
This patch removes the next pointer from 'struct rt6_info.u' union,
and renames u.next to u.dst.rt6_next.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:20:40 -08:00
Eric Dumazet
093c2ca416 [IPV4]: Convert ipv4 route to use the new dst_entry 'next' pointer
This patch removes the rt_next pointer from 'struct rtable.u' union,
and renames u.rt_next to u.dst_rt_next.

It also moves 'struct flowi' right after 'struct dst_entry' to prepare
the gain on lookups.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:20:38 -08:00
Eric Dumazet
75ce7ceaa1 [NET]: Introduce union in struct dst_entry to hold 'next' pointer
This patch introduces an anonymous union to nicely express the fact that all
objects inherited from struct dst_entry should access to the generic 'next'
pointer but with appropriate type verification.

This patch is a prereq before following patches.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:20:36 -08:00
Eric Dumazet
dbca9b2750 [NET]: change layout of ehash table
ehash table layout is currently this one :

First half of this table is used by sockets not in TIME_WAIT state
Second half of it is used by sockets in TIME_WAIT state.

This is non optimal because of for a given hash or socket, the two chain heads 
are located in separate cache lines.
Moreover the locks of the second half are never used.

If instead of this halving, we use two list heads in inet_ehash_bucket instead 
of only one, we probably can avoid one cache miss, and reduce ram usage, 
particularly if sizeof(rwlock_t) is big (various CONFIG_DEBUG_SPINLOCK, 
CONFIG_DEBUG_LOCK_ALLOC settings). So we still halves the table but we keep 
together related chains to speedup lookups and socket state change.

In this patch I did not try to align struct inet_ehash_bucket, but a future 
patch could try to make this structure have a convenient size (a power of two 
or a multiple of L1_CACHE_SIZE).
I guess rwlock will just vanish as soon as RCU is plugged into ehash :) , so 
maybe we dont need to scratch our heads to align the bucket...

Note : In case struct inet_ehash_bucket is not a power of two, we could 
probably change alloc_large_system_hash() (in case it use __get_free_pages()) 
to free the unused space. It currently allocates a big zone, but the last 
quarter of it could be freed. Again, this should be a temporary 'problem'.

Patch tested on ipv4 tcp only, but should be OK for IPV6 and DCCP.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 14:16:46 -08:00
Jennifer Hunt
eac3731bd0 [S390]: Add AF_IUCV socket support
From: Jennifer Hunt <jenhunt@us.ibm.com>

This patch adds AF_IUCV socket support.

Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 13:51:54 -08:00
Martin Schwidefsky
2356f4cb19 [S390]: Rewrite of the IUCV base code, part 2
Add rewritten IUCV base code to net/iucv.

Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 13:37:42 -08:00
Andrew Hendry
39e21c0d34 [X.25]: Adds /proc/sys/net/x25/x25_forward to control forwarding.
echo "1" > /proc/sys/net/x25/x25_forward 
To turn on x25_forwarding, defaults to off
Requires the previous patch.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 13:34:36 -08:00
Andrew Hendry
95a9dc4390 [X.25]: Add call forwarding
Adds call forwarding to X.25, allowing it to operate like an X.25 router.
Useful if one needs to manipulate X.25 traffic with tools like tc.
This is an update/cleanup based off a patch submitted by Daniel Ferenci a few years ago.

Thanks Alan for the feedback.
Added the null check to the clones.
Moved the skb_clone's into the forwarding functions.

Worked ok with Cisco XoT, linux X.25 back to back, and some old NTUs/PADs.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 13:34:02 -08:00
Shinta Sugimoto
80c9abaabf [XFRM]: Extension for dynamic update of endpoint address(es)
Extend the XFRM framework so that endpoint address(es) in the XFRM
databases could be dynamically updated according to a request (MIGRATE
message) from user application. Target XFRM policy is first identified
by the selector in the MIGRATE message. Next, the endpoint addresses
of the matching templates and XFRM states are updated according to
the MIGRATE message.

Signed-off-by: Shinta Sugimoto <shinta.sugimoto@ericsson.com>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 13:11:42 -08:00
Eric Leblond
41f4689a7c [NETFILTER]: NAT: optional source port randomization support
This patch adds support to NAT to randomize source ports.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:17 -08:00
Michal Schmidt
6fecd19851 [NETFILTER]: Add SANE connection tracking helper
This is nf_conntrack_sane, a netfilter connection tracking helper module
for the SANE protocol used by the 'saned' daemon to make scanners available
via network. The SANE protocol uses separate control & data connections,
similar to passive FTP. The helper module is needed to recognize the data
connection as RELATED to the control one.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:09 -08:00
Miika Komu
cdca72652a [IPSEC]: exporting xfrm_state_afinfo
This patch exports xfrm_state_afinfo.

Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:00 -08:00
David S. Miller
8eb9086f21 [IPV4/IPV6]: Always wait for IPSEC SA resolution in socket contexts.
Do this even for non-blocking sockets.  This avoids the silly -EAGAIN
that applications can see now, even for non-blocking sockets in some
cases (f.e. connect()).

With help from Venkat Tekkirala.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:45 -08:00
Frederik Deweerdt
ba7808eac1 [TCP]: remove tcp header from tcp_v4_check (take #2)
The tcphdr struct passed to tcp_v4_check is not used, the following
patch removes it from the parameter list.

This adds the netfilter modifications missing in the patch I sent
for rc3-mm1.

Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:44 -08:00
David S. Miller
e89862f4c5 [TCP]: Restore SKB socket owner setting in tcp_transmit_skb().
Revert 931731123a

We can't elide the skb_set_owner_w() here because things like certain
netfilter targets (such as owner MATCH) need a socket to be set on the
SKB for correct operation.

Thanks to Jan Engelhardt and other netfilter list members for
pointing this out.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-26 01:04:55 -08:00
Vlad Yasevich
610ab73ac4 [SCTP]: Correctly handle unexpected INIT-ACK chunk.
Consider the chunk as Out-of-the-Blue if we don't have
an endpoint.  Otherwise discard it as before.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-23 20:25:46 -08:00
Mikael Pettersson
16d807988f [NETFILTER]: fix xt_state compile failure
In file included from net/netfilter/xt_state.c:13:
include/net/netfilter/nf_conntrack_compat.h: In function 'nf_ct_l3proto_try_module_get':
include/net/netfilter/nf_conntrack_compat.h:70: error: 'PF_INET' undeclared (first use in this function)
include/net/netfilter/nf_conntrack_compat.h:70: error: (Each undeclared identifier is reported only once
include/net/netfilter/nf_conntrack_compat.h:70: error: for each function it appears in.)
include/net/netfilter/nf_conntrack_compat.h:71: warning: control reaches end of non-void function
make[2]: *** [net/netfilter/xt_state.o] Error 1
make[1]: *** [net/netfilter] Error 2
make: *** [net] Error 2

A simple fix is to have nf_conntrack_compat.h #include <linux/socket.h>.

Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-23 20:25:43 -08:00
Jeff Garzik
11897539a9 Merge branch 'upstream-fixes' of master.kernel.org:/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream-fixes 2007-01-07 22:44:56 -05:00
Gerrit Renker
0d630cc0a6 [TCP]: Use old definition of before
This reverts the new (unambiguous) definition of the TCP `before'
relation. As pointed out in an example by Herbert Xu, there is 
existing code which implicitly requires the old definition in order
to work correctly.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-04 12:25:16 -08:00
Adrian Bunk
7f18ba6248 [X25]: proper prototype for x25_init_timers()
This patch adds a proper prototype for x25_init_timers() in 
include/net/x25.h

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-03 18:48:13 -08:00
Zhu Yi
3eb546057d [PATCH] ieee80211: WLAN_GET_SEQ_SEQ fix (select correct region)
The WLAN_GET_SEQ_SEQ(seq) macro in ieee80211 is selecting the wrong region.

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2007-01-02 20:56:26 -05:00
Al Viro
b23e353666 [IPV6]: Dumb typo in generic csum_ipv6_magic()
... duh

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-22 11:12:07 -08:00
Adrian Bunk
24123186fa [SCTP]: make 2 functions static
This patch makes the following needlessly global functions static:
- ipv6.c: sctp_inet6addr_event()
- protocol.c: sctp_inetaddr_event()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-22 11:12:05 -08:00
Ivan Skytte Jorgensen
0f3fffd8ab [SCTP]: Fix typo adaption -> adaptation as per the latest API draft.
Signed-off-by: Ivan Skytte Jorgensen <isj-sctp@i1.dk>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-22 11:12:04 -08:00
Gerrit Renker
9a036b9c33 [TCP]: Fix ambiguity in the `before' relation.
While looking at DCCP sequence numbers, I stumbled over a problem with
the following definition of before in tcp.h:

static inline int before(__u32 seq1, __u32 seq2)
{
        return (__s32)(seq1-seq2) < 0;
}

Problem: This definition suffers from an an ambiguity, i.e. always

           before(a, (a + 2^31) % 2^32)) = 1
           before((a + 2^31) % 2^32), a) = 1

         In text: when the difference between a and b amounts to 2^31,
         a is always considered `before' b, the function can not decide.
         The reason is that implicitly 0 is `before' 1 ... 2^31-1 ... 2^31

Solution: There is a simple fix, by defining before in such a way that
          0 is no longer `before' 2^31, i.e. 0 `before' 1 ... 2^31-1
          By not using the middle between 0 and 2^32, before can be made
          unambiguous.
          This is achieved by testing whether seq2-seq1 > 0 (using signed
          32-bit arithmetic).

I attach a patch to codify this. Also the `after' relation is basically
a redefinition of `before', it is now defined as a macro after before.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-22 11:12:01 -08:00
Ralf Baechle
a3d384029a [AX.25]: Fix unchecked rose_add_loopback_neigh uses
rose_add_loopback_neigh uses kmalloc and the callers were ignoring the
error value.  Rewrite to let the caller deal with the allocation.  This
allows the use of static allocation of kmalloc use entirely.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-17 21:59:14 -08:00
Ralf Baechle
a4282717c1 [AX.25]: Fix unchecked ax25_linkfail_register uses
ax25_linkfail_register uses kmalloc and the callers were ignoring the
error value.  Rewrite to let the caller deal with the allocation.  This
allows the use of static allocation of kmalloc use entirely.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-17 21:59:11 -08:00