The CCID should be notified of packet reception only when a packet is
valid. Therefore, the ACK vector needs to be processed before
notifying the CCID. Also, the CCID might need information provided by
the ACK vector.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If ACK vectors are used, each packet with an ACK should contain an ACK
vector. The only exception currently is response packets. It
probably is not a good idea to store ACK vector state before the
connection is completed (to help protect from syn floods).
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When packets are received, the connection is either in DCCP_OPEN
[fast-path] or it isn't. If it's not [e.g. DCCP_PARTOPEN] upper
layers will perform sanity checks and parse options. If it is in
DCCP_OPEN, dccp_rcv_established() will do it. It is important not to
re-parse options in dccp_rcv_established() when it is not called from
the fast-path. Else, fore example, the ack vector will be added twice
and the CCID will see the packet twice.
The solution is to always enfore sanity checks from the upper layers.
When packets arrive in the fast-path, sanity checks will be performed
before calling dccp_rcv_established().
Note(acme): I rewrote the patch to achieve the same result but keeping
dccp_rcv_established with the previous semantics and having it split
into __dccp_rcv_established, that doesn't does do any sanity check,
code in state != DCCP_OPEN use this lighter version as they already do
the sanity checks.
Signed-off-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The attached patch makes DECnet routing only use routers from the same
area - rather than the highest rated router seen.
In theory there should not be an out-of-area router on a local network
but some networks are bridged rather than properly routed. VMS seems
to behave similarly: if I bring up a VMS node with no router then it
can't see anything else on the global network.
Signed-off-by: Patrick Caulfield <patrick@tykepenguin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Roberto Nibali <ratz@drugphish.ch>
The attached patch (against current -GIT) is a cleanup patch which does
following:
o lookup debug messages shifted back to 9
o added more informational value to flags and refcnt since those
entries can be in multiple referenced structures
o cleanup 80 char violation
It's the prepatch to the session pool implementation and helps very much
to debug and monitor important variables and structures regarding the
threshold limitation and persistency without the thousands of lookup
messages which noone is interested in.
Signed-off-by: Horms <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Turning struct iphdr::tot_len into __be16 added sparse warning.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently all network protocols need to call dev_ioctl as the default
fallback in their ioctl implementations. This patch adds a fallback
to dev_ioctl to sock_ioctl if the protocol returned -ENOIOCTLCMD.
This way all the procotol ioctl handlers can be simplified and we don't
need to export dev_ioctl.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
lock_sock is needed only in very few cases, so do it there instead of
around the switch statement.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Benjamin LaHaise <bcrl@kvack.org>
In af_unix, a rwlock is used to protect internal state. At least on my
P4 with HT it is faster to use a spinlock due to the simpler memory
barrier used to unlock. This patch raises bw_unix to ~690K/s.
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Benjamin LaHaise <bcrl@kvack.org>
In __alloc_skb(), the use of skb_shinfo() which casts a u8 * to the
shared info structure results in gcc being forced to do a reload of the
pointer since it has no information on possible aliasing. Fix this by
using a pointer to refer to skb_shared_info.
By initializing skb_shared_info sequentially, the write combining buffers
can reduce the number of memory transactions to a single write. Reorder
the initialization in __alloc_skb() to match the structure definition.
There is also an alignment issue on 64 bit systems with skb_shared_info
by converting nr_frags to a short everything packs up nicely.
Also, pass the slab cache pointer according to the fclone flag instead
of using two almost identical function calls.
This raises bw_unix performance up to a peak of 707KB/s when combined
with the spinlock patch. It should help other networking protocols, too.
Signed-off-by: David S. Miller <davem@davemloft.net>
And actually, with this, the whole pppox layer can basically
be removed and subsumed into pppoe.c, no other pppox sub-protocol
implementation exists and we've had this thing for at least 4
years.
Signed-off-by: David S. Miller <davem@davemloft.net>
To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.
Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h include twice.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Its common enough to to justify that, TCP still can't use it as it has the
prequeueing stuff, still to be made generic in the not so distant future :-)
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mid-term I plan to restructure the file_operations so that we don't need
to have all these duplicate aio and vectored versions. This patch is
a small step in that direction but also a worthwile cleanup on it's own:
(1) introduce a alloc_sock_iocb helper that encapsulates allocating a
proper sock_iocb
(2) add do_sock_read and do_sock_write helpers for common read/write
code
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
It needs to return zero now that it is an initcall.
Also, net/nonet.c no longer needs a dummy sock_init().
Signed-off-by: David S. Miller <davem@davemloft.net>
static variables should not be explicitly initialised to 0. This causes
them to be placed in .data instead of .bss. This patch de-initialises 3
static variables in net/core/pktgen.c.
There are approximately 800 more such variables in the source tree
(2.6.15rc5). If there is more interrest I'd be willing to track down the
rest of these as well and de-initialise them as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
I noticed that some of 'struct proto_ops' used in the kernel may share
a cache line used by locks or other heavily modified data. (default
linker alignement is 32 bytes, and L1_CACHE_LINE is 64 or 128 at
least)
This patch makes sure a 'struct proto_ops' can be declared as const,
so that all cpus can share all parts of it without false sharing.
This is not mandatory : a driver can still use a read/write structure
if it needs to (and eventually a __read_mostly)
I made a global stubstitute to change all existing occurences to make
them const.
This should reduce the possibility of false sharing on SMP, and
speedup some socket system calls.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sock_init can be done as a core_initcall instead of calling
it directly in init/main.c
Also I removed an out of date #ifdef.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support to set/get heartbeat interval, maximum number of
retransmissions, pathmtu, sackdelay time for a particular transport/
association/socket as per the latest SCTP sockets api draft11.
Signed-off-by: Frank Filz <ffilz@us.ibm.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace cube root algorithim with a faster version using Newton-Raphson.
Surprisingly, doing the scaled div64_64 is faster than a true 64 bit
division on 64 bit CPU's.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Revised version of patch to pre-compute values for TCP cubic.
* d32,d64 replaced with descriptive names
* cube_factor replaces
srtt[scaled by count] / HZ * ((1 << (10+2*BICTCP_HZ)) / bic_scale)
* beta_scale replaces
8*(BICTCP_BETA_SCALE+beta)/3/(BICTCP_BETA_SCALE-beta);
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is a new feature for netem in 2.6.16. It adds the ability to
randomly corrupt packets with netem. A version was done by
Hagen Paul Pfeifer, but I redid it to handle the cases of backwards
compatibility with netlink interface and presence of hardware checksum
offload. It is useful for testing hardware offload in devices.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add limited ethtool support to bridge to allow disabling
features.
Note: if underlying device does not support a feature (like checksum
offload), then the bridge device won't inherit it.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While in the learning state, run filters but drop the result.
This prevents us from acquiring bad fdb entries in learning state.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Speed of a interface may not be available until carrier
is detected in the case of autonegotiation. To get the correct value
we need to recheck speed after carrier event. But the check needs to
be done in a context that is similar to normal ethtool interface (can sleep).
Also, delay check for 1ms to try avoid any carrier bounce transitions.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some people are using bridging to hide multiple machines from an ISP
that restricts by MAC address. So in that case allow the bridge mac
address to be set to any of the existing interfaces. I don't want to
allow any arbitrary value and confuse STP.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This lock is actually taken mostly as a writer,
so using a rwlock actually just makes performance
worse especially on chips like the Intel P4.
Signed-off-by: David S. Miller <davem@davemloft.net>
As DCCP needs to be called in the same spots.
Now we have a member in inet_sock (is_icsk), set at sock creation time from
struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, like for TCP and
DCCP) to see if a struct sock instance is a inet_connection_sock for places
like the ones in ip_sockglue.c (v4 and v6) where we previously were looking if
sk_type was SOCK_STREAM, that is insufficient because we now use the same code
for DCCP, that has sk_type SOCK_DCCP.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upcoming patches will make, for instance, ip_sockglue.c need just this enum
and not all of tcp.h.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renaming it to inet6_hash_connect, making it possible to ditch
dccp_v6_hash_connect and share the same code with TCP instead.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renaming it to inet_hash_connect, making it possible to ditch
dccp_v4_hash_connect and share the same code with TCP instead.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So that we can share several timewait sockets related functions and
make the timewait mini sockets infrastructure closer to the request
mini sockets one.
Next changesets will take advantage of this, moving more code out of
TCP and DCCP v4 and v6 to common infrastructure.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now we have the destructor (dccp_v4_reqsk_destructor) in our
request_sock_ops vtable.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Still needs mucho polishing, specially in the checksum code, but works
just fine, inet_diag/iproute2 and all 8)
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was already non-TCP specific, will be used by DCCPv6.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Basically exports a similar set of functions as the one exported by
the non-AF specific TCP code.
In the process moved some non-AF specific code from dccp_v4_connect to
dccp_connect_init and moved the checksum verification from
dccp_invalid_packet to dccp_v4_rcv, so as to use it in dccp_v6_rcv
too.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Out of tcp6_timewait_sock, that now is just an aggregation of
inet_timewait_sock and inet6_timewait_sock, using tw_ipv6_offset in struct
inet_timewait_sock, that is common to the IPv6 transport protocols that use
timewait sockets, like DCCP and TCP.
tw_ipv6_offset plays the struct inet_sock pinfo6 role, i.e. for the generic
code to find the IPv6 area in a timewait sock.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using sk->sk_protocol instead of IPPROTO_TCP.
Will be used by DCCPv6 in the next changesets.
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
AF_UNIX stream socket performance on P4 CPUs tends to suffer due to a
lot of pipeline flushes from atomic operations. The patch below
removes the sock_hold() and sock_put() in unix_stream_sendmsg(). This
should be safe as the socket still holds a reference to its peer which
is only released after the file descriptor's final user invokes
unix_release_sock(). The only consideration is that we must add a
memory barrier before setting the peer initially.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It also looks like there were 2 places where the test on sk_err was
missing from the event wait logic (in sk_stream_wait_connect and
sk_stream_wait_memory), while the rest of the sock_error() users look
to be doing the right thing. This version of the patch fixes those,
and cleans up a few places that were testing ->sk_err directly.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes dead code. I don't see the reason to keep this cruft
around, besides cluttering the nice and functionally working code.
Signed-off-by: Roberto Nibali <ratz@drugphish.ch>
Signed-off-by: Horms <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>