I did it just in alloc_skb_from_cache, forgot __alloc_skb, fixed now.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now to convert the last one, skb->data, that will allow many simplifications
and removal of some of the offset helpers.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)
Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and
skb->mac to skb->mac_header, to match the names of the associated helpers
(skb[_[re]set]_{transport,network,mac}_header).
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common sequence "skb->h.raw - skb->nh.raw", similar to skb->mac_len,
that is precalculated tho, don't think we need to bloat skb with one more
member, so just use this new helper, reducing the number of non-skbuff.h
references to the layer headers even more.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This unifies the codes to copy netfilter related datas. Note that
__nf_copy() assumes destination skb doesn't have any netfilter
related members.
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the places where we need a pointer to the transport header, it is
still legal to touch skb->h.raw directly if just adding to,
subtracting from or setting it to another layer header.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the quite common 'skb->h.raw - skb->data' sequence.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common, open coded 'skb->h.raw = skb->data' operation, so that we can
later turn skb->h.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.
This one touches just the most simple cases:
skb->h.raw = skb->data;
skb->h.raw = {skb_push|[__]skb_pull}()
The next ones will handle the slightly more "complex" cases.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now the skb->nh union has just one member, .raw, i.e. it is just like the
skb->mac union, strange, no? I'm just leaving it like that till the transport
layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or
->mac_header_offset?), ditto for ->{h,nh}.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Show what protocols are bound to what packet types in /proc/net/ptype
Uses kallsyms to decode function pointers if possible.
Example:
Type Device Function
ALL eth1 packet_rcv_spkt+0x0
0800 ip_rcv+0x0
0806 arp_rcv+0x0
86dd :ipv6:ipv6_rcv+0x0
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The seq_file operations stuff can be marked constant to
get it out of dirty cache.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For Eric, mark packet type and network device watermarks
as read mostly.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the places where we need a pointer to the network header, it is still legal
to touch skb->nh.raw directly if just adding to, subtracting from or setting it
to another layer header.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the quite common 'skb->nh.raw - skb->data' sequence.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_push updates and returns skb->data, so we can just call
skb_reset_network_header after the call to skb_push.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common, open coded 'skb->nh.raw = skb->data' operation, so that we can
later turn skb->nh.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.
This one touches just the most simple case, next will handle the slightly more
"complex" cases.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the places where we need a pointer to the mac header, it is still legal to
touch skb->mac.raw directly if just adding to, subtracting from or setting it
to another layer header.
This one also converts some more cases to skb_reset_mac_header() that my
regex missed as it had no spaces before nor after '=', ugh.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can
later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.
This one touches just the most simple case, next will handle the slightly more
"complex" cases.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that network timestamps use ktime_t infrastructure, we can add a new
SOL_SOCKET sockopt SO_TIMESTAMPNS.
This command is similar to SO_TIMESTAMP, but permits transmission of
a 'timespec struct' instead of a 'timeval struct' control message.
(nanosecond resolution instead of microsecond)
Control message is labelled SCM_TIMESTAMPNS instead of SCM_TIMESTAMP
A socket cannot mix SO_TIMESTAMP and SO_TIMESTAMPNS : the two modes are
mutually exclusive.
sock_recv_timestamp() became too big to be fully inlined so I added a
__sock_recv_timestamp() helper function.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: linux-arch@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
net_msg_warn should be placed in the read_mostly section, to avoid
performance problems on SMP
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several functions are marked inline or forced inline, but it
would be better to let the compiler decide.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix whitespace around keywords. Fix indentation especially of switch
statements.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Covert network warning messages from a compile time to runtime choice.
Removes kernel config option and replaces it with new /proc/sys/net/core/warnings.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now network timestamps use ktime_t infrastructure, we can add a new
ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'.
User programs can thus access to nanosecond resolution.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch eliminates some duplicate code for the verification of
receive checksums between UDP-Lite and UDP. It does this by
introducing __skb_checksum_complete_head which is identical to
__skb_checksum_complete_head apart from the fact that it takes
a length parameter rather than computing the first skb->len bytes.
As a result UDP-Lite will be able to use hardware checksum offload
for packets which do not use partial coverage checksums. It also
means that UDP-Lite loopback no longer does unnecessary checksum
verification.
If any NICs start support UDP-Lite this would also start working
automatically.
This patch removes the assumption that msg_flags has MSG_TRUNC clear
upon entry in recvmsg.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently use a special structure (struct skb_timeval) and plain
'struct timeval' to store packet timestamps in sk_buffs and struct
sock.
This has some drawbacks :
- Fixed resolution of micro second.
- Waste of space on 64bit platforms where sizeof(struct timeval)=16
I suggest using ktime_t that is a nice abstraction of high resolution
time services, currently capable of nanosecond resolution.
As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits
a 8 byte shrink of this structure on 64bit architectures. Some other
structures also benefit from this size reduction (struct ipq in
ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...)
Once this ktime infrastructure adopted, we can more easily provide
nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or
SO_TIMESTAMPNS/SCM_TIMESTAMPNS)
Note : this patch includes a bug correction in
compat_sock_get_timestamp() where a "err = 0;" was missing (so this
syscall returned -ENOENT instead of 0)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
CC: John find <linux.kernel@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since devices can change name and other wierdness, don't hold onto
a copy of device name, instead use pointer to output device.
Fix a couple of leaks in error handling path as well.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing htonl() macro is smart enough to do the same code as
using __constant_htonl() and it looks cleaner.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Can use random32() now.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove private debug macro and replace with standard version
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk_backlog is a critical field of struct sock. (known famous words)
It is (ab)used in hot paths, in particular in release_sock(), tcp_recvmsg(),
tcp_v4_rcv(), sk_receive_skb().
It really makes sense to place it next to sk_lock, because sk_backlog is only
used after sk_lock locked (and thus memory cache line in L1 cache). This
should reduce cache misses and sk_lock acquisition time.
(In theory, we could only move the head pointer near sk_lock, and leaving tail
far away, because 'tail' is normally not so hot, but keep it simple :) )
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise the following calltrace will lead to a wrong
lockdep warning:
neigh_proxy_process()
`- lock(neigh_table->proxy_queue.lock);
arp_redo /* via tbl->proxy_redo */
arp_process
neigh_event_ns
neigh_update
skb_queue_purge
`- lock(neighbor->arp_queue.lock);
This is not a deadlock actually, as neighbor table's proxy_queue
and the neighbor's arp_queue are different queues.
Lockdep thinks there is a deadlock as both queues are initialized
with skb_queue_head_init() and thus have a common class.
Signed-off-by: David S. Miller <davem@davemloft.net>
In net poll mode, the current checksum function doesn't consider the
kind of packet which is padded to reach a specific minimum length. I
believe that's the problem causing my test case failed. The following
patch fixed this issue.
Signed-off-by: Aubrey.Li <aubreylee@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since this was added originally for Xen, and Xen has recently (~2.6.18)
stopped using this function, we can safely get rid of it. Good timing
too since this function has started to bit rot.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pktgen module prevents suspend-to-disk. Fix.
Acked-by: "Michal Piotrowski" <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The generic networking code ensures that no two networking devices
have the same name, so there is no time except when sysfs has
implementation bugs that device_rename when called from
dev_change_name will fail.
The current error handling for errors from device_rename in
dev_change_name is wrong and results in an unusable and unrecoverable
network device if device_rename is happens to return an error.
This patch removes the buggy error handling. Which confines the mess
when device_rename hits a problem to sysfs, instead of propagating it
the rest of the network stack. Making linux a little more robust.
Without this patch you can observe what happens when sysfs has a bug
when CONFIG_SYSFS_DEPRECATED is not set and you attempt to rename
a real network device to a name like (broken_parity_status, device,
modalias, power, resource2, subsystem_vendor, class, driver, irq,
msi_bus, resource, subsystem, uevent, config, enable, local_cpus,
numa_node, resource0, subsystem_device, vendor)
Greg has a patch that fixes the sysfs bugs but he doesn't trust it
for a 2.6.21 timeframe. This patch which just ignores errors should
be safe and it keeps the system from going completely wacky.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This changes the "not found" error return for the lookup
function to -ESRCH so that it can be distinguished from
the case where a rule or route resulting in -ENETUNREACH
has been found during the search.
It fixes a bug where if DECnet was compiled with routing
support, but no routes were added to the routing table,
it was failing to fall back to endnode routing.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The input_device pointer is not refcounted, which means the device may
disappear while packets are queued, causing a crash when ifb passes packets
with a stale skb->dev pointer to netif_rx().
Fix by storing the interface index instead and do a lookup where neccessary.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg discovered that kernel space was leaking to
userspace on 64 bit platform. He made a first patch to fix that. This
is an improved version of his patch.
Signed-off-by: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Ingress queueing uses a seperate lock for serializing enqueue operations,
but fails to properly protect itself against concurrent changes to the
qdisc tree. Use queue_lock for now since the real fix it quite intrusive.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
->neigh_destructor() is killed (not used), replaced with
->neigh_cleanup(), which is called when neighbor entry goes to dead
state. At this point everything is still valid: neigh->dev,
neigh->parms etc.
The device should guarantee that dead neighbor entries (neigh->dead !=
0) do not get private part initialized, otherwise nobody will cleanup
it.
I think this is enough for ipoib which is the only user of this thing.
Initialization private part of neighbor entries happens in ipib
start_xmit routine, which is not reached when device is down. But it
would be better to add explicit test for neigh->dead in any case.
Signed-off-by: David S. Miller <davem@davemloft.net>