Commit graph

157417 commits

Author SHA1 Message Date
Guennadi Liakhovetski
20de03dae5 i.MX31: fix framebuffer locking regressions
Recent framebuffer locking patches first made affected systems unbootable,
then the dead-lock has been fixed but as of 2.6.31-rc4 the framebuffer on
mx3 machines doesn't work. Fix this.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-07 10:39:56 -07:00
OGAWA Hirofumi
2d8dd38a5a vfs: mnt_want_write_file(): fix special file handling
I suspect that mnt_want_write_file() may have wrong assumption.  I think
mnt_want_write_file() is assuming it increments ->mnt_writers if
(file->f_mode & FMODE_WRITE).  But, if it's special_file(), it is false?

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-07 10:39:56 -07:00
Eric Sandeen
69130c7cf9 compat_ioctl: hook up compat handler for FIEMAP ioctl
The FIEMAP_IOC_FIEMAP mapping ioctl was missing a 32-bit compat handler,
which means that 32-bit suerspace on 64-bit kernels cannot use this ioctl
command.

The structure is nicely aligned, padded, and sized, so it is just this
simple.

Tested w/ 32-bit ioctl tester (from Josef) on a 64-bit kernel on ext4.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Mark Lord <lkml@rtr.ca>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Josef Bacik <josef@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-07 10:39:56 -07:00
Johannes Weiner
0035fe00f7 fbcon: don't use vc_resize() on initialization
Catalin and kmemleak spotted a leak of a VC screen buffer in
vc_allocate() due to the following chain of events:

	vc_allocate()
	  visual_init(init=1)
	    vc->vc_sw->con_init(init=1)
              fbcon_init()
	        vc_resize()
	          vc->screen_buf = kmalloc()
	  vc->screen_buf = kmalloc()

The common way for the VC drivers is to set the screen dimension
parameters manually in the init case and only call vc_resize() for
!init - which allocates a screen buffer according to the new
dimensions.

fbcon instead would do vc_resize() unconditionally and afterwards set
the dimensions manually (again) for !init - i.e. completely upside
down.  The vc_resize() allocated buffer would then get lost by
vc_allocate() allocating a fresh one.

Use vc_resize() only for actual resizing to close the leak.

Set the dimensions manually only in initialization mode to remove the
redundant setting in resize mode.

The kmemleak trace from Catalin:

unreferenced object 0xde158000 (size 12288):
  comm "Xorg", pid 1439, jiffies 4294961016
  hex dump (first 32 bytes):
    20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00   . . . . . . . .
    20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00   . . . . . . . .
  backtrace:
    [<c006f74b>] __save_stack_trace+0x17/0x1c
    [<c006f81d>] create_object+0xcd/0x188
    [<c01f5457>] kmemleak_alloc+0x1b/0x3c
    [<c006e303>] __kmalloc+0xdb/0xe8
    [<c012cc4b>] vc_do_resize+0x73/0x1e0
    [<c012cdf1>] vc_resize+0x15/0x18
    [<c011afc1>] fbcon_init+0x1f9/0x2b8
    [<c0129e87>] visual_init+0x9f/0xdc
    [<c012aff3>] vc_allocate+0x7f/0xfc
    [<c012b087>] con_open+0x17/0x80
    [<c0120e43>] tty_open+0x1f7/0x2e4
    [<c0072fa1>] chrdev_open+0x101/0x118
    [<c006ffad>] __dentry_open+0x105/0x1cc
    [<c00700fd>] nameidata_to_filp+0x2d/0x38
    [<c00788cd>] do_filp_open+0x2c1/0x54c
    [<c006fdff>] do_sys_open+0x3b/0xb4

Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Tested-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-07 10:39:56 -07:00
Florian Tobias Schandinat
521594442c viafb: fix rmmod bug
This fixes a bug caused by changing pointers (viafb_mode, viafb_mode1)
assigned by module_param.  It reduces driver complexity by not needlessly
changing these vars as they are only read once and removing now
superfluous code.

On unpatched kernels loading viafb with viafb_mode or viafb_mode1 option
used and afterwards unloading it results in:

kernel BUG at mm/slub.c:2926!
invalid opcode: 0000 [#1] PREEMPT
last sysfs file: /sys/devices/virtual/block/loop0/removable
Modules linked in: snd_hda_codec_realtek snd_hda_intel snd_hda_codec
snd_hwdep snd_pcm rtl8187 snd_timer eeprom_93cx6 mmc_block snd soundcore
via_sdmmc fb snd_page_alloc i2c_algo_bit i2c_viapro ehci_hcd uhci_hcd
cfbcopyarea mmc_core cfbimgblt cfbfillrect video output [last unloaded:
viafb]

  Pid: 3355, comm: rmmod Not tainted (2.6.31-rc1 #0)
  EIP: 0060:[<c106a759>] EFLAGS: 00010246 CPU: 0
  EIP is at kfree+0x80/0xda
  EAX: c17c2da0 EBX: dc7edbdc ECX: 0000010f EDX: 00000000
  ESI: c102c700 EDI: dc7ed8fa EBP: d703ff2c ESP: d703ff20
   DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
  Process rmmod (pid: 3355, ti=d703e000 task=db1412c0 task.ti=d703e000)
  Stack:
   dc7edbdc 00000014 00000016 d703ff40 c102c700 dc7f45d4 dc7f45d4 00000880
   d703ff4c c103e571 00000000 d703ffac c103e751 66616976 da140062 db89ba80
   00000328 d702edf8 db89ba80 d703ff9c c105d0f0 00000200 da14f898 00000014
  Call Trace:
   [<c102c700>] ? destroy_params+0x1e/0x2b
   [<c103e571>] ? free_module+0xa2/0xd7
   [<c103e751>] ? sys_delete_module+0x1ab/0x1da
   [<c105d0f0>] ? do_munmap+0x20a/0x225
   [<c10029b4>] ? sysenter_do_call+0x12/0x26
  Code: 10 76 7a 8d 87 00 00 00 40 c1 e8 0c c1 e0 05 03 05 1c 87 41 c1 66 83 38 00 79 03 8b 40 0c 8b 10 84 d2 78 12 66 f7 c2 00 c0 75 04 <0f> 0b eb fe e8 6f 5a fe ff eb 47 8b 55 04 8b 58 0c 9c 5e fa 3b
  EIP: [<c106a759>] kfree+0x80/0xda SS:ESP 0068:d703ff20

This is caused by the current code changing the pointers assigned by
module_param.  During unload it tries to free the memory the pointers
point at which is now part of an internal structure.

The patch simply avoids changing the pointers.  This is okay as they are
read only once during the initialization process.

Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: Scott Fang <ScottFang@viatech.com.cn>
Cc: Joseph Chan <JosephChan@via.com.tw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-07 10:39:56 -07:00
KAMEZAWA Hiroyuki
4bfc44958e mm: make set_mempolicy(MPOL_INTERLEAV) N_HIGH_MEMORY aware
At first, init_task's mems_allowed is initialized as this.
 init_task->mems_allowed == node_state[N_POSSIBLE]

And cpuset's top_cpuset mask is initialized as this
 top_cpuset->mems_allowed = node_state[N_HIGH_MEMORY]

Before 2.6.29:
policy's mems_allowed is initialized as this.

  1. update tasks->mems_allowed by its cpuset->mems_allowed.
  2. policy->mems_allowed = nodes_and(tasks->mems_allowed, user's mask)

Updating task's mems_allowed in reference to top_cpuset's one.
cpuset's mems_allowed is aware of N_HIGH_MEMORY, always.

In 2.6.30: After commit 58568d2a82
("cpuset,mm: update tasks' mems_allowed in time"), policy's mems_allowed
is initialized as this.

  1. policy->mems_allowd = nodes_and(task->mems_allowed, user's mask)

Here, if task is in top_cpuset, task->mems_allowed is not updated from
init's one.  Assume user excutes command as #numactrl --interleave=all
,....

  policy->mems_allowd = nodes_and(N_POSSIBLE, ALL_SET_MASK)

Then, policy's mems_allowd can includes a possible node, which has no pgdat.

MPOL's INTERLEAVE just scans nodemask of task->mems_allowd and access this
directly.

  NODE_DATA(nid)->zonelist even if NODE_DATA(nid)==NULL

Then, what's we need is making policy->mems_allowed be aware of
N_HIGH_MEMORY.  This patch does that.  But to do so, extra nodemask will
be on statck.  Because I know cpumask has a new interface of
CPUMASK_ALLOC(), I added it to node.

This patch stands on old behavior.  But I feel this fix itself is just a
Band-Aid.  But to do fundametal fix, we have to take care of memory
hotplug and it takes time.  (task->mems_allowd should be N_HIGH_MEMORY, I
think.)

mpol_set_nodemask() should be aware of N_HIGH_MEMORY and policy's nodemask
should be includes only online nodes.

In old behavior, this is guaranteed by frequent reference to cpuset's
code.  Now, most of them are removed and mempolicy has to check it by
itself.

To do check, a few nodemask_t will be used for calculating nodemask.  But,
size of nodemask_t can be big and it's not good to allocate them on stack.

Now, cpumask_t has CPUMASK_ALLOC/FREE an easy code for get scratch area.
NODEMASK_ALLOC/FREE shoudl be there.

[akpm@linux-foundation.org: cleanups & tweaks]
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Paul Menage <menage@google.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: David Rientjes <rientjes@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-07 10:39:55 -07:00
Stefani Seibold
93274e4d4e fbcon: fix rotate upside down crash
Fix the rotate_ud() function not to crash in case of a font which has not
a width of multiple by 8: The inner loop of the font pixel copy should not
access a bit outside the font memory area.  Subtract the shift offset from
the font width will prevent this.

Signed-off-by: Stefani Seibold <stefani@seibold.net>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-07 10:39:55 -07:00
Xiao Guangrong
69dd647f96 generic-ipi: fix hotplug_cfd()
Use CONFIG_HOTPLUG_CPU, not CONFIG_CPU_HOTPLUG

When hot-unpluging a cpu, it will leak memory allocated at cpu hotplug,
but only if CPUMASK_OFFSTACK=y, which is default to n.

The bug was introduced by 8969a5ede0
("generic-ipi: remove kmalloc()").

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-07 10:39:55 -07:00
Stoyan Gaydarov
2020002a87 drivers/w1/masters/omap_hdq.c: fix missing mutex unlock
This was found using a semantic patch, more info can be found at:
http://www.emn.fr/x-info/coccinelle/

Signed-off-by: Stoyan Gaydarov <sgayda2@uiuc.edu>
Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-07 10:39:55 -07:00
Christoph Hellwig
b36ec0428a xfs: fix freeing of inodes not yet added to the inode cache
When freeing an inode that lost race getting added to the inode cache we
must not call into ->destroy_inode, because that would delete the inode
that won the race from the inode cache radix tree.

This patch uses splits a new xfs_inode_free helper out of xfs_ireclaim
and uses that plus __destroy_inode to make sure we really only free
the memory allocted for the inode that lost the race, and not mess with
the inode cache state.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Reported-by: Alex Samad <alex@samad.com.au>
Reported-by: Andrew Randrianasulu <randrik@mail.ru>
Reported-by: Stephane <sharnois@max-t.com>
Reported-by: Tommy <tommy@news-service.com>
Reported-by: Miah Gregory <mace@darksilence.net>
Reported-by: Gabriel Barazer <gabriel@oxeva.fr>
Reported-by: Leandro Lucarella <llucax@gmail.com>
Reported-by: Daniel Burr <dburr@fami.com.au>
Reported-by: Nickolay <newmail@spaces.ru>
Reported-by: Michael Guntsche <mike@it-loops.com>
Reported-by: Dan Carley <dan.carley+linuxkern-bugs@gmail.com>
Reported-by: Michael Ole Olsen <gnu@gmx.net>
Reported-by: Michael Weissenbacher <mw@dermichi.com>
Reported-by: Martin Spott <Martin.Spott@mgras.net>
Reported-by: Christian Kujau <lists@nerdbynature.de>
Tested-by: Michael Guntsche <mike@it-loops.com>
Tested-by: Dan Carley <dan.carley+linuxkern-bugs@gmail.com>
Tested-by: Christian Kujau <lists@nerdbynature.de>
2009-08-07 14:38:34 -03:00
Christoph Hellwig
2e00c97e2c vfs: add __destroy_inode
When we want to tear down an inode that lost the add to the cache race
in XFS we must not call into ->destroy_inode because that would delete
the inode that won the race from the inode cache radix tree.

This patch provides the __destroy_inode helper needed to fix this,
the actual fix will be in th next patch.  As XFS was the only reason
destroy_inode was exported we shift the export to the new __destroy_inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
2009-08-07 14:38:29 -03:00
Christoph Hellwig
54e346215e vfs: fix inode_init_always calling convention
Currently inode_init_always calls into ->destroy_inode if the additional
initialization fails.  That's not only counter-intuitive because
inode_init_always did not allocate the inode structure, but in case of
XFS it's actively harmful as ->destroy_inode might delete the inode from
a radix-tree that has never been added.  This in turn might end up
deleting the inode for the same inum that has been instanciated by
another process and cause lots of cause subtile problems.

Also in the case of re-initializing a reclaimable inode in XFS it would
free an inode we still want to keep alive.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
2009-08-07 14:38:25 -03:00
Bob Dunlop
dd1f57ecaf libertas: correct packing of rxpd structure
Older Gcc compilers (3.4.5 tested) need additional hints in order to get
the packing of the rxpd structure (which contains a 16 bit union)
correct on the ARM processor.

struct txpd does not need these hints since it contains a 32 bit union
that packs naturally.

Signed-off-by: R.J.Dunlop <rdunlop@guralp.com>
Acked-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-07 13:09:33 -04:00
Lennert Buytenhek
60aa569f92 mwl8k: prevent module unload hang
We need to unregister our ieee80211_hw before resetting the chip, as
the former causes firmware commands to be issued which will time out
once the chip has been reset.

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-07 13:09:33 -04:00
Lennert Buytenhek
a94cc97e14 mwl8k: prevent crash in ->configure_filter() if no interface was added
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-07 13:09:32 -04:00
Lennert Buytenhek
37055bd455 mwl8k: call pci_unmap_single() before accessing command structure again
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-07 13:09:30 -04:00
Lennert Buytenhek
4ff6432ea6 mwl8k: add various missing GET_HW_SPEC endian conversions
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-07 13:09:29 -04:00
Lennert Buytenhek
d25f9f1357 mwl8k: fix NULL pointer dereference on receive out-of-memory
When we go into out-of-memory and fail to allocate skbuffs to
refill the receive ring with, rxq_process can end up running into
a receive ring entry that is marked as host-owned but doesn't have
an associated skbuff.  If this happens, we must break out of the
rx processing loop instead of trying to process the descriptor.

Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Acked-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-07 13:09:29 -04:00
Zhu Yi
7dd2459d8f ipw2x00: Write outside array bounds
> channel_index loops up to IPW_SCAN_CHANNELS, but is used after being
> incremented. This might be able to access 1 past the end of the array

Reported-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-08-07 13:09:28 -04:00
Eric Dumazet
bd3f02212d ring-buffer: Fix memleak in ring_buffer_free()
I noticed oprofile memleaked in linux-2.6 current tree,
and tracked this ring-buffer leak.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
LKML-Reference: <4A7C06B9.2090302@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2009-08-07 12:46:39 -04:00
Dave Airlie
17332925d7 drm/radeon/kms: setup MC/VRAM the same way for suspend/resume
we should align the GTT after VRAM no matter what, as we can
come back from resume and put in a different place and bad things happen.

Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-08-07 20:33:11 +10:00
Li Zefan
0e692a94e3 lockdep: Fix typos in documentation
s/head/held

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <4A7BD37E.9060806@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-07 12:03:46 +02:00
Li Zefan
9795447f71 lockdep: Fix file mode of lock_stat
/proc/lock_stat is writable.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <4A7BE7B6.10904@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-07 11:58:38 +02:00
Roel Kluin
53cb780adb [S390] KVM: Read buffer overflow
Check whether index is within bounds before testing the element.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-08-07 10:40:40 +02:00
Hendrik Brueckner
677c1dd706 [S390] kernel: Storing machine flags early in lowcore
Currently, the machine_flags are stored late in the startup
initialization which results in failing machine type checks
(e.g. for MACHINE_IS_VM).
To allow these checks, store the machine flags in the lowcore
when the machine type has been detected.

Moving the machine_flags to the lowcore has been introduced with
git commit 25097bf153

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-08-07 10:40:39 +02:00
Steven Rostedt
7dbdee2e9a tracing: Fix recordmcount.pl to handle sections with only weak functions
Roland Dreier found that a section that contained only a weak
function in one of the staging drivers and this caused
recordmcount.pl to spit out a warning and fail.

Although it is strange that a driver would have a weak function, and
this function only be used in one place, it should not be something
to make recordmcount.pl fail.

This patch fixes the issue in a simple manner: if only weak
functions exist in a section, then that section will not be
recorded.

Reported-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-07 08:50:29 +02:00
Krishna Kumar
bbd8a0d3a3 net: Avoid enqueuing skb for default qdiscs
dev_queue_xmit enqueue's a skb and calls qdisc_run which
dequeue's the skb and xmits it. In most cases, the skb that
is enqueue'd is the same one that is dequeue'd (unless the
queue gets stopped or multiple cpu's write to the same queue
and ends in a race with qdisc_run). For default qdiscs, we
can remove the redundant enqueue/dequeue and simply xmit the
skb since the default qdisc is work-conserving.

The patch uses a new flag - TCQ_F_CAN_BYPASS to identify the
default fast queue. The controversial part of the patch is
incrementing qlen when a skb is requeued - this is to avoid
checks like the second line below:

+  } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
>>         !q->gso_skb &&
+          !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) {

Results of a 2 hour testing for multiple netperf sessions (1,
2, 4, 8, 12 sessions on a 4 cpu system-X). The BW numbers are
aggregate Mb/s across iterations tested with this version on
System-X boxes with Chelsio 10gbps cards:

----------------------------------
Size |  ORG BW          NEW BW   |
----------------------------------
128K |  156964          159381   |
256K |  158650          162042   |
----------------------------------

Changes from ver1:

1. Move sch_direct_xmit declaration from sch_generic.h to
   pkt_sched.h
2. Update qdisc basic statistics for direct xmit path.
3. Set qlen to zero in qdisc_reset.
4. Changed some function names to more meaningful ones.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06 20:10:18 -07:00
Yevgeny Petrilin
9f519f68cf mlx4_en: Not using Shared Receive Queues
We use 1:1 mapping between QPs and SRQs on receive side,
so additional indirection level not required. Allocated the receive
buffers for the RSS QPs.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06 19:28:18 -07:00
Yevgeny Petrilin
b6b912e080 mlx4_en: Using real number of rings as RSS map size
There is no point in using more QPs then actual number of receive rings.
If the RSS function for two streams gives the same result modulo number
of rings, they will arrive to the same RX ring anyway.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06 19:27:51 -07:00
Yevgeny Petrilin
a35ee541a6 mlx4_en: Adaptive moderation policy change
If the net device is identified as "sender" (number of sent packets
is higher then the number of received packets and the incoming packets are
small), set the moderation time to its low limit.
We do it because the incoming packets are acks, and we don't want to delay them

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06 19:27:28 -07:00
Daniel Mack
6cb8782362 net: smsc911x: switch to new dev_pm_ops
Hibernation is unsupported for now, which meets the actual
implementation in the driver. For free/thaw, the chip's D2 state should
be entered.

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Acked-by: <steve.glendinning@smsc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06 13:25:31 -07:00
David S. Miller
0d502d8267 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lowpan/lowpan 2009-08-06 13:18:22 -07:00
Atsushi Nemoto
a48ec346fc tc35815: Use 0 RxFragSize.MinFrag value for non-packing mode
The datasheet say "When not enabling packing, the MinFrag value must
remain at 0".  Do not set value to RxFragSize register if
TC35815_USE_PACKEDBUFFER disabled.

This is not a bugfix.  No real problem reported on this.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06 13:14:25 -07:00
Atsushi Nemoto
7bb82e834c tc35815: Fix rx_missed_errors count
The Miss_Cnt register is cleared by reading.  Accumulate its value to
rx_missed_errors count.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06 13:14:24 -07:00
Atsushi Nemoto
c60a5cf7af tc35815: Increase timeout for mdio
The current timeout value is too short for very high-load condition
which jiffies might jump up in busy-loop.
Also add minimum delay before checking completion of MDIO.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06 13:14:23 -07:00
Atsushi Nemoto
db30f5ef6e tc35815: Improve BLEx / FDAEx handling
Clear Int_BLEx / Int_FDAEx after (not before) processing Rx interrupt.
This will reduce number of unnecessary interrupts.
Also print rx error messages only if netif_msg_rx_err() enabled.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06 13:14:22 -07:00
Atsushi Nemoto
297713deca tc35815: Disable StripCRC
It seems Rx_StripCRC cause trouble on recovering from the BLEx (Buffer
List Exhaust) or FDAEx (Free Descriptor Area Exhaust) condition.
Do not use it.

Also bump version number up.

Reported-by: Ralf Roesch <ralf.roesch@rw-gmbh.de>
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06 13:14:20 -07:00
Eric Dumazet
09384dfc76 irda: Fix irda_getname() leak
irda_getname() can leak kernel memory to user.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06 13:08:46 -07:00
Eric Dumazet
3d392475c8 appletalk: fix atalk_getname() leak
atalk_getname() can leak 8 bytes of kernel memory to user

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06 13:08:45 -07:00
Eric Dumazet
f6b97b2951 netrom: Fix nr_getname() leak
nr_getname() can leak kernel memory to user.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06 13:08:43 -07:00
Eric Dumazet
80922bbb12 econet: Fix econet_getname() leak
econet_getname() can leak kernel memory to user.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06 13:08:40 -07:00
Eric Dumazet
17ac2e9c58 rose: Fix rose_getname() leak
rose_getname() can leak kernel memory to user.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-06 13:08:38 -07:00
David S. Miller
bfe34ebbaa Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2009-08-06 12:57:18 -07:00
Peter Zijlstra
1054598cab perf_counter: Fix double list iteration in per task precise stats
Brice Goglin reported this crash with per task precise stats:

> I finally managed to test the threaded perfcounter statistics (thanks a
> lot for implementing it). I am running 2.6.31-rc5 (with the AMD
> magny-cours patches but I don't think they matter here). I am trying to
> measure local/remote memory accesses per thread during the well-known
> stream benchmark. It's compiled with OpenMP using 16 threads on a
> quad-socket quad-core barcelona machine.
>
> Command line is:
>  /mnt/scratch/bgoglin/cpunode/linux-2.6.31/tools/perf/perf record -f -s
> -e r1000001e0 -e r1000002e0 -e r1000004e0 -e r1000008e0 ./stream
>
> It seems to work fine with a single -e <counter> on the command line
> while it crashes when there are at least 2 of them.
> It seems to work fine without -s as well.

A silly copy-paste resulted in a messed up iteration which would
cause the OOPS.

Reported-by: Brice Goglin <Brice.Goglin@inria.fr>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Brice Goglin <Brice.Goglin@inria.fr>
LKML-Reference: <1249574786.32113.550.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-06 20:25:18 +02:00
Peter Zijlstra
9424edc2da perf: Auto-detect libelf
Adds autodetection for libelf as well, and simplifies the
libbfd code. Furthermore, fail make with an error when libelf
is not found and warn about the lack of libbfd.

Also provide an option to build a 32bit version even though you
might be running a 64bit kernel.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-06 20:25:13 +02:00
Arnaldo Carvalho de Melo
4d1e00a8af perf symbol: Fix symbol parsing in certain cases: use the build-id as a symlink
In some cases distros have binaries and debuginfo in weird places:

[root@doppio tuna]# ls -la /usr/lib64/{xulrunner-1.9.1/xulrunner-stub,firefox-3.5.2/firefox}
-rwxr-xr-x 1 root root 90024 2009-08-03 19:45 /usr/lib64/firefox-3.5.2/firefox
-rwxr-xr-x 1 root root 90024 2009-08-03 18:23 /usr/lib64/xulrunner-1.9.1/xulrunner-stub
[root@doppio tuna]# sha1sum /usr/lib64/{xulrunner-1.9.1/xulrunner-stub,firefox-3.5.2/firefox}
19a858077d263d5de22c9c5da250d3e4396ae739  /usr/lib64/xulrunner-1.9.1/xulrunner-stub
19a858077d263d5de22c9c5da250d3e4396ae739  /usr/lib64/firefox-3.5.2/firefox
[root@doppio tuna]# rpm -qf /usr/lib64/{xulrunner-1.9.1/xulrunner-stub,firefox-3.5.2/firefox}
xulrunner-1.9.1.2-1.fc11.x86_64
firefox-3.5.2-2.fc11.x86_64
[root@doppio tuna]# ls -la /usr/lib/debug/{usr/lib64/xulrunner-1.9.1/xulrunner-stub,usr/lib64/firefox-3.5.2/firefox}.debug
ls: cannot access /usr/lib/debug/usr/lib64/firefox-3.5.2/firefox.debug: No such file or directory
-rwxr-xr-x 1 root root 403608 2009-08-03 18:22 /usr/lib/debug/usr/lib64/xulrunner-1.9.1/xulrunner-stub.debug

Seemingly we don't have a .symtab when we actually can find it
if we use the .note.gnu.build-id ELF section put in place by
some distros. Use it and find the symbols we need.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-06 20:24:37 +02:00
Robert Richter
469535a598 ring-buffer: Fix advance of reader in rb_buffer_peek()
When calling rb_buffer_peek() from ring_buffer_consume() and a
padding event is returned, the function rb_advance_reader() is
called twice. This may lead to missing samples or under high
workloads to the warning below. This patch fixes this. If a padding
event is returned by rb_buffer_peek() it will be consumed by the
calling function now.

Also, I simplified some code in ring_buffer_consume().

------------[ cut here ]------------
WARNING: at /dev/shm/.source/linux/kernel/trace/ring_buffer.c:2289 rb_advance_reader+0x2e/0xc5()
Hardware name: Anaheim
Modules linked in:
Pid: 29, comm: events/2 Tainted: G        W  2.6.31-rc3-oprofile-x86_64-standard-00059-g5050dc2 #1
Call Trace:
[<ffffffff8106776f>] ? rb_advance_reader+0x2e/0xc5
[<ffffffff81039ffe>] warn_slowpath_common+0x77/0x8f
[<ffffffff8103a025>] warn_slowpath_null+0xf/0x11
[<ffffffff8106776f>] rb_advance_reader+0x2e/0xc5
[<ffffffff81068bda>] ring_buffer_consume+0xa0/0xd2
[<ffffffff81326933>] op_cpu_buffer_read_entry+0x21/0x9e
[<ffffffff810be3af>] ? __find_get_block+0x4b/0x165
[<ffffffff8132749b>] sync_buffer+0xa5/0x401
[<ffffffff810be3af>] ? __find_get_block+0x4b/0x165
[<ffffffff81326c1b>] ? wq_sync_buffer+0x0/0x78
[<ffffffff81326c76>] wq_sync_buffer+0x5b/0x78
[<ffffffff8104aa30>] worker_thread+0x113/0x1ac
[<ffffffff8104dd95>] ? autoremove_wake_function+0x0/0x38
[<ffffffff8104a91d>] ? worker_thread+0x0/0x1ac
[<ffffffff8104dc9a>] kthread+0x88/0x92
[<ffffffff8100bdba>] child_rip+0xa/0x20
[<ffffffff8104dc12>] ? kthread+0x0/0x92
[<ffffffff8100bdb0>] ? child_rip+0x0/0x20
---[ end trace f561c0a58fcc89bd ]---

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@kernel.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-06 14:20:25 +02:00
Benjamin Herrenschmidt
e0d82a0a4e perf_counter/powerpc: Check oprofile_cpu_type for NULL before using it
If the current CPU doesn't support performance counters,
cur_cpu_spec->oprofile_cpu_type can be NULL. The current
perf_counter modules don't test for that case and would thus
crash at boot time.

Bug reported by David Woodhouse.

Reported-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul Mackerras <paulus@samba.org>
LKML-Reference: <19066.48028.446975.501454@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-06 13:55:09 +02:00
Sheng Yang
c5b1525533 intel-iommu: Fix enabling snooping feature by mistake
Two defects work together result in KVM device passthrough randomly can't
work:
1. iommu_snooping is not initialized to zero when vm_iommu_init() called.
So it is possible to get a random value.
2. One line added by commit 2c2e2c38("IOMMU Identity Mapping Support")
change the code path, let it bypass domain_update_iommu_cap(), as well as
missing the increment of domain iommu reference count.

The latter is also likely to cause a leak of domains on repeated VMM 
assignment and deassignment.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-08-06 11:35:50 +01:00
Marcelo Tosatti
53a27b39ff KVM: MMU: limit rmap chain length
Otherwise the host can spend too long traversing an rmap chain, which
happens under a spinlock.

Cc: stable@kernel.org
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-08-06 12:06:54 +03:00