This make "cat /proc/${PID}/pagemap" more efficient for
32-bit tasks.
Based upon a report by Mariusz Kozlowski.
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
x86: fix performance drop for glx
x86: fix trim mtrr not to setup_memory two times
x86: GEODE: add missing module.h include
x86, cpufreq: fix Speedfreq-SMI call that clobbers ECX
x86: fix memoryless node oops during boot
x86: add dmi quirk for io_delay
x86: convert mtrr/generic.c to kernel-doc
x86: Documentation/i386/IO-APIC.txt: fix description
Running the counters testcase from libhugetlbfs results in on 2.6.25-rc5
and 2.6.25-rc5-mm1:
BUG: soft lockup - CPU#3 stuck for 61s! [counters:10531]
NIP: c0000000000d1f3c LR: c0000000000d1f2c CTR: c0000000001b5088
REGS: c000005db12cb360 TRAP: 0901 Not tainted (2.6.25-rc5-autokern1)
MSR: 8000000000009032 <EE,ME,IR,DR> CR: 48008448 XER: 20000000
TASK = c000005dbf3d6000[10531] 'counters' THREAD: c000005db12c8000 CPU: 3
GPR00: 0000000000000004 c000005db12cb5e0 c000000000879228 0000000000000004
GPR04: 0000000000000010 0000000000000000 0000000000200200 0000000000100100
GPR08: c0000000008aba10 000000000000ffff 0000000000000004 0000000000000000
GPR12: 0000000028000442 c000000000770080
NIP [c0000000000d1f3c] .return_unused_surplus_pages+0x84/0x18c
LR [c0000000000d1f2c] .return_unused_surplus_pages+0x74/0x18c
Call Trace:
[c000005db12cb5e0] [c000005db12cb670] 0xc000005db12cb670 (unreliable)
[c000005db12cb670] [c0000000000d24c4] .hugetlb_acct_memory+0x2e0/0x354
[c000005db12cb740] [c0000000001b5048] .truncate_hugepages+0x1d4/0x214
[c000005db12cb890] [c0000000001b50a4] .hugetlbfs_delete_inode+0x1c/0x3c
[c000005db12cb920] [c000000000103fd8] .generic_delete_inode+0xf8/0x1c0
[c000005db12cb9b0] [c0000000001b5100] .hugetlbfs_drop_inode+0x3c/0x24c
[c000005db12cba50] [c00000000010287c] .iput+0xdc/0xf8
[c000005db12cbad0] [c0000000000fee54] .dentry_iput+0x12c/0x194
[c000005db12cbb60] [c0000000000ff050] .d_kill+0x6c/0xa4
[c000005db12cbbf0] [c0000000000ffb74] .dput+0x18c/0x1b0
[c000005db12cbc70] [c0000000000e9e98] .__fput+0x1a4/0x1e8
[c000005db12cbd10] [c0000000000e61ec] .filp_close+0xb8/0xe0
[c000005db12cbda0] [c0000000000e62d0] .sys_close+0xbc/0x134
[c000005db12cbe30] [c00000000000872c] syscall_exit+0x0/0x40
Instruction dump:
ebbe8038 38800010 e8bf0002 3bbd0008 7fa3eb78 38a50001 7ca507b4 4818df25
60000000 38800010 38a00000 7c601b78 <7fa3eb78> 2f800010 409d0008 38000010
This was tracked down to a potential livelock in
return_unused_surplus_hugepages(). In the case where we have surplus
pages on some node, but no free pages on the same node, we may never
break out of the loop. To avoid this livelock, terminate the search if
we iterate a number of times equal to the number of online nodes without
freeing a page.
Thanks to Andy Whitcroft and Adam Litke for helping with debugging and
the patch.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we show the surplus hugetlb pool state in /proc/meminfo, but
not in the per-node meminfo files, even though we track the information
on a per-node basis. Printing it there can help track down dynamic pool
bugs including the one in the follow-on patch.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fix the 3D performance drop reported at:
http://bugzilla.kernel.org/show_bug.cgi?id=10328
fb drivers are using ioremap()/ioremap_nocache(), followed by mtrr_add with
WC attribute. Recent changes in page attribute code made both
ioremap()/ioremap_nocache() mappings as UC (instead of previous UC-). This
breaks the graphics performance, as the effective memory type is UC instead
of expected WC.
The correct way to fix this is to add ioremap_wc() (which uses UC- in the
absence of PAT kernel support and WC with PAT) and change all the
fb drivers to use this new ioremap_wc() API.
We can take this correct and longer route for post 2.6.25. For now,
revert back to the UC- behavior for ioremap/ioremap_nocache.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
we could call find_max_pfn() directly instead of setup_memory() to get
max_pfn needed for mtrr trimming.
otherwise setup_memory() is called two times... that is duplicated...
[ mingo@elte.hu: both Thomas and me simulated a double call to
setup_bootmem_allocator() and can confirm that it is a real bug
which can hang in certain configs. It's not been reported yet but
that is probably due to the relatively scarce nature of
MTRR-trimming systems. ]
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On Wed, 26 Mar 2008 11:56:22 -0600
Jordan Crouse <jordan.crouse@amd.com> wrote:
> On 26/03/08 14:31 +0100, Stefan Pfetzing wrote:
> > Hello Jordan,
> >
> > I just tried to build your geodwdt driver for the geode watchdog. Therefore
> > I pulled your repository from http://git.infradead.org/geode.git (or more,
> > the git url).
> >
> > I tried to build the geodewdt driver as a module - which didn't work, and
> > it failed with the same problem as earlier mentioned on lkmk [1]. I also
> > checked the fix [2], but that seems to be already in your (or linus) tree -
> > and so I'm unsure what the problem is.
> >
> > [1] http://kerneltrap.org/mailarchive/linux-kernel/2008/2/17/884074
> > [2] http://kerneltrap.org/mailarchive/linux-kernel/2008/2/17/884174
> >
> > Building directly into the kernel seems to work.
> >
> > Maybe you have some idea?
>
> Hmm - that is strange. Exporting the symbols should work. I recommend
> starting over with a clean tree.
>
> CCing Andres - any thoughts?
>
> Jordan
>
Er, yeah. The patch below should fix it. This should probably go into
2.6.25.
Oops, EXPORT_SYMBOL_GPL wasn't being declared due to this header
being missing.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
I have found that using SMI to change the cpu's frequency on my DELL
Latitude L400 clobbers the ECX register in speedstep_set_state, causing
unneccessary retries because the "state" variable has changed silently (GCC
assumes it is still present in ECX).
play safe and avoid gcc caching any register across IO port accesses
that trigger SMIs.
Signed-off by: <Stephan.Diestelhorst@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Convert function comment blocks to kernel-doc notation.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The description of the interrupt routing doesn't match the (nice) diagram.
Signed-off-by: Nick Andrew <nick@nick-andrew.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'slab-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/christoph/vm:
slab: fix cache_cache bootstrap in kmem_cache_init()
count_partial() is not used if !SLUB_DEBUG and !CONFIG_SLABINFO
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt:
NOHZ: reevaluate idle sleep length after add_timer_on()
clocksource: revert: use init_timer_deferrable for clocksource_watchdog
The RDMACTXT_F_LAST_CTXT bit was getting set incorrectly
when the last chunk in the read-list spanned multiple pages. This
resulted in a kernel panic when the wrong context was used to
build the RPC iovec page list.
RDMA_READ is used to fetch RPC data from the client for
NFS_WRITE requests. A scatter-gather is used to map the
advertised client side buffer to the server-side iovec and
associated page list.
WR contexts are used to convey which scatter-gather entries are
handled by each WR. When the write data is large, a single RPC may
require multiple RDMA_READ requests so the contexts for a single RPC
are chained together in a linked list. The last context in this list
is marked with a bit RDMACTXT_F_LAST_CTXT so that when this WR completes,
the CQ handler code can enqueue the RPC for processing.
The code in rdma_read_xdr was setting this bit on the last two
contexts on this list when the last read-list chunk spanned multiple
pages. This caused the svc_rdma_recvfrom logic to incorrectly build
the RPC and caused the kernel to crash because the second-to-last
context doesn't contain the iovec page list.
Modified the condition that sets this bit so that it correctly detects
the last context for the RPC.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Tested-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 8fa5913d54, which
caused various interesting problems for people, including wrong resource
allocations. See for example bugzilla entry "2.6.25-rc2: ohci1394
problem (MMIO broken)" at
http://bugzilla.kernel.org/show_bug.cgi?id=10080
And Gary Hade says:
"The same change had also exposed an issue reported by Paul Martin that
has been causing an Oops while hotplugging ThinkPads to a ThinkPad
Dock II. See
http://lkml.org/lkml/2008/2/19/405http://bugzilla.kernel.org/show_bug.cgi?id=9961
I have a fix for the ThinkPad docking Oops but if the issue being
discussed here is caused by the transparent bridge sizing removal
change I totally agree that it should be reverted."
The transparent bridge sizing removal change was motivated by
insufficient PCI memory resource for a transparent bridge window that
was being created as a result of expansion ROM(s) being included in
the transparent bridge sizing calculations.
A later "PCI: Remove default PCI expansion ROM memory allocation"
change ( re: http://lkml.org/lkml/2007/12/11/361 ) removes the
expansion ROM(s) from the transparent bridge sizing calculations which
actually resolves the original issue in a different manner. So, even
if the "PCI: remove transparent bridge sizing" is not problematic it
is no longer needed anyway."
Identified-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Tested-by: Thomas Meyer <thomas@m3y3r.de>
Acked-by: Gary Hade <garyhade@us.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 556a169dab ("slab: fix bootstrap on
memoryless node") introduced bootstrap-time cache_cache list3s for all nodes
but forgot that initkmem_list3 needs to be accessed by [somevalue + node]. This
patch fixes list_add() corruption in mm/slab.c seen on the ES7000.
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Olaf Hering <olaf@aepfle.de>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Dan Yeisley <dan.yeisley@unisys.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Avoid warnings about unused functions if neither SLUB_DEBUG nor CONFIG_SLABINFO
is defined. This patch will be reversed when slab defrag is merged since slab
defrag requires count_partial() to determine the fragmentation status of
slab caches.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
relay doesn't reference the pages it adds, however we need a non-NULL
hook or splice_to_pipe() can oops.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
I found that relay files can be read by pread(2). I fix it,
for relay files are not capable of seeking.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Sparse still doesn't like the funny cast we make from a scalar to a
"union semun" (which is correct by the C language and in particular
works with the sparc64 calling conventions, but sparse doesn't grok
that yet).
Signed-off-by: David S. Miller <davem@davemloft.net>
add_timer_on() can add a timer on a CPU which is currently in a long
idle sleep, but the timer wheel is not reevaluated by the nohz code on
that CPU. So a timer can be delayed for quite a long time. This
triggered a false positive in the clocksource watchdog code.
To avoid this we need to wake up the idle CPU and enforce the
reevaluation of the timer wheel for the next timer event.
Add a function, which checks a given CPU for idle state, marks the
idle task with NEED_RESCHED and sends a reschedule IPI to notify the
other CPU of the change in the timer wheel.
Call this function from add_timer_on().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org
--
include/linux/sched.h | 6 ++++++
kernel/sched.c | 43 +++++++++++++++++++++++++++++++++++++++++++
kernel/timer.c | 10 +++++++++-
3 files changed, 58 insertions(+), 1 deletion(-)
Add 'UL' markers to DCU_* macros.
Declare C functions called from assembler in entry.h
Declare C functions called from within the sparc64 arch
code in include/asm-sparc64/*.h headers as appropriate.
Remove unused routines in traps.c
Signed-off-by: David S. Miller <davem@davemloft.net>
We create a local header file entry.h, under arch/sparc64/kernel/,
that we can use to declare routines either defined in assembler
or only invoked from assembler. As well as other data objects
which are private to the inner sparc64 kernel arch code.
Signed-off-by: David S. Miller <davem@davemloft.net>
Move them further from the main kernel image area
to facilitate larger kernel sizes.
Adjust comments to match.
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes the builtin RTL8139 NIC on the Medion MD9580-F laptop. The
BIOS reports the interrupt routing incorrectly. I recently added a
quirk to work around this, and this patch fixes a typo in the quirk.
We pad every ACPI pathname component to four characters, so ".ISA." will
never match anything. We need ".ISA_." instead.
Thank you Johann-Nikolaus Andreae <johann-nikolaus.andreae@nacs.de>
for patiently testing this patch.
See http://bugzilla.kernel.org/show_bug.cgi?id=4773
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert
commit 1077f5a917
Author: Parag Warudkar <parag.warudkar@gmail.com>
Date: Wed Jan 30 13:30:01 2008 +0100
clocksource.c: use init_timer_deferrable for clocksource_watchdog
clocksource_watchdog can use a deferrable timer - reduces wakeups from
idle per second.
The watchdog timer needs to run with the specified interval. Otherwise
it will miss the possible wrap of the watchdog clocksource.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
i2c: Fix docbook problem
ASoC/TLV320AIC3X: Stop I2C driver ID abuse
i2c-omap: Fix unhandled fault
i2c-bfin-twi: Disable BF54x support for now
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
[PATCH] get stack footprint of pathname resolution back to relative sanity
[PATCH] double iput() on failure exit in hugetlb
[PATCH] double dput() on failure exit in tiny-shmem
[PATCH] fix up new filp allocators
[PATCH] check for null vfsmount in dentry_open()
[PATCH] reiserfs: eliminate private use of struct file in xattr
[PATCH] sanitize hppfs
hppfs pass vfsmount to dentry_open()
[PATCH] restore export of do_kern_mount()
While backporting 72dc67a696, a gfn_to_page()
call was duplicated instead of moved (due to an unrelated patch not being
present in mainline). This caused a page reference leak, resulting in a
fairly massive memory leak.
Fix by removing the extraneous gfn_to_page() call.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Do not assume that a shadow mapping will always point to the same host
frame number. Fixes crash with madvise(MADV_DONTNEED).
[avi: move after first printk(), add another printk()]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The vmx hardware state restore restores the tss selector and base address, but
not its length. Usually, this does not matter since most of the tss contents
is within the default length of 0x67. However, if a process is using ioperm()
to grant itself I/O port permissions, an additional bitmap within the tss,
but outside the default length is consulted. The effect is that the process
will receive a SIGSEGV instead of transparently accessing the port.
Fix by restoring the tss length. Note that i386 had this working already.
Closes bugzilla 10246.
Signed-off-by: Avi Kivity <avi@qumranet.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6:
USB: Fix cut-and-paste error in rtl8150.c
USB: ehci: stop vt6212 bus hogging
USB: sierra: add another device id
USB: sierra: dma fixes
USB: add support for Motorola ROKR Z6 cellphone in mass storage mode
USB: isd200: fix memory leak in isd200_get_inquiry_data
USB: pl2303: another product ID
USB: new quirk flag to avoid Set-Interface
USB: fix gadgetfs class request delegation