The xfrm_timer calls __xfrm_state_delete, which drops the final reference
manually without triggering destruction of the state. Change it to use
xfrm_state_put to add the state to the gc list when we're dropping the
last reference. The timer function may still continue to use the state
safely since the final destruction does a del_timer_sync().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (39 commits)
ACPI: EC: Workaround for optimized controllers (version 3)
ACPI: EC: use printk_ratelimit(), add some DEBUG mode messages
Revert "ACPI: EC: Workaround for optimized controllers"
ACPI: fix two IRQ8 issues in IOAPIC mode
ACPI: Add missing spaces to printk format
cpuidle: fix HP nx6125 regression
cpuidle: add sched_clock_idle_[sleep|wakeup]_event() hooks
cpuidle: fix C3 for no bus-master control case
ACPI: thinkpad-acpi: fix oops when a module parameter has no value
Revert "Fix very high interrupt rate for IRQ8 (rtc) unless pnpacpi=off"
ACPI: EC: Don't init EC early if it has no _INI
Revert "acpi: make ACPI_PROCFS default to y"
Revert "ACPI: add documentation for deprecated /proc/acpi/battery in ACPI_PROCFS"
ACPI: Split out control for /proc/acpi entries from battery, ac, and sbs.
ACPI: Video: Increase buffer size for writes to brightness proc file.
ACPI: EC: Workaround for optimized controllers
ACPI: SBS: Fix retval warning
ACPI: Enable MSR (FixedHW) support for T-States
ACPI: Get throttling info from BIOS only after evaluating _PDC
ACPI: Use _TSS for throttling control, when present. Add error checks.
...
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] vpe: Add missing "space"
[MIPS] Compliment va_start() with va_end().
[MIPS] IP22: Fix broken eeprom access by using __raw_readl/__raw_writel
[MIPS] IP22: Fix broken EISA interrupt setup by switching to generic i8259
[MIPS] 64-bit Sibyte kernels need DMA32.
[MIPS] Only build r4k clocksource for systems that work ok with it.
[MIPS] Handle R4000/R4400 mfc0 from count register.
[MIPS] Fix possible hang in LL/SC futex loops.
[MIPS] Fix context DSP context / TLS pointer switching bug for new threads.
[MIPS] IP32: More interrupt renumbering fixes.
[MIPS] time: MIPSsim's plat_time_init doesn't need to be irq safe.
[MIPS] time: Fix negated condition in cevt-r4k driver.
[MIPS] Fix pcspeaker build.
if you are lucky (unlucky?) enough to have shared interrupts, the
interrupt handler can be called before the tasklet and lock are ready
for use.
Signed-off-by: chas williams <chas@cmf.nrl.navy.mil>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
For a kernel built with "make ARCH=x86" the following system
information is displayed when running the new kernel
$ uname -m
x86
On some i386 systems (e.g. K7) we even have the following information
$ uname -m
x66
This is weird. The usual information for "uname -m" should be "x86_64"
on 64-bit and "i386" or "i686" on 32-bit.
This patch fixes the issue by setting UTS_MACHINE to "i386" for 32-bit
kernel builds and to "x86_64" for 64-bit kernel builds. I.e., "x86"
won't be used for UTS_MACHINE anymore.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andreas Herrmann <aherrman@arcor.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simplify calling sequence of nfs_direct_{read,write}_schedule(), and
rename them to reflect their new role.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
A zero byte count direct write request should be a successful no-op, not an
error.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Allow applications to perform asynchronous scatter-gather direct I/O
to NFS files.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add helpers that iterate over multi-segment iovecs. These will
be used to support multi-segment scatter/gather direct I/O in a
later patch.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Every file should include the headers containing the prototypes for its global
functions (in this case nfs_access_cache_shrinker()).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs_wb_page_priority() can now become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
While testing a kernel based upon ecd744eec3
(with wrong boot arguments), I got the following bad page state entry while
NFS was trying to mount it's rootfs:
IP-Config: Complete:
device=eth0, addr=192.168.1.101, mask=255.255.255.0, gw=255.255.255.255,
host=192.168.1.101, domain=, nis-domain=(none),
bootserver=192.168.1.100, rootserver=192.168.1.100, rootpath=
Looking up port of RPC 100003/2 on 192.168.1.100
rpcbind: server 192.168.1.100 not responding, timed out
Root-NFS: Unable to get nfsd port number from server, using default
Looking up port of RPC 100005/1 on 192.168.1.100
rpcbind: server 192.168.1.100 not responding, timed out
Root-NFS: Unable to get mountd port number from server, using default
mount: server 192.168.1.100 not responding, timed out
Root-NFS: Server returned error -5 while mounting /nfs/rootfs/
VFS: Unable to mount root fs via NFS, trying floppy.
Bad page state in process 'swapper'
page:c02b1260 flags:0x00000400 mapping:00000000 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
[<c0023e34>] (dump_stack+0x0/0x14) from [<c0062570>] (bad_page+0x70/0xac)
[<c0062500>] (bad_page+0x0/0xac) from [<c0064914>] (free_hot_cold_page+0x80/0x178)
[<c0064894>] (free_hot_cold_page+0x0/0x178) from [<c0064a74>] (free_hot_page+0x14/0x18)
[<c0064a60>] (free_hot_page+0x0/0x18) from [<c0067078>] (put_page+0xf8/0x154)
[<c0066f80>] (put_page+0x0/0x154) from [<c007dbc8>] (kfree+0xc8/0xd0)
[<c007db00>] (kfree+0x0/0xd0) from [<c00cbb54>] (nfs_get_sb+0x230/0x710)
[<c00cb924>] (nfs_get_sb+0x0/0x710) from [<c0084334>] (vfs_kern_mount+0x58/0xac)[<c00842dc>] (vfs_kern_mount+0x0/0xac) from [<c00843c0>] (do_kern_mount+0x38/0xf4)
[<c0084388>] (do_kern_mount+0x0/0xf4) from [<c0099c7c>] (do_mount+0x1e8/0x614)
...
This seems to be caused by use of an uninitialised structure due to NULL
options being passed to nfs_validate_mount_data(). Ensure that the
parsed mount data is always initialised.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
(Trond: added fix for the same bug in nfs4_validate_mount_data()).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Support for binary sysctls is being deprecated in 2.6.24. Since there
are no applications using the NFS/RDMA client's binary sysctls, it
makes sense to remove them. The patch below does this while leaving
the /proc/sys interface unchanged.
Please consider this for 2.6.24.
Signed-off-by: James Lentini <jlentini@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
increase the default minimum granularity some more - this gives us
more performance in aim7 benchmarks.
also correct some comments: we scale with ilog(ncpus) + 1.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Devan Lippman noticed that the RLIMIT_CPU comment in resource.h is
incorrect: the field is in seconds, not msecs. We used msecs in
earlier versions of the patch but that got changed.
Found-by: Devan Lippman <devan.lippman@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Srivatsa Vaddagiri noticed occasionally incorrect CPU usage
values in top and tracked it down to stime going below 0 in
task_stime(). Negative values are possible there due to the
sampled nature of stime/utime.
Fix suggested by Balbir Singh.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com>
The commit
commit 5cb350baf5
Author: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Date: Mon Oct 15 17:00:14 2007 +0200
sched: group scheduling, sysfs tunables
introduced the uids_mutex and the helpers to lock/unlock it.
Unfortunately, the error paths of alloc_uid() were not patched
to unlock it.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
warmbloodedcreature@gmail.com reported that an APIC-enabled
Asus a7v8x-x with an Athlon XP reboots early in the bootup:
http://bugzilla.kernel.org/show_bug.cgi?id=8723
after a long marathon of spontaneous-reboot debugging, it turns
out to be caused by sync_Arb_ids(). AMD CPUs never really needed
this sequence anyway, so just return early if we meet an AMD CPU.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Michael Kerrisk reported that a long standing bug in the adjtimex()
system call causes glibc's adjtime(3) function to deliver the wrong
results if 'delta' is NULL.
add the ADJ_OFFSET_SS_READ API detail, which will be used by glibc
to fix this API compatibility bug.
Also see: http://bugzilla.kernel.org/show_bug.cgi?id=6761
[ mingo@elte.hu: added patch description and made it backwards compatible ]
NOTE: the new flag is defined 0xa001 so that it returns -EINVAL on
older kernels - this way glibc can use it safely. Suggested by Ulrich
Drepper.
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The latest KVM driver wants to use the empty_zero_page symbol, and it's
not exported in 32-bit x86 (although it is exported by x86_64, s390, and
uml architectures).
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: tglx@linutronix.de
Cc: linux-kernel@vger.kernel.com
Cc: kvm-devel@lists.sourceforge.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix:
arch/x86/kernel/kprobes_64.c: In function 'set_current_kprobe':
arch/x86/kernel/kprobes_64.c:152: sorry, unimplemented: inlining failed in call to 'is_IF_modifier': recursive inlining
arch/x86/kernel/kprobes_64.c:166: sorry, unimplemented: called from here
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: mingo@elte.hu
Cc: akpm@linux-foundation.org
Cc: tglx@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
HP ProLiant systems DL385 G2 and DL585 G2 need pci=bfsort to enumerate PCI
devices in the expected order.
Matt sayeth:
biosdevname is a userspace app I wrote to help solve this so we don't need
to patch the kernel for future systems. It's not integrated into any
distributions properly yet, but is included in openSUSE 10.3 and Fedora 8
for people who want to download and install it there. It acts as a udev
helper.
For the time being, patching the kernel is necessary. I really hope
biosdevname eliminates that need in future distributions.
http://linux.dell.com/biosdevname/
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Cc: mingo@elte.hu
Cc: andy@greyhouse.net
Cc: john.cagle@hp.com
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: Greg KH <greg@kroah.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86: correctly set UTS_MACHINE for "make ARCH=x86"
For a kernel built with "make ARCH=x86" the following system
information is displayed when running the new kernel
$ uname -m
x86
On some i386 systems (e.g. K7) we even have the following information
$ uname -m
x66
This is weird. The usual information for "uname -m" should be "x86_64"
on 64-bit and "i386" or "i686" on 32-bit.
This patch fixes the issue by setting UTS_MACHINE to "i386" for 32-bit
kernel builds and to "x86_64" for 64-bit kernel builds. I.e., "x86"
won't be used for UTS_MACHINE anymore.
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andreas Herrmann <aherrman@arcor.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@redhat.com>
ACPI processor idle code references local_apic_timer_c2_ok, which
is not available when LOCAL_APIC is disabled.
Define local_apic_timer_c2_ok as a constant, when LOCAL_APIC=n
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
today, all oopses contain a version number of the kernel, which is nice
because the people who actually do bother to read the oops get this
vital bit of information always without having to ask the reporter in
another round trip.
However, WARN_ON() and many other dump_stack() users right now lack this
information; the patch below adds this. This information is essential
for getting people to use their time effectively when looking at these
things; in addition, it's essential for tools that try to collect
statistics about defects.
Please consider, since its so simple and important for long term kernel
quality processes.
The code is identical between 32/64 bit; a lot of this code should be
unified over time, the patch keeps the identical-ness intact.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
AMD Opteron processors before CG revision don't like C-states > 1.
This solves the long standing bugzilla #5303 and probably some more
on affected machines:
http://bugzilla.kernel.org/show_bug.cgi?id=5303
[ tglx@linutronix.de: reworked the patch so it does not wreck ia64 ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
More than 3 years ago Niclas Gustafsson reported a 'stopped time'
problem:
> Watching the /proc/interrupts with 10s apart after the "stop".
>
> [root@s151 root]# more /proc/interrupts
> CPU0
> 0: 66413955 local-APIC-edge timer
[...]
> LOC: 67355837
> ERR: 0
> MIS: 0
> [root@s151 root]# more /proc/interrupts
> CPU0
> 0: 66413955 local-APIC-edge timer
[...]
> LOC: 67379568
> ERR: 0
> MIS: 0
This may be because buggy SMM firmware messes with the 8259A (configured
for a transparent mode -- yes that rare "local-APIC-edge" mode is tricky
;-) ) insanely.
this should resolve:
http://bugzilla.kernel.org/show_bug.cgi?id=2544http://bugzilla.kernel.org/show_bug.cgi?id=6296
Patch-dusted-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sibyte SOCs only have 32-bit PCI. Due to the sparse use of the address
space only the first 1GB of memory is mapped at physical addresses
below 1GB. If a system has more than 1GB of memory 32-bit DMA will
not be able to reach all of it.
For now this patch is good enough to keep Sibyte users happy but it seems
eventually something like swiotlb will be needed for Sibyte.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
In particular as-is it's not suited for multicore and mutiprocessors
systems where there is on guarantee that the counter are synchronized
or running from the same clock at all. This broke Sibyte and probably
others since the "[MIPS] Handle R4000/R4400 mfc0 from count register."
commit.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The R4000 and R4400 have an errata where if the cp0 count register is read
in the exact moment when it matches the compare register no interrupt will
be generated.
This bug may be triggered if the cp0 count register is being used as
clocksource and the compare interrupt as clockevent. So a simple
workaround is to avoid using the compare for both facilities on the
affected CPUs.
This is different from the workaround suggested in the old errata documents;
at some opportunity probably the official version should be implemented
and tested. Another thing to find out is which processor versions
exactly are affected. I only have errata documents upto R4400 V3.0
available so for the moment the code treats all R4000 and R4400 as broken.
This is potencially a problem for some machines that have no other decent
clocksource available; this workaround will cause them to fall back to
another clocksource, worst case the "jiffies" source.
The LL / SC loops in __futex_atomic_op() have the usual fixups necessary
for memory acccesses to userspace from kernel space installed:
__asm__ __volatile__(
" .set push \n"
" .set noat \n"
" .set mips3 \n"
"1: ll %1, %4 # __futex_atomic_op \n"
" .set mips0 \n"
" " insn " \n"
" .set mips3 \n"
"2: sc $1, %2 \n"
" beqz $1, 1b \n"
__WEAK_LLSC_MB
"3: \n"
" .set pop \n"
" .set mips0 \n"
" .section .fixup,\"ax\" \n"
"4: li %0, %6 \n"
" j 2b \n" <-----
" .previous \n"
" .section __ex_table,\"a\" \n"
" "__UA_ADDR "\t1b, 4b \n"
" "__UA_ADDR "\t2b, 4b \n"
" .previous \n"
: "=r" (ret), "=&r" (oldval), "=R" (*uaddr)
: "0" (0), "R" (*uaddr), "Jr" (oparg), "i" (-EFAULT)
: "memory");
The branch at the end of the fixup code, it goes back to the SC
instruction, no matter if the fault was first taken by the LL or SC
instruction resulting in an endless loop which will only terminate if
the address become valid again due to another thread setting up an
accessible mapping and the CPU happens to execute the SC instruction
successfully which due to the preceeding ERET instruction of the fault
handler would only happen if UNPREDICTABLE instruction behaviour of the
SC instruction without a preceeding LL happens to favor that outcome.
But normally processes are nice, pass valid arguments and we were just
getting away with this.
Thanks to Kaz Kylheku <kaz@zeugmasystems.com> for providing the original
report and a test case.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
A new born thread starts execution not in schedule but rather in
ret_from_fork which results in it bypassing the part of the code to
load a new context written in C which are the DSP context and the
userlocal register which Linux uses for the TLS pointer. Frequently
we were just getting away with this bug for a number of reasons:
o Real world application scenarios are very unlikely to use clone or fork
in blocks of DSP code.
o Linux by default runs the child process right after the fork, so the
child by luck will find all the right context in the DSP and userlocal
registers.
o So far the rdhwr instruction was emulated on all hardware so userlocal
wasn't getting referenced at all and the emulation wasn't suffering
from the issue since it gets it's value straight from the thread's
thread_info.
Fixed by moving the code to load the context from switch_to() to
finish_arch_switch which will be called by newborn and old threads.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>