Added early_find_capability that wraps pci_bus_find_capability and uses
fake_pci_bus() to allow us to call it before we've fully setup the
pci_controller.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Now that the last inlined instances are gone, all that is left to do
is turning disable_irq_nosync on arm26 and m68k from defines to aliases
and we are all set - we can make these externs in linux/interrupt.h
uncoditional and kill remaining instances in asm/irq.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Not everyone wants libsas automatically to pull in libata. This patch
makes the behaviour configurable, so you can build libsas with or
without ATA support.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds:
leds: Convert from struct class_device to struct device
leds: leds-gpio for ngw100
leds: Add warning printks in error paths
leds: Fix trigger unregister_simple if register_simple fails
leds: Use menuconfig objects II - LED
leds: Teach leds-gpio to handle timer-unsafe GPIOs
leds: Add generic GPIO LED driver
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-backlight:
leds: cr_bllcd.c: build fix
backlight: Convert from struct class_device to struct device
backlight: Fix order of Kconfig entries
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Clean up duplicate includes in drivers/macintosh/
[POWERPC] Quiet section mismatch warning on pcibios_setup
[POWERPC] init and exit markings for hvc_iseries
[POWERPC] Quiet section mismatch in hvc_rtas.c
[POWERPC] Constify of_platform_driver match_table
[POWERPC] hvcs: Make some things static and const
[POWERPC] Constify of_platform_driver name
[POWERPC] MPIC protected sources
[POWERPC] of_detach_node()'s device node argument cannot be const
[POWERPC] Fix ARCH=ppc builds
[POWERPC] mv64x60: Use mutex instead of semaphore
[POWERPC] Allow smp_call_function_single() to current cpu
[POWERPC] Allow exec faults on readable areas on classic 32-bit PowerPC
[POWERPC] Fix future firmware feature fixups function failure
[POWERPC] fix showing xmon help
[POWERPC] Make xmon_write accept a const buffer
[POWERPC] Fix misspelled "CONFIG_CHECK_CACHE_COHERENCY" Kconfig option.
[POWERPC] cell: CONFIG_SPE_BASE is a typo
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (77 commits)
ACPI: Populate /sys/firmware/acpi/tables/
ACPI: create CONFIG_ACPI_DEBUG_FUNC_TRACE
ACPI: update ACPI proc I/F removal schedule
ACPI: update feature-removal-schedule.txt, /sys/firmware/acpi/namespace is gone
ACPI: export ACPI events via acpi_mc_group multicast group
ACPI: fix empty macros found by -Wextra
ACPI: drivers/acpi/pci_link.c: lower printk severity
sony-laptop: Fix event reading in sony-laptop
sony-laptop: Add Vaio FE to the special init sequence
sony-laptop: Make the driver use MSC_SCAN and a setkeycode and getkeycode key table.
sony-laptop: Invoke _INI for SNC devices that provide it
sony-laptop: Add support for recent Vaios Fn keys (C series for now)
sony-laptop: map wireless switch events to KEY_WLAN
sony-laptop: add new SNC handlers
ACPI: thinkpad-acpi: add locking to brightness subdriver
ACPI: thinkpad-acpi: bump up version to 0.15
ACPI: thinkpad-acpi: make EC-based thermal readings non-experimental
ACPI: thinkpad-acpi: make sure DSDT TMPx readings don't return +128
ACPI: thinkpad-acpi: react to Lenovo ThinkPad differences in hot key
ACPI: thinkpad-acpi: allow use of CMOS NVRAM for brightness control
...
They are identical
Indirectly pointed out by Thomas Gleixner
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Previously lock was unconditionally used, but shouldn't be needed on
UP systems.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Due to index register access ordering problems, when using macros a line
like this fails (and does nothing):
setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88);
With inlined functions this line will work as expected.
Note about a side effect: Seems on Geode GX1 based systems the
"suspend on halt power saving feature" was never enabled due to this
wrong macro expansion. With inlined functions it will be enabled, but
this will stop the TSC when the CPU runs into a HLT instruction.
Kernel output something like this:
Clocksource tsc unstable (delta = -472746897 ns)
This is the 3rd version of this patch.
- Adding missed arch/i386/kernel/cpu/mtrr/state.c
Thanks to Andres Salomon
- Adding some big fat comments into the new header file
Suggested by Andi Kleen
AK: fixed x86-64 compilation
Signed-off-by: Juergen Beisert <juergen@kreuzholzen.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
local_cmpxchg() should not use any LOCK prefix. This change probably
got lost in the move to cmpxchg.h.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a machine check or NMI occurs while multiple byte code is patched
the CPU could theoretically see an inconsistent instruction and crash.
Prevent this by temporarily disabling MCEs and returning early in the
NMI handler.
Based on discussion with Mathieu Desnoyers.
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reenable kprobes and alternative patching when the kernel text is write
protected by DEBUG_RODATA
Add a general utility function to change write protected text. The new
function remaps the code using vmap to write it and takes care of CPU
synchronization. It also does CLFLUSH to make icache recovery faster.
There are some limitations on when the function can be used, see the
comment.
This is a newer version that also changes the paravirt_ops code.
text_poke also supports multi byte patching now.
Contains bug fixes from Zach Amsden and suggestions from Mathieu
Desnoyers.
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Zach Amsden <zach@vmware.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch uses the read and write functions provided at system.h
for control registers instead of writting raw assembly over and
over again in .c files. Functions to manipulate cr2 and cr8 were
provided, as they were lacking.
Also, removed some extra space after closing brackets
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes the i386 behave the same way that x86_64 does when a
segfault happens. A line gets printed to the kernel log so that tools
that need to check for failures can behave more uniformly between
debug.show_unhandled_signals sysctl variable to 0 (or by doing echo 0 >
/proc/sys/debug/exception-trace)
Also, all of the lines being printed are now using printk_ratelimit() to
deny the ability of DoS from a local user with a program like the
following:
main()
{
while (1)
if (!fork()) *(int *)0 = 0;
}
This new revision also includes the fix that Andrew did which got rid of
new sysctl that was added to the system in earlier versions of this.
Also, 'show-unhandled-signals' sysctl has been renamed back to the old
'exception-trace' to avoid breakage of people's scripts.
AK: Enabling by default for i386 will be likely controversal, but let's see what happens
AK: Really folks, before complaining just fix your segfaults
AK: I bet this will find a lot of silent issues
Signed-off-by: Masoud Sharbiani <masouds@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
[ Personally, I've found the complaints useful on x86-64, so I'm all for
this. That said, I wonder if we could do it more prettily.. -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move register and other definitions out of the
include/asm-arm/arch-s3c2410 into the the arch
directories of include/asm-arm/plat-s3c24xx and
include/asm-arm/plat-s3c.
This move is in preperation of the merging of
s3c2400 and s3c6400.
The following git mv commands are needed before
this patch can be applied:
git mv include/asm-arm/arch-s3c2410/regs-ac97.h include/asm-arm/plat-s3c/regs-ac97.h
git mv include/asm-arm/arch-s3c2410/regs-adc.h include/asm-arm/plat-s3c/regs-adc.h
git mv include/asm-arm/arch-s3c2410/regs-iis.h include/asm-arm/plat-s3c24xx/regs-iis.h
git mv include/asm-arm/arch-s3c2410/regs-spi.h include/asm-arm/plat-s3c24xx/regs-spi.h
git mv include/asm-arm/arch-s3c2410/regs-udc.h include/asm-arm/plat-s3c24xx/regs-udc.h
git mv include/asm-arm/arch-s3c2410/udc.h include/asm-arm/plat-s3c24xx/udc.h
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
We've fixed up a number of faults with the uncompressors
so remove the now unused FIFO_MAX as it is not needed.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Split the S3C2400 out of S3C2410 memory.h files
ready for S3C2400 support to be added.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reorganise the definition of the virtual addresses
used into a common header and update the users to
rename S3C2410 items into a more common S3C defined
macros.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Remove the static maps for the LCD and USB devices.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Move the S3C2400 values to their own include directory
series in include/asm-arm/arch-s3c2400 as the support
for the S3C2400 is best placed in its own arch directory.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Rename the S3C24XX configuration options for the watchdog
boot controls for moving to the arch/arm/plat-s3c moves.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Check for ARM926 based S3C24XX based devices as these
only have 64 byte FIFOs, and do not have the model
detection refisters in the same place as the ARM920
based CPUs
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Ensure we check for ARM926 in the uncompressor, as all current
ARM926s do not have an ID register and all have S3C2440 style
UARTs.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Rename DEBUG_S3C2410_PORT to DEBUG_S3C_PORT as well as
DEBUG_S3C2410_UART to DEBUG_S3C_UART as part of the updates
to moving to plat-s3c for S3C base support.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Rename CONFIG_S3C2410_LOWLEVEL_UART_PORT to be
CONFIG_S3C_LOWLEVEL_UART_PORT as we move to using
plat-s3c for base of S3C operations.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Update the debug macros for use with the new per-cpu
configuration and usage.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Move the common parts of the debug macros into include/asm-arm/plat-s3c
ready to be used for the common S3C support.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch moves items of the s3c24xx support into
a new plat-s3c directory for items that use the
s3c24xx support but are not directly s3c24xx
compatible, such as the s3c2400 and s3c6400.
git mv commands:
git mv include/asm-arm/arch-s3c2410/iic.h include/asm-arm/plat-s3c/iic.h
git mv include/asm-arm/arch-s3c2410/nand.h include/asm-arm/plat-s3c/nand.h
git mv include/asm-arm/arch-s3c2410/regs-iic.h include/asm-arm/plat-s3c/regs-iic.h
git mv include/asm-arm/arch-s3c2410/regs-nand.h include/asm-arm/plat-s3c/regs-nand.h
git mv include/asm-arm/arch-s3c2410/regs-rtc.h include/asm-arm/plat-s3c/regs-rtc.h
git mv include/asm-arm/arch-s3c2410/regs-serial.h include/asm-arm/plat-s3c/regs-serial.h
git mv include/asm-arm/arch-s3c2410/regs-timer.h include/asm-arm/plat-s3c/regs-timer.h
git mv include/asm-arm/arch-s3c2410/regs-watchdog.h include/asm-arm/plat-s3c/regs-watchdog.h
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch adds the foundation pieces for
the Freescale MXC platforms, including
i.MX2 and i.MX3 based systems.
The bare-bones MX31 support in this patch
boots to the rootdev panic with 8250 serial
console configured "console=ttyS0,115200".
It assumes that Redboot is the boot loader.
Signed-off-by: Quinn Jensen <quinn.jensen@freescale.com>
Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Selinux folks had been complaining about the lack of AVC_PATH
records when audit is disabled. I must admit my stupidity - I assumed
that avc_audit() really couldn't use audit_log_d_path() because of
deadlocks (== could be called with dcache_lock or vfsmount_lock held).
Shouldn't have made that assumption - it never gets called that way.
It _is_ called under spinlocks, but not those.
Since audit_log_d_path() uses ab->gfp_mask for allocations,
kmalloc() in there is not a problem. IOW, the simple fix is sufficient:
let's rip AUDIT_AVC_PATH out and simply generate pathname as part of main
record. It's trivial to do.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: James Morris <jmorris@namei.org>
Right now the audit filter can match on = != > < >= blah blah blah.
This allow the filter to also look at bitwise AND operations, &
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
blk_fill_sghdr_rq, blk_unmap_sghdr_rq, and blk_complete_sghdr_rq were
exported for bsg, however bsg was changed to support only sg v4.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Some HW platforms, such as the new cell blades, requires some MPIC sources
to be left alone by the operating system. This implements support for
a "protected-sources" property in the mpic controller node containing a list
of source numbers to be protected against operating system interference.
For those interested in the gory details, the MPIC on the southbridge of
those blades has some of the processor outputs routed to the cell, and
at least one routed as a GPIO to the service processor. It will be used
in the GA product for routing some of the southbridge error interrupts
to the service processor which implements some of the RAS stuff, such
as checkstopping when fatal errors occurs before they can propagate.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
...since it modifies it (when it sets the OF_DETACHED flag).
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The recent signal rework broke ARCH=ppc builds with the following
error:
CC arch/powerpc/kernel/signal.o
arch/powerpc/kernel/signal.c: In function Âdo_signalÂ:
arch/powerpc/kernel/signal.c:142: error: implicit declaration of
function Âset_dabrÂ
make[1]: *** [arch/powerpc/kernel/signal.o] Error 1
This fixes it by including a function prototype in asm-ppc/system.h.
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
applied after Rafel's 'PM: Update global suspend and hibernation operations framework' patch set
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Based on the David Brownell's patch at
http://marc.info/?l=linux-acpi&m=117873972806360&w=2
updated by: Rafael J. Wysocki <rjw@sisk.pl>
Add a helper routine returning the lowest power (highest number) ACPI device
power state that given device can be in while the system is in the sleep state
indicated by acpi_target_sleep_state .
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
Split ACPI_DEBUG into function trace enabled and not enabled.
Function trace is most of the ACPI_DEBUG costs, but is
not much of use for kernel ACPI debugging.
Size of kernel image increased on test compile:
+ 48k (Full ACPI_DEBUG)
+ 35k (ACPI_DEBUG with function trace compiled out)
Performance without function trace is also much better.
Also remove ACPI_LV_DEBUG_OBJECT from default debug level as
a lot vendors let Store (value, debug) in their code and this
might confuse users when it pops up in syslog.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
ACPI has a ton of macros which make a bunch of empty if's when configured
in non-debug mode.
[lenb: The code it complaines about is functionally correct,
so this patch is just to make -Wextra happier]
#define DBG()
if(...)
DBG();
next_c_statement
which turns into
if(...) ;
next_c_statement
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Keep note of ThinkPad model, BIOS and EC firmware information, and log it
on startup. Makes for far more readable code in places, too.
This patch also adds Lenovo's PCI ID to the pci ids table.
Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[NET]: Add missing entries to family name tables
[NET]: Make NETDEVICES depend on NET.
[IPV6]: endianness bug in ip6_tunnel
[IrDA]: TOSHIBA_FIR depends on virt_to_bus
[IrDA]: EP7211 IR driver port to the latest SIR API
[IrDA] Typo fix in irnetlink.c copyright
[NET]: Fix loopback crashes when multiqueue is enabled.
[IPV4]: Fix inetpeer gcc-4.2 warnings
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: ERROR: "sys_ioctl" [arch/sparc64/solaris/solaris.ko] undefined!
[SPARC32]: Make PAGE_SHARED a read-mostly variable.
[SPARC32]: Take enable_irq/disable_irq out of line.
[SPARC32]: clean include/asm-sparc/irq.h
[SPARC32]: Fix rounding errors in ndelay/udelay implementation.
Move stuff used only by arch/sparc/kernel/* into arch/sparc/kernel/irq.h
and into individual files in there (e.g. macros internal to sun4m_irq.c,
etc.)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
The EP7211 SIR driver was the only one left without a new SIR API port.
Signed-off-by: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces struct pci_sysdata to x86 and x86-64, and
converts the existing two users (NUMA, Calgary) to use it.
This lays the groundwork for having other users of sysdata, such as
the PCI domains work.
The Calgary bits are tested, the NUMA bits just look ok.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This builds upon the existing geode infrastructure, but adds southbridge
support, some GPIO functions, and a header file (asm-i386/geode.h) with some
useful GX/LX detection tests.
The majority of this code was written by Jordan Crouse.
Signed-off-by: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
get_vm_area always returns an area with an adjacent guard page. That guard
page is included in vm_struct.size. iounmap uses vm_struct.size to
determine how much address space needs to have change_page_attr applied to
it, which will BUG if applied to the guard page.
This patch adds a helper function - get_vm_area_size() in linux/vmalloc.h -
to return the actual size of a vm area, and uses it to make iounmap do the
right thing. There are probably other places which should be using
get_vm_area_size().
Thanks to Dave Young <hidave.darkstar@gmail.com> for debugging the
problem.
[ Andi, it wasn't clear to me whether x86_64 needs the same fix. ]
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Dave Young <hidave.darkstar@gmail.com>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
setup_pit_timer is declared in asm-i386/timer.h. Move it to the pit header
file, so it can be used by x86_64 as well.
Move also the PIT constants.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For K8 system: 4G RAM with memory hole remapping enabled, or more than 4G
RAM installed. when using kexec to load second kernel. In the second
kernel, when mem is allocated for GART, it will do the memset for clear, it
will cause restart, because some device still used that for dma. solution
will be:
in second kernel: disable that at first before we try to allocate mem for
it. or in the first kernel: do disable that before shutdown.
Andi/Eric/Alan prefer to second one for clean shutdown in first kernel.
Andi also point out need to consider to AGP enable but mem less 4G case
too.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add cpu_relax() to cmos_lock() inline function for faster operation on SMT
CPUs and less power consumption on others in case of lock contention (which
probably doesn't happen too often, so admittedly this patch is not too
exciting).
[akpm@linux-foundation.org: Include the header file for cpu_relax()]
Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/vmalloc.c: In function 'unmap_kernel_range':
mm/vmalloc.c:75: warning: unused variable 'start'
make it a C function so that the compiler thinks it used its arguments.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The function name is set_fixmap(), not fixmap_set() as stated in the comment.
Also fix a typo, punctuation and lower/uppercase a bit.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace the pcspkr private PIT lock by the global PIT lock to serialize the
PIT access all over the place.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some interrupt entry points are currently defined in i8259.c They probably
belong in a header. Right now, their only user is init_IRQ, justifying
their declaration in-file. But when virtualization comes in, we may be
interested in using that functions in late initializations.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On some systems the ACPI NVS area is located in the first 1 MB of RAM and
it is overwritten by the i386 code during the restore after hibernation.
This confuses the ACPI platform firmware that doesn't update the AC adapter
status appropriately as a result
(http://bugzilla.kernel.org/show_bug.cgi?id=7995).
The solution is to register the reserved memory in the first 1 MB as
'nosave', so that swsusp doesn't touch it during the restore. Also, this
has been done on x86_64 for a long time now, so this patch makes the i386
restore code behave like the x86_64 one.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide seperate versions for Calgary and CalIOC2
Also print out the PCIe Root Complex Status on CalIOC2 errors
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Calgary and CalIOC2 share most of the same logic. Introduce struct
cal_chipset_ops for quirks and tce flush logic which are
[akpm@linux-foundation.org: make calgary_chip_ops static]
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Rise CPUs were only very short-lived, and there are no reports of
anyone both owning one and running Linux on it.
Googling for the printk string "CPU: Rise iDragon" didn't find any dmesg
available online.
If it turns out that against all expectations there are actually users
reverting this patch would be easy.
This patch will make the kernel images smaller by a few bytes for all
i386 users.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 59f4e7d572 fixed machine rebooting
on Truxton's machine (when no keyboard was present). But it broke it on
Lee's machine.
The patch reinstates the old (pre-59f4e7d572980a521b7bdba74ab71b21f5995538)
code and if that doesn't work out, try the new,
post-59f4e7d572980a521b7bdba74ab71b21f5995538 code instead.
Cc: Lee Garrett <lee-in-berlin@web.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Background:
/dev/mcelog is typically polled manually. This is less than optimal for
situations where accurate accounting of MCEs is important. Calling
poll() on /dev/mcelog does not work.
Description:
This patch adds support for poll() to /dev/mcelog. This results in
immediate wakeup of user apps whenever the poller finds MCEs. Because
the exception handler can not take any locks, it can not call the wakeup
itself. Instead, it uses a thread_info flag (TIF_MCE_NOTIFY) which is
caught at the next return from interrupt or exit from idle, calling the
mce_user_notify() routine. This patch also disables the "fake panic"
path of the mce_panic(), because it results in printk()s in the exception
handler and crashy systems.
This patch also does some small cleanup for essentially unused variables,
and moves the user notification into the body of the poller, so it is
only called once per poll, rather than once per CPU.
Result:
Applications can now poll() on /dev/mcelog. When an error is logged
(whether through the poller or through an exception) the applications are
woken up promptly. This should not affect any previous behaviors. If no
MCEs are being logged, there is no overhead.
Alternatives:
I considered simply supporting poll() through the poller and not using
TIF_MCE_NOTIFY at all. However, the time between an uncorrectable error
happening and the user application being notified is *the*most* critical
window for us. Many uncorrectable errors can be logged to the network if
given a chance.
I also considered doing the MCE poll directly from the idle notifier, but
decided that was overkill.
Testing:
I used an error-injecting DIMM to create lots of correctable DRAM errors
and verified that my user app is woken up in sync with the polling interval.
I also used the northbridge to inject uncorrectable ECC errors, and
verified (printk() to the rescue) that the notify routine is called and the
user app does wake up. I built with PREEMPT on and off, and verified
that my machine survives MCEs.
[wli@holomorphy.com: build fix]
Signed-off-by: Tim Hockin <thockin@google.com>
Signed-off-by: William Irwin <bill.irwin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For NUMA emulation, our SLIT should represent the true NUMA topology of the
system but our proximity domain to node ID mapping needs to reflect the
emulated state.
When NUMA emulation has successfully setup fake nodes on the system, a new
function, acpi_fake_nodes() is called. This function determines the proximity
domain (_PXM) for each true node found on the system. It then finds which
emulated nodes have been allocated on this true node as determined by its
starting address. The node ID to PXM mapping is changed so that each fake
node ID points to the PXM of the true node that it is located on.
If the machine failed to register a SLIT, then we assume there is no special
requirement for emulated node affinity so we use the default LOCAL_DISTANCE,
which is newly exported to this code, as our measurement if the emulated nodes
appear in the same PXM. Otherwise, we use REMOTE_DISTANCE.
PXM_INVAL and NID_INVAL are also exported to the ACPI header file so that we
can compare node_to_pxm() results in generic code (in this case, the SRAT
code).
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds caching of pgds and puds, pmds, pte. That way we can avoid costly
zeroing and initialization of special mappings in the pgd.
A second quicklist is useful to separate out PGD handling. We can carry the
initialized pgds over to the next process needing them.
Also clean up the pgd_list handling to use regular list macros. There is no
need anymore to avoid the lru field.
Move the add/removal of the pgds to the pgdlist into the constructor /
destructor. That way the implementation is congruent with i386.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Luck, Tony" <tony.luck@intel.com>
Acked-by: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Constrain __supported_pte_mask and NX handling to just the PAE kernel.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When making changes to x86_64 timers, I noticed that touching hpet.h triggered
an unreasonably large rebuild. Untangling it from timex.h quiets the extra
rebuild quite a bit.
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove pit_interrupt_hook as it adds just an extra layer.
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This implements new vDSO for x86-64. The concept is similar
to the existing vDSOs on i386 and PPC. x86-64 has had static
vsyscalls before, but these are not flexible enough anymore.
A vDSO is a ELF shared library supplied by the kernel that is mapped into
user address space. The vDSO mapping is randomized for each process
for security reasons.
Doing this was needed for clock_gettime, because clock_gettime
always needs a syscall fallback and having one at a fixed
address would have made buffer overflow exploits too easy to write.
The vdso can be disabled with vdso=0
It currently includes a new gettimeofday implemention and optimized
clock_gettime(). The gettimeofday implementation is slightly faster
than the one in the old vsyscall. clock_gettime is significantly faster
than the syscall for CLOCK_MONOTONIC and CLOCK_REALTIME.
The new calls are generally faster than the old vsyscall.
Advantages over the old x86-64 vsyscalls:
- Extensible
- Randomized
- Cleaner
- Easier to virtualize (the old static address range previously causes
overhead e.g. for Xen because it has to create special page tables for it)
Weak points:
- glibc support still to be written
The VM interface is partly based on Ingo Molnar's i386 version.
Includes compile fix from Joachim Deguara
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gcc 4.3 supports a new __attribute__((__cold__)) to mark functions cold. Any
path directly leading to a call of this function will be unlikely. And gcc
will try to generate smaller code for the function itself.
Please use with care. The code generation advantage isn't large and in most
cases it is not worth uglifying code with this.
This patch marks some common error functions like panic(), printk()
as cold. This will longer term make many unlikely()s unnecessary, although
we can keep them for now for older compilers.
BUG is not marked cold because there is currently no way to tell
gcc to mark a inline function told.
Also all __init and __exit functions are marked cold. With a non -Os
build this will tell the compiler to generate slightly smaller code
for them. I think it currently only uses less alignments for labels,
but that might change in the future.
One disadvantage over *likely() is that they cannot be easily instrumented
to verify them.
Another drawback is that only the latest gcc 4.3 snapshots support this.
Unfortunately we cannot detect this using the preprocessor. This means older
snapshots will fail now. I don't think that's a problem because they are
unreleased compilers that nobody should be using.
gcc also has a __hot__ attribute, but I don't see any sense in using
this in the kernel right now. But someday I hope gcc will be able
to use more aggressive optimizing for hot functions even in -Os,
if that happens it should be added.
Includes compile fix from Thomas Gleixner.
Cc: Jan Hubicka <jh@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The compiler generally generates reasonable inline code for the simple
cases and for the rest it's better for code size for them to be out of line.
Also there they can be potentially optimized more in the future.
In fact they probably should be in a .S file because they're all pure
assembly, but that's for another day.
Also some code style cleanup on them while I was on it (this seems
to be the last untouched really early Linux code)
This saves ~12k text for a defconfig kernel with gcc 4.1.
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan asked to always use the builtin memcpy on gcc 4.3 mainline because
it should generate better code than the old macro. Let's try it.
Cc: Jan Hubicka <jh@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In acpi_scan_nodes(), we immediately return -1 if acpi_numa <= 0, meaning
we haven't detected any underlying ACPI topology or we have explicitly
disabled its use from the command-line with numa=noacpi.
acpi_table_print_srat_entry() and acpi_table_parse_srat() are only
referenced within drivers/acpi/numa.c, so we can mark them as static and
remove their prototypes from the header file.
Likewise, pxm_to_node_map[] and node_to_pxm_map[] are only used within
drivers/acpi/numa.c, so we mark them as static and remove their externs
from the header file.
The automatic 'result' variable is unused in acpi_numa_init(), so it's
removed.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On x86_64, <asm/ptrace.h> uses __user but doesn't include
<linux/compiler.h>. This could lead to build failures.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
i386 and sparc64 have the identical code to update the cmos clock. Move it
into kernel/time/ntp.c as there are other architectures coming along with the
same requirements.
[akpm@linux-foundation.org: build fixes]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to make sure, that the clockevent devices are resumed, before
the tick is resumed. The current resume logic does not guarantee this.
Add CLOCK_EVT_MODE_RESUME and call the set mode functions of the clock
event devices before resuming the tick / oneshot functionality.
Fixup the existing users.
Thanks to Nigel Cunningham for tracking down a long standing thinko,
which affected the jinxed VAIO.
[akpm@linux-foundation.org: xen build fix]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is an variation on the patch sent by Christoph Hellwig which kills
file_count abuse by the Coda kernel module by moving the coda_flush
functionality into coda_release. However part of reason we were using the
coda_flush callback was to allow Coda to pass errors that occur during
writeback from the userspace cache manager back to close().
As Al Viro explained on linux-fsdevel, it is impossible to guarantee that
such errors can in fact be returned back to the caller. There are many
cases where the last reference to a file is not released by the close
system call and it is also impossible to pick some close as a 'last-close'
and delay it until all other references have been destroyed.
The CODA_STORE/CODA_RELEASE upcall combination is clearly a broken design,
and it is better to remove it completely.
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Too many semicolons in this macro.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, bsg doesn't make class backlinks (a process whereby you'd get
a link to bsg in the device directory in the same way you get one for
sg). This is because the bsg device is uninitialised, so the class
device has nothing it can attach to. The fix is to make the bsg device
point to the cdevice of the entity creating the bsg, necessitating
changing the bsg_register_queue() prototype into a form that takes the
generic device.
Acked-by: FUJITA Tomonori <tomof@acm.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: fix section mismatch warning in mdesc.c
[SPARC64]: fix section mismatch warning in pci_sunv4
[SPARC64]: Stop using drivers/char/rtc.c
[SPARC64]: Convert parport to of_platform_driver.
[SPARC]: Implement fb_is_primary_device().
[SPARC64]: Fix virq decomposition.
[SPARC64]: Use KERN_ERR in IRQ manipulation error printks.
[SPARC64]: Do not flood log with failed DS messages.
[SPARC64]: Add proper multicast support to VNET driver.
[SPARC64]: Handle multiple domain-services-port nodes properly.
[SPARC64]: Improve VIO device naming further.
[SPARC]: Make sure dev_archdata is filled in for all devices.
[SPARC]: Define minimal struct dev_archdata, similarly to sparc64.
[SPARC]: Fix serial console device detection.
The current scheme works on static interpretation of text names, which
is wrong.
The output-device setting, for example, must be resolved via an alias
or similar to a full path name to the console device.
Paths also contain an optional set of 'options', which starts with a
colon at the end of the path. The option area is used to specify
which of two serial ports ('a' or 'b') the path refers to when a
device node drives multiple ports. 'a' is assumed if the option
specification is missing.
This was caught by the UltraSPARC-T1 simulator. The 'output-device'
property was set to 'ttya' and we didn't pick upon the fact that this
is an OBP alias set to '/virtual-devices/console'. Instead we saw it
as the first serial console device, instead of the hypervisor console.
The infrastructure is now there to take advantage of this to resolve
the console correctly even in multi-head situations in fbcon too.
Thanks to Greg Onufer for the bug report.
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb:
V4L/DVB (5880): wm8775/wm8739: Fix memory leak when unloading module
V4L/DVB (5877): radio-gemtek-pci: remove unused structure member
V4L/DVB (5871): Conexant 2388x: check for kthread_run
V4L/DVB (5869): Add check for valid control ID to v4l2_ctrl_next.
V4L/DVB (5867): videodev2.h: add missing <sys/time.h> for userspace
V4L/DVB (5866): ivtv: fix DMA timeout when capturing VBI + another stream
V4L/DVB (5865): Remove usage of HZ on ivtv driver, replacing by msecs_to_jiffies
V4L/DVB (5861): Use msecs_to_jiffies instead of HZ on bttv, cx88 and saa7134
V4L/DVB (5860): Use msecs_to_jiffies instead of HZ on some webcam drivers
V4L/DVB (5859): use msecs_to_jiffies on InfraRed RC5 timeout
V4L/DVB (5858): Use msecs_to_jiffies instead of HZ on media/video I2C drivers
V4L/DVB (5857): Use msecs_to_jiffies instead of HZ on radio drivers
V4L/DVB (5855): ivtv: fix Kconfig typo and refer to the driver homepage.
V4L/DVB (5854): ivtv: cleanup of driver messages
V4L/DVB (5853): ivtv: add support to suppress high volume i2c debug messages.
V4L/DVB (5852): ivtv: don't recompile needlessly
V4L/DVB (5851): ivtv: fix missing I2C_ALGOBIT config option
V4L/DVB (5850): ivtv: improve API command debugging
V4L/DVB (5848): Av7110: fix typo
* 'for-2.6.23' of master.kernel.org:/pub/scm/linux/kernel/git/arnd/cell-2.6: (37 commits)
[CELL] spufs: rework list management and associated locking
[CELL] oprofile: add support to OProfile for profiling CELL BE SPUs
[CELL] oprofile: enable SPU switch notification to detect currently active SPU tasks
[CELL] spu_base: locking cleanup
[CELL] cell: indexing of SPUs based on firmware vicinity properties
[CELL] spufs: integration of SPE affinity with the scheduller
[CELL] cell: add placement computation for scheduling of affinity contexts
[CELL] spufs: extension of spu_create to support affinity definition
[CELL] cell: add hardcoded spu vicinity information for QS20
[CELL] cell: add vicinity information on spus
[CELL] cell: add per BE structure with info about its SPUs
[CELL] spufs: use find_first_bit() instead of sched_find_first_bit()
[CELL] spufs: remove unused file argument from spufs_run_spu()
[CELL] spufs: change decrementer restore timing
[CELL] spufs: dont halt decrementer at restore step 47
[CELL] spufs: limit saving MFC_CNTL bits
[CELL] spufs: fix read and write for decr_status file
[CELL] spufs: fix decr_status meanings
[CELL] spufs: remove needless context save/restore code
[CELL] spufs: fix array size of channel index
...
This patch adds the necessary ifdef's to the proc-v7.S code and
defines the v7wbi_tlb_fns macro in pgtable-nommu.h
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When videodev2.h is included by an application, it needs to include
<sys/time.h> for the timeval struct.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
These patches add full SSP/MCU support for the HP Jornada 720
machine. Its needed to handle keyboard, touchscreen, battery
and backlight/lcd.
The main driver exports functions and the header file exports
the command values. When talking to the MCU the general procedure
is to start MCU, send command (using ssp_inout(command)), the
proper reply is always TXDUMMY. After receiving TXDUMMY you can
send the value you wish pushed (for example brightness level).
End with ssp_end() so the spinlock gets unlocked.
Drivers using this havent been implemented yet, but will shortly.
Signed-off-by: Kristoffer Ericson <Kristoffer.ericson@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
With this patch, Kconfig only selects CPU_HAS_ASID for the MMU
case. It also corrects the typo in the v6wbi_tlb_fns definition in
pgtable-nommu.h.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
While bisecting an iop13xx compile failure I noticed that
include/asm-arm/hwcap.h should be included from include/asm-arm/elf.h
outside #ifndef __ASSEMBLY__ since hwcap.h has its own __ASSEMBLY__
protections.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This sorts out the various lists and related locks in the spu code.
In detail:
- the per-node free_spus and active_list are gone. Instead struct spu
gained an alloc_state member telling whether the spu is free or not
- the per-node spus array is now locked by a per-node mutex, which
takes over from the global spu_lock and the per-node active_mutex
- the spu_alloc* and spu_free function are gone as the state change is
now done inline in the spufs code. This allows some more sharing of
code for the affinity vs normal case and more efficient locking
- some little refactoring in the affinity code for this locking scheme
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
From: Maynard Johnson <mpjohn@us.ibm.com>
This patch updates the existing arch/powerpc/oprofile/op_model_cell.c
to add in the SPU profiling capabilities. In addition, a 'cell' subdirectory
was added to arch/powerpc/oprofile to hold Cell-specific SPU profiling code.
Exports spu_set_profile_private_kref and spu_get_profile_private_kref which
are used by OProfile to store private profile information in spufs data
structures.
Also incorporated several fixes from other patches (rrn). Check pointer
returned from kzalloc. Eliminated unnecessary cast. Better error
handling and cleanup in the related area. 64-bit unsigned long parameter
was being demoted to 32-bit unsigned int and eventually promoted back to
unsigned long.
Signed-off-by: Carl Love <carll@us.ibm.com>
Signed-off-by: Maynard Johnson <mpjohn@us.ibm.com>
Signed-off-by: Bob Nelson <rrnelson@us.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
This patch adds support for additional flags at spu_create, which relate
to the establishment of affinity between contexts and contexts to memory.
A fourth, optional, parameter is supported. This parameter represent
a affinity neighbor of the context being created, and is used when defining
SPU-SPU affinity.
Affinity is represented as a doubly linked list of spu_contexts.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch adds affinity data to each spu instance.
A doubly linked list is created, meant to connect the spus
in the physical order they are placed in the BE. SPUs
near to memory should be marked as having memory affinity.
Adjustments of the fields acording to FW properties is done
in separate patches, one for CPBW, one for Malta (patch for
Malta under testing).
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Addition of a spufs-global "cbe_info" array. Each entry contains information
about one Cell/B.E. node, namelly:
* list of spus (both free and busy spus are in this list);
* list of free spus (replacing the static spu_list from spu_base.c)
* number of spus;
* number of reserved (non scheduleable) spus.
SPE affinity implementation actually requires only access to one spu per
BE node (since it implements its own pointer to walk through the other spus
of the ring) and the number of scheduleable spus (n_spus - non_sched_spus)
However having this more general structure can be useful for other
functionalities, concentrating per-cbe statistics / data.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The decr_status in the LSCSA is confusedly used as two meanings:
* SPU decrementer was running
* SPU decrementer was wrapped as a result of adjust
and the code to set decr_status is missing.
This patch fixes these problems by using the decr_status argument as a
set of flags. This requires a rebuild of the shipped spu_restore code.
Signed-off-by: Masato Noguchi <Masato.Noguchi@jp.sony.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch exports per-context statistics in spufs as long as spu
statistics in sysfs.
It was formed by merging:
"spufs: add spu stats in sysfs" From: Christoph Hellwig
"spufs: add stat file to spufs" From: Christoph Hellwig
"spufs: fix libassist accounting" From: Jeremy Kerr
"spusched: fix spu utilization statistics" From: Luke Browning
And some adjustments by myself, after suggestions on cbe-oss-dev.
Having separate patches was making the review process harder
than it should, as we end up integrating spus and ctx statistics
accounting much more than it was on the first implementation.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The current SPU context saving procedure in SPUFS unexpectedly
restarts MFC when halting decrementer, because MFC_CNTL[Dh] is set
without MFC_CNTL[Sm]. This bug causes, for example, saving broken DMA
queues. Here is a patch to fix the problem.
Signed-off-by: Kazunori Asayama <asayama@sm.sony.co.jp>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
This patch adds support for investigating spus information after a
kernel crash event, through kdump vmcore file.
Implementation is based on xmon code, but the new functionality was
kept independent from xmon.
Signed-off-by: Lucio Jose Herculano Correia <luciojhc@br.ibm.com>
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The pmi driver got simplified by removing support for multiple devices.
As there is no more than one pmi device per maschine, there is no need to
specify the device for listening and sending messages.
This way the caller (cbe_cpufreq) doesn't need to scan the device tree.
When registering the handler on a board without a pmi
interface, pmi.c will just return -ENODEV.
The patch that fixed the breakage of cell_defconfig has been
broken out of the earlier version of this patch. So this is
the version that applies cleanly on top of it.
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
The comparison with ZERO_SIZE_PTR in ZERO_OR_NULL_PTR() needs to be <=
(not just <) so that ZERO_OR_NULL_PTR(ZERO_SIZE_PTR) is 1.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
[ Duh! - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Prevent people from directly including <asm/rwsem.h>.
[IA64] remove time interpolator
[IA64] Convert to generic timekeeping/clocksource
[IA64] refresh some config files for 64K pagesize
[IA64] Delete iosapic_free_rte()
[IA64] fallocate system call
[IA64] Enable percpu vector domain for IA64_DIG
[IA64] Enable percpu vector domain for IA64_GENERIC
[IA64] Support irq migration across domain
[IA64] Add support for vector domain
[IA64] Add mapping table between irq and vector
[IA64] Check if irq is sharable
[IA64] Fix invalid irq vector assumption for iosapic
[IA64] Use dynamic irq for iosapic interrupts
[IA64] Use per iosapic lock for indirect iosapic register access
[IA64] Cleanup lock order in iosapic_register_intr
[IA64] Remove duplicated members in iosapic_rte_info
[IA64] Remove block structure for locking in iosapic.c
Remove time_interpolator code (This is generic code, but
only user was ia64. It has been superseded by the
CONFIG_GENERIC_TIME code).
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Peter Keilty <peter.keilty@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is a merge of Peter Keilty's initial patch (which was
revived by Bob Picco) for this with Hidetoshi Seto's fixes
and scaling improvements.
Acked-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There's currently no destructor for the bsg components. If you insert
and remove the module, you see the bsg devices building up and up. This
patch adds the destructor in the correct place in the transport class so
that the bsg and request queue are removed just before the device
destruction.
Acked-by: FUJITA Tomonori <tomof@acm.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
1. split pxa_cpu_suspend to pxa25x_cpu_suspend and pxa27x_cpu_suspend
and make pxa25x_cpu_pm_enter() and pxa27x_cpu_pm_enter() to invoke
the corresponding _suspend functions, thus remove all those ugly
#ifdef .. #endif out of sleep.S
2. move the declarations of those suspend functions to pm.h
note: this is not a clean enough solution until all the pxa25x and
pxa27x specific part is further removed out of sleep.S, sleep.S is
supposed to contain generic code only
Signed-off-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
1. introduce a structure pxa_cpu_pm_fns for pxa25x/pxa27x specific
operations as follows:
struct pxa_cpu_pm_fns {
int save_size;
void (*save)(unsigned long *);
void (*restore)(unsigned long *);
int (*valid)(suspend_state_t state);
void (*enter)(suspend_state_t state);
}
2. processor specific registers saving and restoring are performed
by calling the corresponding (*save) and (*restore)
3. pxa_cpu_pm_fns->save_size should be initialized to the required
size for processor specific registers saving, the allocated
memory address will be passed to (*save) and (*restore)
memory allocation happens early in pxa_pm_init(), and save_size
should be assigned prior to this (which is usually true, since
pxa_pm_init() happens in device_initcall()
4. there're some redundancies for those SLEEP_SAVE_XXX and related
macros, will be fixed later, one way possible is for the system
devices to handle the specific registers saving and restoring
Signed-off-by: eric miao <eric.y.miao@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sfr/ofcons:
Create drivers/of/platform.c
Create linux/of_platorm.h
[SPARC/64] Rename some functions like PowerPC
Begin consolidation of of_device.h
Begin to consolidate of_device.c
Consolidate of_find_node_by routines
Consolidate of_get_next_child
Consolidate of_get_parent
Consolidate of_find_property
Consolidate of_device_is_compatible
Start split out of common open firmware code
Split out common parts of prom.h
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: appletouch - improve powersaving for Geyser3 devices
Input: lifebook - fix an oops on Panasonic CF-18
Input: document intended meaning of KEY_SWITCHVIDEOMODE
Input: switch to using seq_list_xxx helpers
Input: i8042 - give more trust to PNP data on i386
Input: add driver for Fujitsu serial touchscreens
Input: ads7846 - re-check pendown status before reporting events
Input: ads7846 - introduce sample settling delay
Input: xpad - add support for leds on xbox 360 pad
* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6: (26 commits)
sh: intc - add support for SH7750 and its variants
sh: Move entry point code to .text.head.
sh: heartbeat: Shut up resource size warning.
sh: update r2d defconfig and fix SH7751R pci compliation
sh: Many symbol exports for nommu allmodconfig.
sh: zero terminate 8250 platform data for r2d board
sh: cpufreq: Fix up the build for SH-2.
sh: Make on-chip DMA channel selection explicit.
sh: Fix up CPU dependencies for on-chip DMAC.
sh: cpufreq: clock framework support.
sh: Support rate rounding for SH7722 FRQCR clocks.
sh: Implement clk_round_rate() in the clock framework.
sh: Fix up PCI section mismatch warnings.
sh: Wire up fallocate() syscall.
sh: intc - add support for 7780
sh: intc - improve group support
sh: Fix up SH-3 and SH-4 driver dependencies.
sh: push-switch: Correct license string.
sh: cpufreq: Fix driver dependencies and flag as broken.
sh: IPR/INTC2 IRQ setup consolidation.
...
* 'linus' of master.kernel.org:/pub/scm/linux/kernel/git/perex/alsa: (102 commits)
[ALSA] version 1.0.14
[ALSA] remove duplicate Logitech Quickcam USB ID in usbquirks.h
[ALSA] hda-codec - Fix input with STAC92xx
[ALSA] hda-intel: support for iMac 24'' released on 09/2006
[ALSA] hda-codec - Add quirk for Asus P5LD2
[ALSA] snd-ca0106: Add support for X-Fi Extreme Audio.
[ALSA] snd-emu10k1:Enable E-Mu 1616m notebook firmware loading.
[ALSA] snd-emu10k1: Initial support for E-Mu 1616 and 1616m.
[ALSA] cs46xx - Fix PM resume
[ALSA] hda: Enable SPDIF in/out on some stac9205 boards
[ALSA] timer: check for incorrect device state in non-debug compiles, too
[ALSA] snd-aoa-codec-onyx: fix typo
[ALSA] hda-codec - Add quirks for HP dx2200/dx2250
[ALSA] hda-codec - Rename HP model-specific quirks
[ALSA] hda-codec - Add quirk for HP Samba
[ALSA] hda-codec - Add LG LW20 line-in capture source
[ALSA] usb-audio - Fix AC3 with M-Audio Audiophile USB
[ALSA] hda: stac9202 mixer fix
[ALSA] Make s3c24xx_i2s_set_clkdiv() change the correct bits
[ALSA] hda-codec - Add LG LW20 si3054 modem id
...
* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh64-2.6:
sh64: Flag sh64_get_page() as __init_refok.
sh64: Move entry point code to .text.head.
sh64: Fix up PCI section mismatch warnings.
sh64: Update cayman defconfig.
sh64: Wire up fallocate() syscall.
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: (29 commits)
libata: implement EH fast drain
libata: schedule probing after SError access failure during autopsy
libata: clear HOTPLUG flag after a reset
libata: reorganize ata_ehi_hotplugged()
libata: improve SCSI scan failure handling
libata: quickly trigger SATA SPD down after debouncing failed
libata: improve SATA PHY speed down logic
The SATA controller device ID is different according to
ahci: implement SCR_NOTIFICATION r/w
ahci: make NO_NCQ handling more consistent
libata: make ->scr_read/write callbacks return error code
libata: implement AC_ERR_NCQ
libata: improve EH report formatting
sata_sil24: separate out sil24_do_softreset()
sata_sil24: separate out sil24_exec_polled_cmd()
sata_sil24: replace sil24_update_tf() with sil24_read_tf()
ahci: separate out ahci_do_softreset()
ahci: separate out ahci_exec_polled_cmd()
ahci: separate out ahci_kick_engine()
ahci: use deadline instead of fixed timeout for 1st FIS for SRST
...
Andrew Morton:
[async_memcpy] is very wrong if both ASYNC_TX_KMAP_DST and
ASYNC_TX_KMAP_SRC can ever be set. We'll end up using the same kmap
slot for both src add dest and we get either corrupted data or a BUG.
Evgeniy Polyakov:
Btw, shouldn't it always be kmap_atomic() even if flag is not set.
That pages are usual one returned by alloc_page().
So fix the usage of kmap_atomic and kill the ASYNC_TX_KMAP_DST and
ASYNC_TX_KMAP_SRC flags.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: Fix two year old bug in early bootup asm.
[SPARC64]: Update defconfig.
[SPARC64]: Fix log message type in vio_create_one().
[SPARC64]: Tweak assertions in sun4v_build_virq().
[SPARC64]: Tweak kernel log messages in power_probe().
[SPARC64]: Fix handling of multiple vdc-port nodes.
[SPARC64]: Fix device type matching in VIO's devspec_show().
[SPARC64]: Fix MODULE_DEVICE_TABLE() specification in VDC and VNET.
[SPARC]: Add sys_fallocate() entries.
[SPARC64]: Use orderly_poweroff().
Since we have use like ~SLUB_DMA, we ought to have the type
set right in both cases.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In most cases, when EH is scheduled, all in-flight commands are
aborted causing EH to kick in immediately. However, in some cases
(especially with PMP), it's unclear which commands are affected by the
error condition and although aborting all in-flight commands work, it
isn't optimal and may cause unnecessary disruption. On the other
hand, waiting for in-flight commands to drain themselves can take up
to 30seconds.
This patch implements EH fast drain to handle such situations. It
gives in-flight commands some time to finish up but doesn't wait for
too long. After EH is scheduled, fast drain timer is started and if
no other completion occurs in ATA_EH_FASTDRAIN_INTERVAL all in-flight
commands are aborted. If any completion occurred in the interval, the
port is given another interval to finish up itself.
Currently ATA_EH_FASTDRAIN_INTERVAL is 3 secs which should be enough
for finishing up most commands.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
__ata_ehi_hotplugged() now has no users. Regorganize
ata_ehi_hotplugged() such that a new function ata_ehi_schedule_probe()
deals with scheduling probing. ata_ehi_hotplugged() calls it and
additionally marks hotplug specific flags. ata_ehi_schedule_probe()
will be used laster.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
sata_down_spd_limit() first reads the current SPD from SStatus and
limit the speed to the lower one of one below the current limit or one
below the current SPD in SStatus. SPD may not be accessible or valid
when SPD down is requested making sata_down_spd_limit() fail when it's
most needed.
This patch makes the current SPD cached after each successful reset
and forces GEN I speed (1.5Gbps) if neither of SStatus or the cached
value is valid, so sata_down_spd_limit() is now guaranteed to lower
the speed limit if lower speed is available.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Convert ->scr_read/write callbacks to return error code to better
indicate failure. This will help handling of SCR_NOTIFICATION.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
When an NCQ command fails, all commands in flight are aborted and the
offending one is reported using log page 10h. Depending on controller
characteristics and LLD implementation, all commands may appear as
having a device error due to shared TF status making it hard to
determine what's actually going on.
This patch adds AC_ERR_NCQ, marks the command reported by log page 10h
with it and print extra "<F>" after the error report for the command
to help distinguishing the offending command.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Requiring LLDs to format multiple error description messages properly
doesn't work too well. Help LLDs a bit by making ata_ehi_push_desc()
insert ", " on each invocation. __ata_ehi_push_desc() is the raw
version without the automatic separator.
While at it, make ehi_desc interface proper functions instead of
macros.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add @is_cmd to ata_tf_to_fis(). This controls bit 7 of the second
byte which tells the device whether this H2D FIS is for a command or
not. This cleans up ahci a bit and will be used by PMP.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch converts the cpu specific 7750 setup code to use the
new intc controller. Many new vectors are added and multiple
processor variants including 7091, 7750, 7750s, 7750r, 7751 and
7751r should all have the correct vectors hooked up.
IRLM interrupts can be enabled using ipr_irq_enable_irlm() which
now is marked as __init.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Fixed PM resume of cs46xx devices. It now restores properly the DSP
image and kick-off the DSP.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
* adding 8 more 32-bit capture channels (total of 16) for emu1010 cards
* adding some code comments and card details description
Signed-off-by: Pavel Hofman <dustin@seznam.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
Add support for Cyrix/NatSemi Geode SC5530 (VSA1).
The driver is snd-cs5530.
Signed-off-by Ash Willis <ashwillis@programmer.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
This patch adds the support of mute for front channels of M-Audio
Revolution 7.1 (the DAC AK4381 features a mute bit).
Signed-off-by: Pavel Hofman <dustin@seznam.cz>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
This patch adds I2C ID for the LM4857 audio amp and corrects the spacing
of the WM8731, WM8750 and WM8753 ID's.
Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jaroslav Kysela <perex@suse.cz>
I changed the naming to be more obvious---unfortunately the HRM
doesn't specify these.
Moreover the numbering is changed to be zero indexed as this is more
natural.
Adjust all callers.
Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add definitions for RDPROOF, WRPROOF and PDCFBYTE bits of the Mode
Register in the updated MMC controller found on the AT91SAM9260 and
AT91SAM9263 processors.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Avoid the virt_to_bus()/bus_to_virt() warnings in floppy.c caused
by the (useless) double conversion to/from bus addresses.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
In order for this driver to be shared across the iop architectures the
iop3xx and iop13xx header files are modified to present a common interface
for the iop_wdt driver.
Details:
* iop13xx supports disabling the timer while iop3xx does not. This requires
a few 'compatibility' definitions in include/asm-arm/hardware/iop3xx.h to
preclude adding #ifdef CONFIG_ARCH_IOP13XX blocks to the driver code.
* The heartbeat interval is derived from the internal bus clock rate, so this
this patch also exports the tick rate to the iop_wdt driver.
Cc: Curt Bruns <curt.e.bruns@intel.com>
Cc: Peter Milne <peter.milne@d-tacq.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
arch/arm/boot/compressed/misc.o: In function `valid_user_regs':
misc.c:(.text+0x74): undefined reference to `elf_hwcap'
This triggers after the various elf_hwcap cleanups in:
f884b1cf57d1cbbd6b41
include/asm-arm/arch-iop13xx/uncompress.h calls cpu_relax while spinning on
a register value. cpu_relax requires processor.h->ptrace.h->hwcap.h
'elf_hwcap' is defined as an extern, but since the uncompressor does not
link against arch/arm/kernel/setup.c 'elf_hwcap' remains undefined.
Fix is to open code the cpu_relax() call as barrier().
Cc: Lennert Buytenhek <kernel@wantstofly.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch adds the basic support for the em7210 board. It is similar to
the iq31244 board and can be found on Intel "Baxter Creek" ss4000e nas.
Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
If we have two processes with different ioprio_class, but the same
ioprio_data, their async requests will fall into the same queue. I guess
such behavior is not expected, because it's not right to put real-time
requests and best-effort requests in the same queue.
The attached patch fixes the problem by introducing additional *cfqq
fields on cfqd, pointing to per-(class,priority) async queues.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This is an optional component of the clock framework. However,
as we're going to be using this in the cpufreq drivers, add
support for it to the framework.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The "id" property in vdc-port nodes are not unique, they
are all zero. Therefore assign ID's using the parent's
"cfg-handle" property which will be unique.
Signed-off-by: David S. Miller <davem@davemloft.net>
and populate it with the common parts from PowerPC and Sparc[64].
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
Move common stuff from asm-powerpc/of_platform.h to here and
move the common bits from asm-sparc*/of_device.h here as well.
Create asm-sparc*/of_platform.h and move appropriate parts of
of_device.h to them.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
This is to make the of merge easier. Also rename of_bus_type.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
This just moves the common stuff from the arch of_device.h files to
linux/of_device.h.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
This consolidates the routines of_find_node_by_path, of_find_node_by_name,
of_find_node_by_type and of_find_compatible_device. Again, the comparison
of strings are done differently by Sparc and PowerPC and also these add
read_locks around the iterations.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
This requires creating dummy of_node_{get,put} routines for sparc and
sparc64. It also adds a read_lock around the parent accesses.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
The only change here is that a readlock is taken while the property list
is being traversed on Sparc where it was not taken previously.
Also, Sparc uses strcasecmp to compare property names while PowerPC
uses strcmp.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
The only difference here is that Sparc uses strncmp to match compatibility
names while PowerPC uses strncasecmp.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
This creates drivers/of/base.c (depending on CONFIG_OF) and puts
the first trivially common bits from the prom.c files into it.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
This patch converts the cpu specific 7780 setup code to use the
new intc controller. Many new vectors are added and also support for
external interrupt sense configuration. So with this patch it is now
possible to configure external interrupt pins as edge or level
triggered using set_irq_type().
No external interrupts are registered by default.
Use plat_irq_setup_pins() to select between IRQ or IRL mode.
This patch also fixes the Alarm IRQ for the RTC.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch unifies the cpu specific interrupt setup functions for
interrupt controller blocks such as ipr, intc2 and intc. There is no
point in having separate functions for each interrupt controller, so
let's clean this up.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch cleans up solution engine 7722 specific interrupt code.
The main purpose is to replace the mux function with use of
set_irq_chained_handler() and replace hard coded register poking
code with set_irq_type(). The board specific interrupts are also
moved to start from SE7722_FPGA_IRQ_BASE.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch converts the cpu specific 7722 setup code to use the
new intc controller. Many new vectors are added and also support
for external interrupt sense configuration. So with this patch
it is now possible to configure external interrupt pins as edge
or level triggered using set_irq_type().
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This is the second version of the shared interrupt controller patch
for the sh architecture, fixing up handling of intc_reg_fns[].
The three main advantages with this controller over the existing
ones are:
- Both priority (ipr) and bitmap (intc2) registers are
supported
- External pin sense configuration is supported, ie edge
vs level triggered
- CPU/Board specific code maps 1:1 with datasheet for
easy verification
This controller can easily coexist with the current IPR and INTC2
controllers, but the idea is that CPUs/Boards should be moved over
to this controller over time so we have a single code base to
maintain.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This creates linux/of.h and includes asm/prom.h from it.
We also include linux/of.h from asm/prom.h while we transition.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* Add ATA_PIO[0-6] defines to <linux/ata.h>.
* Add ->pio_mask field to ide_pci_device_t and ide_hwif_t.
* Add PIO masks to host drivers.
<linux/ata.h> change ACK-ed by Jeff Garzik <jeff@garzik.org>.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add ->host_flags to ide_hwif_t to store ide_pci_device_t.host_flags,
assign it in setup-pci.c:ide_pci_setup_ports().
* Add IDE_HFLAG_PIO_NO_{BLACKLIST,DOWNGRADE} to ide_pci_device_t.host_flags
and teach ide_get_best_pio_mode() about them. Also remove needless
!drive->id check while at it (drive->id is always present).
* Convert amd74xx, via82cxxx and ide-timing.h to use ide_get_best_pio_mode()
and then remove no longer needed ide_find_best_pio_mode().
There should be no functionality changes caused by this patch.
Acked-by: Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Drop no longer needed "PIO data" argument from ide_get_best_pio_mode()
and convert all users accordingly.
* Remove no longer needed ide_pio_data_t.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add ide_pio_cycle_time() helper.
* Use it in ali14xx/ht6560b/qd65xx/cmd64{0,x}/sl82c105 and pmac host drivers
(previously cycle time given by the device was only used for "pio" == 255).
* Remove no longer needed ide_pio_data_t.cycle_time field.
v2:
* Fix "ata_" prefix (Noticed by Jeff).
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Rename ide_pci_device_t.flags to ide_pci_device_t.host_flags
and IDEPCI_FLAG_ISA_PORTS flag to IDE_HFLAG_ISA_PORTS.
* Add IDE_HFLAG_SINGLE flag for single channel devices.
* Convert core code and all IDE PCI drivers to use IDE_HFLAG_SINGLE
and remove no longer needed ide_pci_device_t.channels field.
v2:
* Fix issues noticed by Sergei:
- correct code alignment in scc_pata.c
- s/IDE_HFLAG_SINGLE/~IDE_HFLAG_SINGLE/ in serverworks.c
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Print info about overriding PIO mode in ide_get_best_pio_mode().
* Remove info about overriding PIO mode from cmd64{0,x} host drivers.
* Remove no longer needed ide_pio_data_t.overridden field.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/selinux-2.6:
SELinux: use SECINITSID_NETMSG instead of SECINITSID_UNLABELED for NetLabel
SELinux: enable dynamic activation/deactivation of NetLabel/SELinux enforcement
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
[XFS] Fix inode size update before data write in xfs_setattr
[XFS] Allow punching holes to free space when at ENOSPC
[XFS] Implement ->page_mkwrite in XFS.
[FS] Implement block_page_mkwrite.
Manually fix up conflict with Nick's VM fault handling patches in
fs/xfs/linux-2.6/xfs_file.c
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, CONFIG_X86_CMPXCHG64 both enables boot-time checking of
the cmpxchg64b feature and enables compilation of the set_64bit() family.
Since the option is dependent on PAE, and since KVM depends on set_64bit(),
this effectively disables KVM on i386 nopae.
Simplify by removing the config option altogether: the boot check is made
dependent on CONFIG_X86_PAE directly, and the set_64bit() family is exposed
without constraints. It is up to users to check for the feature flag (KVM
does not as virtualiation extensions imply its existence).
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.linux-nfs.org/pub/linux/nfs-2.6:
NFSv4: handle lack of clientaddr in option string
NFSv4: debug print ntohl(status) in nfs client callback xdr code
SUNRPC: Clean up the sillyrename code
NFS: Introduce struct nfs_removeargs+nfs_removeres
NFS: Use dentry->d_time to store the parent directory verifier.
SUNRPC: move bkl locking and xdr proc invocation into a common helper
NFSv4: Fix the nfsv4 readlink reply buffer alignment
NFSv4: Fix the readdir reply buffer alignment
NFSv4: More NFSv4 xdr cleanups
NFSv4: Try to recover from getfh failures in nfs4_xdr_dec_open
NFSv4: 'constify' lookup arguments.
NFSv4: Don't fail nfs4_xdr_dec_open if decode_restorefh() failed
NFSv4: Fix open state recovery
NFSD/SUNRPC: Fix the automatic selection of RPCSEC_GSS
* 'release' of git://lm-sensors.org/kernel/mhoffman/hwmon-2.6: (44 commits)
i2c: Delete the i2c-isa pseudo bus driver
hwmon: refuse to load abituguru driver on non-Abit boards
hwmon: fix Abit Uguru3 driver detection on some motherboards
hwmon/w83627ehf: Be quiet when no chip is found
hwmon/w83627ehf: No need to initialize fan_min
hwmon/w83627ehf: Export the thermal sensor types
hwmon/w83627ehf: Enable VBAT monitoring
hwmon/w83627ehf: Add support for the VID inputs
hwmon/w83627ehf: Fix timing issues
hwmon/w83627ehf: Add error messages for two error cases
hwmon/w83627ehf: Convert to a platform driver
hwmon/w83627ehf: Update the Kconfig entry
make coretemp_device_remove() static
hwmon: Add LM93 support
hwmon: Improve the pwmN_enable documentation
hwmon/smsc47b397: Don't report missing fans as spinning at 82 RPM
hwmon: Add support for newer uGuru's
hwmon/f71805f: Add temperature-tracking fan control mode
hwmon/w83627ehf: Preserve speed reading when changing fan min
hwmon: fix detection of abituguru volt inputs
...
Manual fixup of trivial conflict in MAINTAINERS file
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
[PATCH] sched: implement cpu_clock(cpu) high-speed time source
[PATCH] sched: fix the all pinned logic in load_balance_newidle()
[PATCH] sched: fix newly idle load balance in case of SMT
[PATCH] sched: sched_cacheflush is now unused
When a CONFIG_USER_NS=n and a user tries to unshare some namespace other
than the user namespace, the dummy copy_user_ns returns NULL rather than
the old_ns.
This value then gets assigned to task->nsproxy->user_ns, so that a
subsequent setuid, which uses task->nsproxy->user_ns, causes a NULL
pointer deref.
Fix this by returning old_ns.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sys_fallocate for ia64. This uses an empty slot #1303 erroneously
marked as reserved for move_pages (which had already been allocated
as syscall #1276)
Signed-Off-By: Dave Chinner <dgc@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Implement the cpu_clock(cpu) interface for kernel-internal use:
high-speed (but slightly incorrect) per-cpu clock constructed from
sched_clock().
This API, unused at the moment, will be used in the future by blktrace,
by the softlockup-watchdog, by printk and by lockstat.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since Ingo's recent scheduler rewrite which was merged as commit
0437e109e1 sched_cacheflush is unused.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix a couple of bugs:
- Don't rely on the parent dentry still being valid when the call completes.
Fixes a race with shrink_dcache_for_umount_subtree()
- Don't remove the file if the filehandle has been labelled as stale.
Fix a couple of inefficiencies
- Remove the global list of sillyrenamed files. Instead we can cache the
sillyrename information in the dentry->d_fsdata
- Move common code from unlink_setup/unlink_done into fs/nfs/unlink.c
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We need a common structure for setting up an unlink() rpc call in order to
fix the asynchronous unlink code.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Since every invocation of xdr encode or decode functions takes the BKL now,
there's a lot of redundant lock_kernel/unlock_kernel pairs that we can pull
out into a common function.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Kristian Høgsberg <krh@redhat.com>
collapsed with fw-sbp2 patch "Drop cast to non-const char * in host
template initialization." from Kristian Høgsberg
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
pci_ids.h needs two of the AMD NB device-ids namely, Addressmap and the Memory
Controller devices
This patch adds those to the pci_id.h include file
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change error check and clear variable from an atomic to an int
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here's a driver for the Intel 3000 and 3010 memory controllers,
relative to today's Sourceforge code drop. This has only had light
testing (I've yet to actually see it handle a memory error) but it
detects my hardware correctly.
Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provides a way for NMI reported errors on x86 to notify the EDAC
subsystem pending ECC errors by writing to a software state variable.
Here's the reworked patch. I added an EDAC stub to the kernel so we can
have variables that are in the kernel even if EDAC is a module. I also
implemented the idea of using the chip driver to select error detection
mode via module parameter and eliminate the kernel compile option.
Please review/test. Thx!
Also, I only made changes to some of the chipset drivers since I am
unfamiliar with the other ones. We can add similar changes as we go.
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the code for the "lg.ko" module, which allows lguest guests to
be launched.
[akpm@linux-foundation.org: update for futex-new-private-futexes]
[akpm@linux-foundation.org: build fix]
[jmorris@namei.org: lguest: use hrtimers]
[akpm@linux-foundation.org: x86_64 build fix]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
lguest is a simple hypervisor for Linux on Linux. Unlike kvm it doesn't need
VT/SVM hardware. Unlike Xen it's simply "modprobe and go". Unlike both, it's
5000 lines and self-contained.
Performance is ok, but not great (-30% on kernel compile). But given its
hackability, I expect this to improve, along with the paravirt_ops code which
it supplies a complete example for. There's also a 64-bit version being
worked on and other craziness.
But most of all, lguest is awesome fun! Too much of the kernel is a big ball
of hair. lguest is simple enough to dive into and hack, plus has some warts
which scream "fork me!".
This patch:
This is the code and headers required to make an i386 kernel an lguest guest.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Share a little common code, reverse the arguments for consistency, drop the
unnecessary "inline", and lowercase the name.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
EX_RDONLY is only called in one place; just put it there.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
page-writeback accounting is presently performed in the page-flags macros.
This is inconsistent and a bit ugly and makes it awkward to implement
per-backing_dev under-writeback page accounting.
So move this accounting down to the callsite(s).
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove is_in_rom() function. It doesn't actually serve the purpose it was
intended to. If you look at the use of it _access_ok() (which is the only use
of it) then it is obvious that most of memory is marked as access_ok. No
point having is_in_rom() then, so remove it.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change the m68knommu irq handling to use the generic irq framework.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The print_stack_trace macro in stacktrace.h has a wrong number of
arguments, fix it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__acquire
|
lock _____
| \
| __contended
| |
| wait
| _______/
|/
|
__acquired
|
__release
|
unlock
We measure acquisition and contention bouncing.
This is done by recording a cpu stamp in each lock instance.
Contention bouncing requires the cpu stamp to be set on acquisition. Hence we
move __acquired into the generic path.
__acquired is then used to measure acquisition bouncing by comparing the
current cpu with the old stamp before replacing it.
__contended is used to measure contention bouncing (only useful for preemptable
locks)
[akpm@linux-foundation.org: cleanups]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- update the copyright notices
- use the default hash function
- fix a thinko in a BUILD_BUG_ON
- add a WARN_ON to spot inconsitent naming
- fix a termination issue in /proc/lock_stat
[akpm@linux-foundation.org: cleanups]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce the core lock statistics code.
Lock statistics provides lock wait-time and hold-time (as well as the count
of corresponding contention and acquisitions events). Also, the first few
call-sites that encounter contention are tracked.
Lock wait-time is the time spent waiting on the lock. This provides insight
into the locking scheme, that is, a heavily contended lock is indicative of
a too coarse locking scheme.
Lock hold-time is the duration the lock was held, this provides a reference for
the wait-time numbers, so they can be put into perspective.
1)
lock
2)
... do stuff ..
unlock
3)
The time between 1 and 2 is the wait-time. The time between 2 and 3 is the
hold-time.
The lockdep held-lock tracking code is reused, because it already collects locks
into meaningful groups (classes), and because it is an existing infrastructure
for lock instrumentation.
Currently lockdep tracks lock acquisition with two hooks:
lock()
lock_acquire()
_lock()
... code protected by lock ...
unlock()
lock_release()
_unlock()
We need to extend this with two more hooks, in order to measure contention.
lock_contended() - used to measure contention events
lock_acquired() - completion of the contention
These are then placed the following way:
lock()
lock_acquire()
if (!_try_lock())
lock_contended()
_lock()
lock_acquired()
... do locked stuff ...
unlock()
lock_release()
_unlock()
(Note: the try_lock() 'trick' is used to avoid instrumenting all platform
dependent lock primitive implementations.)
It is also possible to toggle the two lockdep features at runtime using:
/proc/sys/kernel/prove_locking
/proc/sys/kernel/lock_stat
(esp. turning off the O(n^2) prove_locking functionaliy can help)
[akpm@linux-foundation.org: build fixes]
[akpm@linux-foundation.org: nuke unneeded ifdefs]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the lockdep infrastructure to track lock contention and other lock
statistics.
It tracks lock contention events, and the first four unique call-sites that
encountered contention.
It also measures lock wait-time and hold-time in nanoseconds. The minimum and
maximum times are tracked, as well as a total (which together with the number
of event can give the avg).
All statistics are done per lock class, per write (exclusive state) and per read
(shared state).
The statistics are collected per-cpu, so that the collection overhead is
minimized via having no global cachemisses.
This new lock statistics feature is independent of the lock dependency checking
traditionally done by lockdep; it just shares the lock tracking code. It is
also possible to enable both and runtime disabled either component - thereby
avoiding the O(n^2) lock chain walks for instance.
This patch:
raw_spinlock_t should not use lockdep (and doesn't) since lockdep itself
relies on it.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Similar information can easily be obtained with strace -c.
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The sb_info structure only contains a single pointer to the character device,
there is no need for the added indirection.
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We ignore signals for about 30 seconds to give userspace a chance to see the
upcall. As we did not block signals we ended up in a busy loop for the
remainder of the period when a signal is received.
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This changes the i386 linker script and the asm-generic macro it uses so that
ELF note sections with SHF_ALLOC set are linked into the kernel image along
with other read-only data. The PT_NOTE also points to their location.
This paves the way for putting useful build-time information into ELF notes
that can be found easily later in a kernel memory dump.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds an interface to set/reset flags which determines each memory
segment should be dumped or not when a core file is generated.
/proc/<pid>/coredump_filter file is provided to access the flags. You can
change the flag status for a particular process by writing to or reading from
the file.
The flag status is inherited to the child process when it is created.
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch changes mm_struct.dumpable to a pair of bit flags.
set_dumpable() converts three-value dumpable to two flags and stores it into
lower two bits of mm_struct.flags instead of mm_struct.dumpable.
get_dumpable() behaves in the opposite way.
[akpm@linux-foundation.org: export set_dumpable]
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@suse.de>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@suse.de>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stackable file systems, among others, frequently need to lookup paths or
path components starting from an arbitrary point in the namespace
(identified by a dentry and a vfsmount). Currently, such file systems use
lookup_one_len, which is frowned upon [1] as it does not pass the lookup
intent along; not passing a lookup intent, for example, can trigger BUG_ON's
when stacking on top of NFSv4.
The first patch introduces a new lookup function to allow lookup starting
from an arbitrary point in the namespace. This approach has been suggested
by Christoph Hellwig [2].
The second patch changes sunrpc to use vfs_path_lookup.
The third patch changes nfsctl.c to use vfs_path_lookup.
The fourth patch marks link_path_walk static.
The fifth, and last patch, unexports path_walk because it is no longer
unnecessary to call it directly, and using the new vfs_path_lookup is
cleaner.
For example, the following snippet of code, looks up "some/path/component"
in a directory pointed to by parent_{dentry,vfsmnt}:
err = vfs_path_lookup(parent_dentry, parent_vfsmnt,
"some/path/component", 0, &nd);
if (!err) {
/* exits */
...
/* once done, release the references */
path_release(&nd);
} else if (err == -ENOENT) {
/* doesn't exist */
} else {
/* other error */
}
VFS functions such as lookup_create can be used on the nameidata structure
to pass the create intent to the file system.
Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@suse.de>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from
the old mm into the new mm.
We create the new mm before the binfmt code runs, and place the new stack at
the very top of the address space. Once the binfmt code runs and figures out
where the stack should be, we move it downwards.
It is a bit peculiar in that we have one task with two mm's, one of which is
inactive.
[a.p.zijlstra@chello.nl: limit stack size]
Signed-off-by: Ollie Wild <aaw@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-arch@vger.kernel.org>
Cc: Hugh Dickins <hugh@veritas.com>
[bunk@stusta.de: unexport bprm_mm_init]
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The purpose of audit_bprm() is to log the argv array to a userspace daemon at
the end of the execve system call. Since user-space hasn't had time to run,
this array is still in pristine state on the process' stack; so no need to
copy it, we can just grab it from there.
In order to minimize the damage to audit_log_*() copy each string into a
temporary kernel buffer first.
Currently the audit code requires that the full argument vector fits in a
single packet. So currently it does clip the argv size to a (sysctl) limit,
but only when execve auditing is enabled.
If the audit protocol gets extended to allow for multiple packets this check
can be removed.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ollie Wild <aaw@google.com>
Cc: <linux-audit@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
New arch macro STACK_TOP_MAX it gives the larges valid stack address for the
architecture in question.
It differs from STACK_TOP in that it will not distinguish between
personalities but will always return the largest possible address.
This is used to create the initial stack on execve, which we will move down to
the proper location once the binfmt code has figured out where that is.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ollie Wild <aaw@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
per cpu data section contains two types of data. One set which is
exclusively accessed by the local cpu and the other set which is per cpu,
but also shared by remote cpus. In the current kernel, these two sets are
not clearely separated out. This can potentially cause the same data
cacheline shared between the two sets of data, which will result in
unnecessary bouncing of the cacheline between cpus.
One way to fix the problem is to cacheline align the remotely accessed per
cpu data, both at the beginning and at the end. Because of the padding at
both ends, this will likely cause some memory wastage and also the
interface to achieve this is not clean.
This patch:
Moves the remotely accessed per cpu data (which is currently marked
as ____cacheline_aligned_in_smp) into a different section, where all the data
elements are cacheline aligned. And as such, this differentiates the local
only data and remotely accessed data cleanly.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I realise jprobes are a razor-blades-included type of interface, but that
doesn't mean we can't try and make them safer to use. This guy I know once
wrote code like this:
struct jprobe jp = { .kp.symbol_name = "foo", .entry = "jprobe_foo" };
And then his kernel exploded. Oops.
This patch adds an arch hook, arch_deref_entry_point() (I don't like it
either) which takes the void * in a struct jprobe, and gives back the text
address that it represents.
We can then use that in register_jprobe() to check that the entry point we're
passed is actually in the kernel text, rather than just some random value.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
AFAICT now that jprobe.entry is a void *, JPROBE_ENTRY doesn't do anything
useful - so remove it ..
I've left a do-nothing version so that out-of-tree jprobes code will still
compile without modifications.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently jprobe.entry is a kprobe_opcode_t *, but that's a lie. On some
platforms it doesn't point to an opcode at all, it points to a function
descriptor.
It's really a pointer to something that the arch code can turn into a function
entry point. And that's what actually happens, none of the generic code ever
looks at jprobe.entry, it's only ever dereferenced by arch code.
So just make it a void *.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename some file_ra_state variables and remove some accessors.
It results in much simpler code.
Kudos to Rusty!
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Split ondemand readahead interface into two functions. I think this makes it
a little clearer for non-readahead experts (like Rusty).
Internally they both call ondemand_readahead(), but the page argument is
changed to an obvious boolean flag.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Share the same page flag bit for PG_readahead and PG_reclaim.
One is used only on file reads, another is only for emergency writes. One
is used mostly for fresh/young pages, another is for old pages.
Combinations of possible interactions are:
a) clear PG_reclaim => implicit clear of PG_readahead
it will delay an asynchronous readahead into a synchronous one
it actually does _good_ for readahead:
the pages will be reclaimed soon, it's readahead thrashing!
in this case, synchronous readahead makes more sense.
b) clear PG_readahead => implicit clear of PG_reclaim
one(and only one) page will not be reclaimed in time
it can be avoided by checking PageWriteback(page) in readahead first
c) set PG_reclaim => implicit set of PG_readahead
will confuse readahead and make it restart the size rampup process
it's a trivial problem, and can mostly be avoided by checking
PageWriteback(page) first in readahead
d) set PG_readahead => implicit set of PG_reclaim
PG_readahead will never be set on already cached pages.
PG_reclaim will always be cleared on dirtying a page.
so not a problem.
In summary,
a) we get better behavior
b,d) possible interactions can be avoided
c) racy condition exists that might affect readahead, but the chance
is _really_ low, and the hurt on readahead is trivial.
Compound pages also use PG_reclaim, but for now they do not interact with
reclaim/readahead code.
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the old readahead algorithm.
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Steven Pratt <slpratt@austin.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a minimal readahead algorithm that aims to replace the current one.
It is more flexible and reliable, while maintaining almost the same behavior
and performance. Also it is full integrated with adaptive readahead.
It is designed to be called on demand:
- on a missing page, to do synchronous readahead
- on a lookahead page, to do asynchronous readahead
In this way it eliminated the awkward workarounds for cache hit/miss,
readahead thrashing, retried read, and unaligned read. It also adopts the
data structure introduced by adaptive readahead, parameterizes readahead
pipelining with `lookahead_index', and reduces the current/ahead windows to
one single window.
HEURISTICS
The logic deals with four cases:
- sequential-next
found a consistent readahead window, so push it forward
- random
standalone small read, so read as is
- sequential-first
create a new readahead window for a sequential/oversize request
- lookahead-clueless
hit a lookahead page not associated with the readahead window,
so create a new readahead window and ramp it up
In each case, three parameters are determined:
- readahead index: where the next readahead begins
- readahead size: how much to readahead
- lookahead size: when to do the next readahead (for pipelining)
BEHAVIORS
The old behaviors are maximally preserved for trivial sequential/random reads.
Notable changes are:
- It no longer imposes strict sequential checks.
It might help some interleaved cases, and clustered random reads.
It does introduce risks of a random lookahead hit triggering an
unexpected readahead. But in general it is more likely to do good
than to do evil.
- Interleaved reads are supported in a minimal way.
Their chances of being detected and proper handled are still low.
- Readahead thrashings are better handled.
The current readahead leads to tiny average I/O sizes, because it
never turn back for the thrashed pages. They have to be fault in
by do_generic_mapping_read() one by one. Whereas the on-demand
readahead will redo readahead for them.
OVERHEADS
The new code reduced the overheads of
- excessively calling the readahead routine on small sized reads
(the current readahead code insists on seeing all requests)
- doing a lot of pointless page-cache lookups for small cached files
(the current readahead only turns itself off after 256 cache hits,
unfortunately most files are < 1MB, so never see that chance)
That accounts for speedup of
- 0.3% on 1-page sequential reads on sparse file
- 1.2% on 1-page cache hot sequential reads
- 3.2% on 256-page cache hot sequential reads
- 1.3% on cache hot `tar /lib`
However, it does introduce one extra page-cache lookup per cache miss, which
impacts random reads slightly. That's 1% overheads for 1-page random reads on
sparse file.
PERFORMANCE
The basic benchmark setup is
- 2.6.20 kernel with on-demand readahead
- 1MB max readahead size
- 2.9GHz Intel Core 2 CPU
- 2GB memory
- 160G/8M Hitachi SATA II 7200 RPM disk
The benchmarks show that
- it maintains the same performance for trivial sequential/random reads
- sysbench/OLTP performance on MySQL gains up to 8%
- performance on readahead thrashing gains up to 3 times
iozone throughput (KB/s): roughly the same
==========================================
iozone -c -t1 -s 4096m -r 64k
2.6.20 on-demand gain
first run
" Initial write " 61437.27 64521.53 +5.0%
" Rewrite " 47893.02 48335.20 +0.9%
" Read " 62111.84 62141.49 +0.0%
" Re-read " 62242.66 62193.17 -0.1%
" Reverse Read " 50031.46 49989.79 -0.1%
" Stride read " 8657.61 8652.81 -0.1%
" Random read " 13914.28 13898.23 -0.1%
" Mixed workload " 19069.27 19033.32 -0.2%
" Random write " 14849.80 14104.38 -5.0%
" Pwrite " 62955.30 65701.57 +4.4%
" Pread " 62209.99 62256.26 +0.1%
second run
" Initial write " 60810.31 66258.69 +9.0%
" Rewrite " 49373.89 57833.66 +17.1%
" Read " 62059.39 62251.28 +0.3%
" Re-read " 62264.32 62256.82 -0.0%
" Reverse Read " 49970.96 50565.72 +1.2%
" Stride read " 8654.81 8638.45 -0.2%
" Random read " 13901.44 13949.91 +0.3%
" Mixed workload " 19041.32 19092.04 +0.3%
" Random write " 14019.99 14161.72 +1.0%
" Pwrite " 64121.67 68224.17 +6.4%
" Pread " 62225.08 62274.28 +0.1%
In summary, writes are unstable, reads are pretty close on average:
access pattern 2.6.20 on-demand gain
Read 62085.61 62196.38 +0.2%
Re-read 62253.49 62224.99 -0.0%
Reverse Read 50001.21 50277.75 +0.6%
Stride read 8656.21 8645.63 -0.1%
Random read 13907.86 13924.07 +0.1%
Mixed workload 19055.29 19062.68 +0.0%
Pread 62217.53 62265.27 +0.1%
aio-stress: roughly the same
============================
aio-stress -l -s4096 -r128 -t1 -o1 knoppix511-dvd-cn.iso
aio-stress -l -s4096 -r128 -t1 -o3 knoppix511-dvd-cn.iso
2.6.20 on-demand delta
sequential 92.57s 92.54s -0.0%
random 311.87s 312.15s +0.1%
sysbench fileio: roughly the same
=================================
sysbench --test=fileio --file-io-mode=async --file-test-mode=rndrw \
--file-total-size=4G --file-block-size=64K \
--num-threads=001 --max-requests=10000 --max-time=900 run
threads 2.6.20 on-demand delta
first run
1 59.1974s 59.2262s +0.0%
2 58.0575s 58.2269s +0.3%
4 48.0545s 47.1164s -2.0%
8 41.0684s 41.2229s +0.4%
16 35.8817s 36.4448s +1.6%
32 32.6614s 32.8240s +0.5%
64 23.7601s 24.1481s +1.6%
128 24.3719s 23.8225s -2.3%
256 23.2366s 22.0488s -5.1%
second run
1 59.6720s 59.5671s -0.2%
8 41.5158s 41.9541s +1.1%
64 25.0200s 23.9634s -4.2%
256 22.5491s 20.9486s -7.1%
Note that the numbers are not very stable because of the writes.
The overall performance is close when we sum all seconds up:
sum all up 495.046s 491.514s -0.7%
sysbench oltp (trans/sec): up to 8% gain
========================================
sysbench --test=oltp --oltp-table-size=10000000 --oltp-read-only \
--mysql-socket=/var/run/mysqld/mysqld.sock \
--mysql-user=root --mysql-password=readahead \
--num-threads=064 --max-requests=10000 --max-time=900 run
10000-transactions run
threads 2.6.20 on-demand gain
1 62.81 64.56 +2.8%
2 67.97 70.93 +4.4%
4 81.81 85.87 +5.0%
8 94.60 97.89 +3.5%
16 99.07 104.68 +5.7%
32 95.93 104.28 +8.7%
64 96.48 103.68 +7.5%
5000-transactions run
1 48.21 48.65 +0.9%
8 68.60 70.19 +2.3%
64 70.57 74.72 +5.9%
2000-transactions run
1 37.57 38.04 +1.3%
2 38.43 38.99 +1.5%
4 45.39 46.45 +2.3%
8 51.64 52.36 +1.4%
16 54.39 55.18 +1.5%
32 52.13 54.49 +4.5%
64 54.13 54.61 +0.9%
That's interesting results. Some investigations show that
- MySQL is accessing the db file non-uniformly: some parts are
more hot than others
- It is mostly doing 4-page random reads, and sometimes doing two
reads in a row, the latter one triggers a 16-page readahead.
- The on-demand readahead leaves many lookahead pages (flagged
PG_readahead) there. Many of them will be hit, and trigger
more readahead pages. Which might save more seeks.
- Naturally, the readahead windows tend to lie in hot areas,
and the lookahead pages in hot areas is more likely to be hit.
- The more overall read density, the more possible gain.
That also explains the adaptive readahead tricks for clustered random reads.
readahead thrashing: 3 times better
===================================
We boot kernel with "mem=128m single", and start a 100KB/s stream on every
second, until reaching 200 streams.
max throughput min avg I/O size
2.6.20: 5MB/s 16KB
on-demand: 15MB/s 140KB
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Steven Pratt <slpratt@austin.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Extend struct file_ra_state to support the on-demand readahead logic. Also
define some helpers for it.
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Steven Pratt <slpratt@austin.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce a new page flag: PG_readahead.
It acts as a look-ahead mark, which tells the page reader: Hey, it's time to
invoke the read-ahead logic. For the sake of I/O pipelining, don't wait until
it runs out of cached pages!
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Cc: Steven Pratt <slpratt@austin.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix type issue reported by latest 'sparse': kiocb.ki_flags should be
"unsigned long" (not "long"), to match bitop type signature.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
unregister_chrdev() does not return meaningful value. This patch makes it
return void like most unregister_* functions.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move "debug during resume from s2ram" into the variable we already use
for real-mode flags to simplify code. It also closes nasty trap for
the user in acpi_sleep_setup; order of parameters actually mattered there,
acpi_sleep=s3_bios,s3_mode doing something different from
acpi_sleep=s3_mode,s3_bios.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a feature allowing the user to make the system beep during a resume from
suspend to RAM, on x86_64 and i386.
This is useful for the users with broken resume from RAM, so that they can
verify if the control reaches the kernel after a wake-up event.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce the pm_power_off_prepare() callback that can be registered by the
interested platforms in analogy with pm_idle() and pm_power_off(), used for
preparing the system to power off (needed by ACPI).
This allows us to drop acpi_sysclass and device_acpi that are only defined in
order to register the ACPI power off preparation callback, which is needed by
pm_power_off() registered in a much different way.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make it possible to register hibernation and suspend notifiers, so that
subsystems can perform hibernation-related or suspend-related operations that
should not be carried out by device drivers' .suspend() and .resume()
routines.
[akpm@linux-foundation.org: build fixes]
[akpm@linux-foundation.org: cleanups]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kernel threads should not have TIF_FREEZE set when user space processes are
being frozen, since otherwise some of them might be frozen prematurely.
To prevent this from happening we can (1) make exit_mm() unset TIF_FREEZE
unconditionally just after clearing tsk->mm and (2) make try_to_freeze_tasks()
check if p->mm is different from zero and PF_BORROWED_MM is unset in p->flags
when user space processes are to be frozen.
Namely, when user space processes are being frozen, we only should set
TIF_FREEZE for tasks that have p->mm different from NULL and don't have
PF_BORROWED_MM set in p->flags. For this reason task_lock() must be used to
prevent try_to_freeze_tasks() from racing with use_mm()/unuse_mm(), in which
p->mm and p->flags.PF_BORROWED_MM are changed under task_lock(p). Also, we
need to prevent the following scenario from happening:
* daemonize() is called by a task spawned from a user space code path
* freezer checks if the task has p->mm set and the result is positive
* task enters exit_mm() and clears its TIF_FREEZE
* freezer sets TIF_FREEZE for the task
* task calls try_to_freeze() and goes to the refrigerator, which is wrong at
that point
This requires us to acquire task_lock(p) before p->flags.PF_BORROWED_MM and
p->mm are examined and release it after TIF_FREEZE is set for p (or it turns
out that TIF_FREEZE should not be set).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
At least on some machines it is necessary to prepare the ACPI firmware for the
restoration of the system memory state from the hibernation image if the
"platform" mode of hibernation has been used. Namely, in that cases we need
to disable the GPEs before replacing the "boot" kernel with the "frozen"
kernel (cf. http://bugzilla.kernel.org/show_bug.cgi?id=7887). After the
restore they will be re-enabled by hibernation_ops->finish(), but if the
restore fails, they have to be re-enabled by the restore code explicitly.
For this purpose we can introduce two additional hibernation operations,
called pre_restore() and restore_cleanup() and call them from the restore code
path. Still, they should be called if the "platform" mode of hibernation has
been used, so we need to pass the information about the hibernation mode from
the "frozen" kernel to the "boot" kernel in the image header.
Apparently, we can't drop the disabling of GPEs before the restore because of
Bug #7887 . We also can't do it unconditionally, because the GPEs wouldn't
have been enabled after a successful restore if the suspend had been done in
the 'shutdown' or 'reboot' mode.
In principle we could (and probably should) unconditionally disable the GPEs
before each snapshot creation *and* before the restore, but then we'd have to
unconditionally enable them after the snapshot creation as well as after the
restore (or restore failure) Still, for this purpose we'd need to modify
acpi_enter_sleep_state_prep() and acpi_leave_sleep_state() and we'd have to
introduce some mechanism synchronizing the disablind/enabling of the GPEs with
the device drivers' .suspend()/.resume() routines and with
disable_/enable_nonboot_cpus(). However, this would have affected the
suspend (ie. s2ram) code as well as the hibernation, which I'd like to avoid
in this patch series.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
alloc_zeroed_user_highpage() has no in-tree users and it is not exported.
As it is not exported, it can simply be removed.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch completes Linus's wish that the fault return codes be made into
bit flags, which I agree makes everything nicer. This requires requires
all handle_mm_fault callers to be modified (possibly the modifications
should go further and do things like fault accounting in handle_mm_fault --
however that would be for another patch).
[akpm@linux-foundation.org: fix alpha build]
[akpm@linux-foundation.org: fix s390 build]
[akpm@linux-foundation.org: fix sparc build]
[akpm@linux-foundation.org: fix sparc64 build]
[akpm@linux-foundation.org: fix ia64 build]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Still apparently needs some ARM and PPC loving - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change ->fault prototype. We now return an int, which contains
VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte.
FAULT_RET_ code tells the VM whether a page was found, whether it has been
locked, and potentially other things. This is not quite the way he wanted
it yet, but that's changed in the next patch (which requires changes to
arch code).
This means we no longer set VM_CAN_INVALIDATE in the vma in order to say
that a page is locked which requires filemap_nopage to go away (because we
can no longer remain backward compatible without that flag), but we were
going to do that anyway.
struct fault_data is renamed to struct vm_fault as Linus asked. address
is now a void __user * that we should firmly encourage drivers not to use
without really good reason.
The page is now returned via a page pointer in the vm_fault struct.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
the virtual address -> file offset differently from linear mappings.
->populate is a layering violation because the filesystem/pagecache code
should need to know anything about the virtual memory mapping. The hitch here
is that the ->nopage handler didn't pass down enough information (ie. pgoff).
But it is more logical to pass pgoff rather than have the ->nopage function
calculate it itself anyway (because that's a similar layering violation).
Having the populate handler install the pte itself is likewise a nasty thing
to be doing.
This patch introduces a new fault handler that replaces ->nopage and
->populate and (later) ->nopfn. Most of the old mechanism is still in place
so there is a lot of duplication and nice cleanups that can be removed if
everyone switches over.
The rationale for doing this in the first place is that nonlinear mappings are
subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
to duplicate the synchronisation logic rather than just consolidate the two.
After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
pagecache. Seems like a fringe functionality anyway.
NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no
users have hit mainline yet.
[akpm@linux-foundation.org: cleanup]
[randy.dunlap@oracle.com: doc. fixes for readahead]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the race between invalidate_inode_pages and do_no_page.
Andrea Arcangeli identified a subtle race between invalidation of pages from
pagecache with userspace mappings, and do_no_page.
The issue is that invalidation has to shoot down all mappings to the page,
before it can be discarded from the pagecache. Between shooting down ptes to
a particular page, and actually dropping the struct page from the pagecache,
do_no_page from any process might fault on that page and establish a new
mapping to the page just before it gets discarded from the pagecache.
The most common case where such invalidation is used is in file truncation.
This case was catered for by doing a sort of open-coded seqlock between the
file's i_size, and its truncate_count.
Truncation will decrease i_size, then increment truncate_count before
unmapping userspace pages; do_no_page will read truncate_count, then find the
page if it is within i_size, and then check truncate_count under the page
table lock and back out and retry if it had subsequently been changed (ptl
will serialise against unmapping, and ensure a potentially updated
truncate_count is actually visible).
Complexity and documentation issues aside, the locking protocol fails in the
case where we would like to invalidate pagecache inside i_size. do_no_page
can come in anytime and filemap_nopage is not aware of the invalidation in
progress (as it is when it is outside i_size). The end result is that
dangling (->mapping == NULL) pages that appear to be from a particular file
may be mapped into userspace with nonsense data. Valid mappings to the same
place will see a different page.
Andrea implemented two working fixes, one using a real seqlock, another using
a page->flags bit. He also proposed using the page lock in do_no_page, but
that was initially considered too heavyweight. However, it is not a global or
per-file lock, and the page cacheline is modified in do_no_page to increment
_count and _mapcount anyway, so a further modification should not be a large
performance hit. Scalability is not an issue.
This patch implements this latter approach. ->nopage implementations return
with the page locked if it is possible for their underlying file to be
invalidated (in that case, they must set a special vm_flags bit to indicate
so). do_no_page only unlocks the page after setting up the mapping
completely. invalidation is excluded because it holds the page lock during
invalidation of each page (and ensures that the page is not mapped while
holding the lock).
This also allows significant simplifications in do_no_page, because we have
the page locked in the right place in the pagecache from the start.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Create a new NetLabel KAPI interface, netlbl_enabled(), which reports on the
current runtime status of NetLabel based on the existing configuration. LSMs
that make use of NetLabel, i.e. SELinux, can use this new function to determine
if they should perform NetLabel access checks. This patch changes the
NetLabel/SELinux glue code such that SELinux only enforces NetLabel related
access checks when netlbl_enabled() returns true.
At present NetLabel is considered to be enabled when there is at least one
labeled protocol configuration present. The result is that by default NetLabel
is considered to be disabled, however, as soon as an administrator configured
a CIPSO DOI definition NetLabel is enabled and SELinux starts enforcing
NetLabel related access controls - including unlabeled packet controls.
This patch also tries to consolidate the multiple "#ifdef CONFIG_NETLABEL"
blocks into a single block to ease future review as recommended by Linus.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
Many filesystems need a ->page-mkwrite callout to correctly
set up pages that have been written to by mmap. This is especially
important when mmap is writing into holes as it allows filesystems
to correctly account for and allocate space before the mmap
write is allowed to proceed.
Protection against truncate races is provided by locking the page
and checking to see whether the page mapping is correct and whether
it is beyond EOF so we don't end up allowing allocations beyond
the current EOF or changing EOF as a result of a mmap write.
SGI-PV: 940392
SGI-Modid: 2.6.x-xfs-melb:linux:29146a
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
* 'for-linus' of git://linux-nfs.org/~bfields/linux:
locks: fix vfs_test_lock() comment
locks: make posix_test_lock() interface more consistent
nfs: disable leases over NFS
gfs2: stop giving out non-cluster-coherent leases
locks: export setlease to filesystems
locks: provide a file lease method enabling cluster-coherent leases
locks: rename lease functions to reflect locks.c conventions
locks: share more common lease code
locks: clean up lease_alloc()
locks: convert an -EINVAL return to a BUG
leases: minor break_lease() comment clarification
Since posix_test_lock(), like fcntl() and ->lock(), indicates absence or
presence of a conflict lock by setting fl_type to, respectively, F_UNLCK
or something other than F_UNLCK, the return value is no longer needed.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Currently leases are only kept locally, so there's no way for a distributed
filesystem to enforce them against multiple clients. We're particularly
interested in the case of nfsd exporting a cluster filesystem, in which
case nfsd needs cluster-coherent leases in order to implement delegations
correctly.
Also add some documentation.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
We've been using the convention that vfs_foo is the function that calls
a filesystem-specific foo method if it exists, or falls back on a
generic method if it doesn't; thus vfs_foo is what is called when some
other part of the kernel (normally lockd or nfsd) wants to get a lock,
whereas foo is what filesystems call to use the underlying local
functionality as part of their lock implementation.
So rename setlease to vfs_setlease (which will call a
filesystem-specific setlease after a later patch) and __setlease to
setlease.
Also, vfs_setlease need only be GPL-exported as long as it's only needed
by lockd and nfsd.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
This interface allows the ability to write the majority of a driver in
userspace with only a very small shell of a driver in the kernel itself.
It uses a char device and sysfs to interact with a userspace process to
process interrupts and control memory accesses.
See the docbook documentation for more details on how to use this
interface.
From: Hans J. Koch <hjk@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benedikt Spranger <b.spranger@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This defines a dev_vdbg() call, which is enabled with -DVERBOSE_DEBUG.
When enabled, dev_vdbg() acts just like dev_dbg(). When disabled, it is a
NOP ... just like dev_dbg() without -DDEBUG. The specific code was moved
out of a USB patch, but lots of drivers have similar support.
That is, code can now be written to use an additional level of debug
output, selected at compile time. Many driver authors have found this
idiom to be very useful. A typical usage model is for "normal" debug
messages to focus on fault paths and not be very "chatty", so that those
messages can be left on during normal operation without much of a
performance or syslog load. On the other hand "verbose" messages would be
noisy enough that they wouldn't normally be enabled; they might even affect
timings enough to change system or driver behavior.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as933) removes the deprecated dpm_runtime_suspend() and
dpm_runtime_resume() routines from the PM core. The only user of
those routines is the PCMCIA ds driver; local replacements are added.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This allows the uevent file to handle any type of uevent action to be
triggered by userspace instead of just the "add" uevent.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Introduce API to dynamically register and unregister multicast groups.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow kicking listeners out of a multicast group when necessary
(for example if that group is going to be removed.)
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow changing the number of groups for a netlink family
after it has been created, use RCU to protect the listeners
bitmap keeping netlink_has_listeners() lock-free.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TSEC/eTSEC can detect the interface to the PHY automatically,
but it isn't able to detect whether the RGMII connection needs internal
delay. So we need to detect that change in the device tree, propagate
it to the platform data, and then check it if we're in RGMII. This fixes
a bug on the 8641D HPCN board where the Vitesse PHY doesn't use the delay
for RGMII.
Signed-off-by: Andy Fleming <afleming@freescale.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6:
[AVR32] Initialize phy_mask for both macb devices
[AVR32] Fix atomic_add_unless() and atomic_sub_unless()
[AVR32] Correct misspelled CONFIG_BLK_DEV_INITRD variable.
[AVR32] Fix build error in parse_tag_rdimg()
[AVR32] Don't wire up macb0 unless SW6 is in default position
[AVR32] Wire up SSC platform device 0 as TX on ATSTK1000 board
[AVR32] Add Atmel SSC driver platform device to AT32AP architecture
[AVR32] Remove optimization of unaligned word loads
[AVR32] Make STK1000 mux settings configurable
[AVR32] CPU frequency scaling for AT32AP
[AVR32] Split SM device into PM, RTC, WDT and EIC
[AVR32] faster avr32 unaligned access
These functions depend on "result" being initalized to 0, but "result"
is not included as an input constraint to the inline assembly block
following its initialization, only as an output constraint. Thus gcc
thinks it doesn't need to initialize it, so result ends up undefined
if the "unless" condition is true.
This fixes an oops in sunrpc where the faulty atomics caused
rpciod_up() to not start the workqueue as it should.
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
This patch adds register definitions, clocks and IRQs to the platform devices.
Signed-off-by: Hans-Christian Egtvedt <hcegtvedt@atmel.com>
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
If we let unaligned word loads bypass the generic unaligned handling,
gcc may combine it with a swap.b instruction and turn it into a ldwsp
instruction, which does not work with unaligned addresses.
Revert the optimization to prevent the RNDIS driver from crashing.
Hopefully we'll figure something out later (it may be better to do the
optimization in gcc.)
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Split the SM platform device into separate platform devices for PM,
RTC, WDT and EIC. This is more correct according to the documentation
and allows us to simplify the code a little.
Also turn the EIC driver into a real platform driver.
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Acked-by: Hans-Christian Egtvedt <hcegtvedt@atmel.com>
Use a more conventional implementation for unaligned access, and include
an AT32AP-specific optimization: the CPU will handle unaligned words.
The result is always faster and smaller for 8, 16, and 32 bit values.
For 64 bit quantities, it's presumably larger.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
* 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (126 commits)
V4L/DVB (5847): Clean up schedule_timeout calls in cpia2 and ivtv code
V4L/DVB (5846): Clean up setting state and scheduling timeouts
V4L/DVB (5844): ivtv: add high volume debugging flag
V4L/DVB (5843): ivtv: fix missing signal_pending check.
V4L/DVB (5842): ivtv: Add locking to ensure stream setup is atomic.
V4L/DVB (5841): tveeprom: add support for Philips FQ1216LME MK3 tuner.
V4L/DVB (5840): fix dst and cx24123: tune() callback changed signess for delay
V4L/DVB (5838): dvb-core: Fix signedness warnings (gcc 4.1.1, kernel 2.6.22)
V4L/DVB (5837): stv0299: Fix signedness warning (gcc 4.1.1, kernel 2.6.22)
V4L/DVB (5836): dvb-ttpci: re-initialize aspect ratio and pan scan after arm crash
V4L/DVB (5835): saa7146/dvb-ttpci: Fix signedness warnings (gcc 4.1.1, kernel 2.6.22)
V4L/DVB (5834): dvb-core: fix signedness warnings and const stripping
V4L/DVB (5832): ir-common: optimize bit extract function
V4L/DVB (5831): stradis: use ARRAY_SIZE
V4L/DVB (5829): Firmware extract and loading for opera dvb-usb update
V4L/DVB (5828): Kconfig: Added GemTek USB radio and removed experimental dependency.
V4L/DVB (5826): Usbvision: video mux cleanup
V4L/DVB (5825): Alter the tuner type for the WinTV USB UK PAL model.
V4L/DVB (5824): Usbvision: Hauppauge WinTV USB SECAM_L fix
V4L/DVB (5821): Saa7134: add remote control support for LifeView FlyDVB-S LR300
...
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: extent macros cleanup
Fix compilation with EXT_DEBUG, also fix leXX_to_cpu conversions.
ext4: remove extra IS_RDONLY() check
ext4: Use is_power_of_2()
Use zero_user_page() in ext4 where possible
ext4: Remove 65000 subdirectory limit
ext4: Expand extra_inodes space per the s_{want,min}_extra_isize fields
ext4: Add nanosecond timestamps
jbd2: Move jbd2-debug file to debugfs
jbd2: Fix CONFIG_JBD_DEBUG ifdef to be CONFIG_JBD2_DEBUG
ext4: Set the journal JBD2_FEATURE_INCOMPAT_64BIT on large devices
ext4: Make extents code sanely handle on-disk corruption
ext4: copy i_flags to inode flags on write
ext4: Enable extents by default
Change on-disk format to support 2^15 uninitialized extents
write support for preallocated blocks
fallocate support in ext4
sys_fallocate() implementation on i386, x86_64 and powerpc
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (24 commits)
[NETFILTER]: xt_connlimit needs to depend on nf_conntrack
[NETFILTER]: ipt_iprange.h must #include <linux/types.h>
[IrDA]: Fix IrDA build failure
[ATM]: nicstar needs virt_to_bus
[NET]: move __dev_addr_discard adjacent to dev_addr_discard for readability
[NET]: merge dev_unicast_discard and dev_mc_discard into one
[NET]: move dev_mc_discard from dev_mcast.c to dev.c
[NETLINK]: negative groups in netlink_setsockopt
[PPPOL2TP]: Reset meta-data in xmit function
[PPPOL2TP]: Fix use-after-free
[PKT_SCHED]: Some typo fixes in net/sched/Kconfig
[XFRM]: Fix crash introduced by struct dst_entry reordering
[TCP]: remove unused argument to cong_avoid op
[ATM]: [idt77252] Rename CONFIG_ATM_IDT77252_SEND_IDLE to not resemble a Kconfig variable
[ATM]: [drivers] ioremap balanced with iounmap
[ATM]: [lanai] sram_test_word() must be __devinit
[ATM]: [nicstar] Replace C code with call to ARRAY_SIZE() macro.
[ATM]: Eliminate dead config variable CONFIG_BR2684_FAST_TRANS.
[ATM]: Replacing kmalloc/memset combination with kzalloc.
[NET]: gen_estimator deadlock fix
...
Move tuner callback function pointers out of struct tuner, into
struct tuner_operations.
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Individual tuner drivers are now allocating memory themselves for
their own private data structures. This changeset adds a release
callback to the tuner operations, so that newer drivers that may
require more complex data structures may release this private data
themselves.
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Create private data struct for device specific private data.
Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: Set vio->desc_buf to NULL after freeing.
[SPARC]: Mark sparc and sparc64 as not having virt_to_bus
[SPARC64]: Fix reset handling in VNET driver.
[SPARC64]: Handle reset events in vio_link_state_change().
[SPARC64]: Handle LDC resets properly in domain-services driver.
[SPARC64]: Massively simplify VIO device layer and support hot add/remove.
[SPARC64]: Simplify VNET probing.
[SPARC64]: Simplify VDC device probing.
[SPARC64]: Add basic infrastructure for MD add/remove notification.
This patch adds support for SAS Management Protocol (SMP) passthrough
support via bsg. aic94xx can use this.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The sas transport class attaches one bsg device to every SAS object
(host, device, expander, etc). LLDs can define a function to handle
SMP requests via sas_function_template::smp_handler.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Adds PCI_VENDOR_ID_BROCADE macro in include/linux/pci_ids.h file. This macro
is used in MPT Fusion FC drivers to support Brocade branded FC controllers
signed-off-by: Sathya Prakash <sathya.prakash@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This one was noticed by Gilbert Wu of Adaptec:
The libata core actually does the DMA mapping for you, so there has to
be an exception in the device drivers that *don't* do dma mapping for
ATA commands. However, since we've already done this, libsas must now
dma map any ATA commands that it wishes to issue ... and yes, this is a
horrible mess.
Additionally, the test in aic94xx for ATA protocols isn't quite right.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
ATA devices need special handling for sas_task_abort. If the ATA command
came from SCSI, then we merely need to tell SCSI to abort the scsi_cmnd.
However, internal commands require a bit more work--we need to fill the qc
with the appropriate error status and complete the command, and eventually
post_internal will issue the actual ABORT TASK.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch adds a new field, lldd_task, to ata_queued_cmd so that libata
users such as libsas can associate some data with a qc. The particular
ambition with this patch is to associate a sas_task with a qc; that way,
if libata decides to timeout a command, we can come back (in
sas_ata_post_internal) and abort the sas task.
One question remains: Is it necessary to reset the phy on error, or will
the libata error handler take care of it? (Assuming that one is written,
of course.) This patch, as it is today, works well enough to clean
things up when an ATA device probe attempt fails halfway through the probe,
though I'm not sure this is always the right thing to do.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This is a respin of my earlier patch that migrates the ATA support code
into a separate file. For now, the controversial linking bits have
been removed per James Bottomley's request for a patch that contains
only the migration diffs, which means that libsas continues to require
libata. I intend to address that problem in a separate patch.
This patch is against the aic94xx-sas-2.6 git tree, and it has been
sanity tested on my x206m with Seagate SATA and SAS disks without
uncovering any new problems.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Hook the scsi_host_template functions in libsas to delegate
functionality to libata when appropriate.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Misc code changes and merge fixes and update for libata->drivers/ata
move
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
An experimental patch for Xen allows guests to place their vcpu_info
structs anywhere. We try to use this to place the vcpu_info into the
PDA, which allows direct access.
If this works, then switch to using direct access operations for
irq_enable, disable, save_fl and restore_fl.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Keir Fraser <keir@xensource.com>
The block device frontend driver allows the kernel to access block
devices exported exported by a virtual machine containing a physical
block device driver.
Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Greg KH <greg@kroah.com>
Cc: Jens Axboe <axboe@kernel.dk>
This communicates with the machine control software via a registry
residing in a controlling virtual machine. This allows dynamic
creation, destruction and modification of virtual device
configurations (network devices, block devices and CPUS, to name some
examples).
[ Greg, would you mind giving this a review? Thanks -J ]
Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Greg KH <greg@kroah.com>
Add Xen 'grant table' driver which allows granting of access to
selected local memory pages by other virtual machines and,
symmetrically, the mapping of remote memory pages which other virtual
machines have granted access to.
This driver is a prerequisite for many of the Xen virtual device
drivers, which grant the 'device driver domain' restricted and
temporary access to only those memory pages that are currently
involved in I/O operations.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Implement a Xen back-end for hvc console.
* * *
Add early printk support via hvc console, enable using
"earlyprintk=xen" on the kernel command line.
From: Gerd Hoffmann <kraxel@suse.de>
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Olof Johansson <olof@lixom.net>
This is a fairly straightforward Xen implementation of smp_ops.
Xen has its own IPI mechanisms, and has no dependency on any
APIC-based IPI. The smp_ops hooks and the flush_tlb_others pv_op
allow a Xen guest to avoid all APIC code in arch/i386 (the only apic
operation is a single apic_read for the apic version number).
One subtle point which needs to be addressed is unpinning pagetables
when another cpu may have a lazy tlb reference to the pagetable. Xen
will not allow an in-use pagetable to be unpinned, so we must find any
other cpus with a reference to the pagetable and get them to shoot
down their references.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Add a new definition for PG_owner_priv_1 to define PG_pinned on Xen
pagetable pages.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Xen implements interrupts in terms of event channels. Each guest
domain gets 1024 event channels which can be used for a variety of
purposes, such as Xen timer events, inter-domain events,
inter-processor events (IPI) or for real hardware IRQs.
Within the kernel, we map the event channels to IRQs, and implement
the whole interrupt handling using a Xen irq_chip.
Rather than setting NR_IRQ to 1024 under PARAVIRT in order to
accomodate Xen, we create a dynamic mapping between event channels and
IRQs. Ideally, Linux will eventually move towards dynamically
allocating per-irq structures, and we can use a 1:1 mapping between
event channels and irqs.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Eric W. Biederman <ebiederm@xmission.com>
This patch is a rollup of all the core pieces of the Xen
implementation, including:
- booting and setup
- pagetable setup
- privileged instructions
- segmentation
- interrupt flags
- upcalls
- multicall batching
BOOTING AND SETUP
The vmlinux image is decorated with ELF notes which tell the Xen
domain builder what the kernel's requirements are; the domain builder
then constructs the address space accordingly and starts the kernel.
Xen has its own entrypoint for the kernel (contained in an ELF note).
The ELF notes are set up by xen-head.S, which is included into head.S.
In principle it could be linked separately, but it seems to provoke
lots of binutils bugs.
Because the domain builder starts the kernel in a fairly sane state
(32-bit protected mode, paging enabled, flat segments set up), there's
not a lot of setup needed before starting the kernel proper. The main
steps are:
1. Install the Xen paravirt_ops, which is simply a matter of a
structure assignment.
2. Set init_mm to use the Xen-supplied pagetables (analogous to the
head.S generated pagetables in a native boot).
3. Reserve address space for Xen, since it takes a chunk at the top
of the address space for its own use.
4. Call start_kernel()
PAGETABLE SETUP
Once we hit the main kernel boot sequence, it will end up calling back
via paravirt_ops to set up various pieces of Xen specific state. One
of the critical things which requires a bit of extra care is the
construction of the initial init_mm pagetable. Because Xen places
tight constraints on pagetables (an active pagetable must always be
valid, and must always be mapped read-only to the guest domain), we
need to be careful when constructing the new pagetable to keep these
constraints in mind. It turns out that the easiest way to do this is
use the initial Xen-provided pagetable as a template, and then just
insert new mappings for memory where a mapping doesn't already exist.
This means that during pagetable setup, it uses a special version of
xen_set_pte which ignores any attempt to remap a read-only page as
read-write (since Xen will map its own initial pagetable as RO), but
lets other changes to the ptes happen, so that things like NX are set
properly.
PRIVILEGED INSTRUCTIONS AND SEGMENTATION
When the kernel runs under Xen, it runs in ring 1 rather than ring 0.
This means that it is more privileged than user-mode in ring 3, but it
still can't run privileged instructions directly. Non-performance
critical instructions are dealt with by taking a privilege exception
and trapping into the hypervisor and emulating the instruction, but
more performance-critical instructions have their own specific
paravirt_ops. In many cases we can avoid having to do any hypercalls
for these instructions, or the Xen implementation is quite different
from the normal native version.
The privileged instructions fall into the broad classes of:
Segmentation: setting up the GDT and the GDT entries, LDT,
TLS and so on. Xen doesn't allow the GDT to be directly
modified; all GDT updates are done via hypercalls where the new
entries can be validated. This is important because Xen uses
segment limits to prevent the guest kernel from damaging the
hypervisor itself.
Traps and exceptions: Xen uses a special format for trap entrypoints,
so when the kernel wants to set an IDT entry, it needs to be
converted to the form Xen expects. Xen sets int 0x80 up specially
so that the trap goes straight from userspace into the guest kernel
without going via the hypervisor. sysenter isn't supported.
Kernel stack: The esp0 entry is extracted from the tss and provided to
Xen.
TLB operations: the various TLB calls are mapped into corresponding
Xen hypercalls.
Control registers: all the control registers are privileged. The most
important is cr3, which points to the base of the current pagetable,
and we handle it specially.
Another instruction we treat specially is CPUID, even though its not
privileged. We want to control what CPU features are visible to the
rest of the kernel, and so CPUID ends up going into a paravirt_op.
Xen implements this mainly to disable the ACPI and APIC subsystems.
INTERRUPT FLAGS
Xen maintains its own separate flag for masking events, which is
contained within the per-cpu vcpu_info structure. Because the guest
kernel runs in ring 1 and not 0, the IF flag in EFLAGS is completely
ignored (and must be, because even if a guest domain disables
interrupts for itself, it can't disable them overall).
(A note on terminology: "events" and interrupts are effectively
synonymous. However, rather than using an "enable flag", Xen uses a
"mask flag", which blocks event delivery when it is non-zero.)
There are paravirt_ops for each of cli/sti/save_fl/restore_fl, which
are implemented to manage the Xen event mask state. The only thing
worth noting is that when events are unmasked, we need to explicitly
see if there's a pending event and call into the hypervisor to make
sure it gets delivered.
UPCALLS
Xen needs a couple of upcall (or callback) functions to be implemented
by each guest. One is the event upcalls, which is how events
(interrupts, effectively) are delivered to the guests. The other is
the failsafe callback, which is used to report errors in either
reloading a segment register, or caused by iret. These are
implemented in i386/kernel/entry.S so they can jump into the normal
iret_exc path when necessary.
MULTICALL BATCHING
Xen provides a multicall mechanism, which allows multiple hypercalls
to be issued at once in order to mitigate the cost of trapping into
the hypervisor. This is particularly useful for context switches,
since the 4-5 hypercalls they would normally need (reload cr3, update
TLS, maybe update LDT) can be reduced to one. This patch implements a
generic batching mechanism for hypercalls, which gets used in many
places in the Xen code.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Ian Pratt <ian.pratt@xensource.com>
Cc: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Cc: Adrian Bunk <bunk@stusta.de>
Add Xen interface header files. These are taken fairly directly from
the Xen tree, but somewhat rearranged to suit the kernel's conventions.
Define macros and inline functions for doing hypercalls into the
hypervisor.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
The tsc-based get_scheduled_cycles interface is not a good match for
Xen's runstate accounting, which reports everything in nanoseconds.
This patch replaces this interface with a sched_clock interface, which
matches both Xen and VMI's requirements.
In order to do this, we:
1. replace get_scheduled_cycles with sched_clock
2. hoist cycles_2_ns into a common header
3. update vmi accordingly
One thing to note: because sched_clock is implemented as a weak
function in kernel/sched.c, we must define a real function in order to
override this weak binding. This means the usual paravirt_ops
technique of using an inline function won't work in this case.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Dan Hecht <dhecht@vmware.com>
Cc: john stultz <johnstul@us.ibm.com>
In a virtual environment, device drivers such as legacy IDE will waste
quite a lot of time probing for their devices which will never appear.
This helper function allows a paravirt implementation to lay claim to
the whole iomem and ioport space, thereby disabling all device drivers
trying to claim IO resources.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Allocate/release a chunk of vmalloc address space:
alloc_vm_area reserves a chunk of address space, and makes sure all
the pagetables are constructed for that address range - but no pages.
free_vm_area releases the address space range.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ian Pratt <ian.pratt@xensource.com>
Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: "Jan Beulich" <JBeulich@novell.com>
Cc: "Andi Kleen" <ak@muc.de>
Make globally leave_mm visible, specifically so that Xen can use it to
shoot-down lazy uses of cr3.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
When running with CONFIG_PARAVIRT, we may want lots of IRQs even if
there's no IO APIC.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Add a hook so that the paravirt backend knows when the allocator is
ready. This is useful for the obvious reason that the allocator is
available, but the other side-effect of having the bootmem allocator
available is that each page now has an associated "struct page".
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
It's useful to know which mm is allocating a pagetable. Xen uses this
to determine whether the pagetable being added to is pinned or not.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Use existing elfnote.h to generate vsyscall notes, rather than doing
it locally. Changes elfnote.h a bit to suit, since this is the first
asm user, and it wasn't quite right.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.com>
Rather than using a tri-state integer for the wait flag in
call_usermodehelper_exec, define a proper enum, and use that. I've
preserved the integer values so that any callers I've missed should
still work OK.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: David Howells <dhowells@redhat.com>
Various pieces of code around the kernel want to be able to trigger an
orderly poweroff. This pulls them together into a single
implementation.
By default the poweroff command is /sbin/poweroff, but it can be set
via sysctl: kernel/poweroff_cmd. This is split at whitespace, so it
can include command-line arguments.
This patch replaces four other instances of invoking either "poweroff"
or "shutdown -h now": two sbus drivers, and acpi thermal
management.
sparc64 has its own "powerd"; still need to determine whether it should
be replaced by orderly_poweroff().
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Acked-by: Len Brown <lenb@kernel.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David S. Miller <davem@davemloft.net>
Rather than having hundreds of variations of call_usermodehelper for
various pieces of usermode state which could be set up, split the
info allocation and initialization from the actual process execution.
This means the general pattern becomes:
info = call_usermodehelper_setup(path, argv, envp); /* basic state */
call_usermodehelper_<SET EXTRA STATE>(info, stuff...); /* extra state */
call_usermodehelper_exec(info, wait); /* run process and free info */
This patch introduces wrappers for all the existing calling styles for
call_usermodehelper_*, but folds their implementations into one.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: David Howells <dhowells@redhat.com>
Cc: Bj?rn Steinbrink <B.Steinbrink@gmx.de>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
argv_split() is a helper function which takes a string, splits it at
whitespace, and returns a NULL-terminated argv vector. This is
deliberately simple - it does no quote processing of any kind.
[ Seems to me that this is something which is already being done in
the kernel, but I couldn't find any other implementations, either to
steal or replace. Keep an eye out. ]
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Add a kstrndup function, modelled on strndup. Like strndup this
returns a string copied into its own allocated memory, but it copies
no more than the specified number of bytes from the source.
Remove private strndup() from irda code.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@mandriva.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Panagiotis Issaris <takis@issaris.org>
Cc: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
This is a reimplementation of the zs driver for the serial subsystem. Any
resemblance to the old driver is purely coincidential. ;-) I do hope I got
the handling of modem lines right -- better do not tackle me about the
issue unless you feel too good...
Any users of the old driver: please note the numbers of the serial lines
have now been swapped, i.e. ttyS0 <-> ttyS1 and ttyS2 <-> ttyS3. It has
to do with the modem lines mentioned above; basically the port A in a given
chip has to be initialised before the port B if you want to use the latter
as the serial console (which is usually the case), as operations on modem
lines of the serial line associated with the port B access both ports (see
the comment at the top of the driver for the details of wiring used).
Please update your scripts.
This is also the reason each SCC now requests an IRQ once only (as seen in
"/proc/interrupts") -- the handler takes care of both ports at once as the
line associated with the port B has to take status update interrupts from
both ports (and yet the line of the port A takes its own for itself too).
The old driver never got it right...
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
early_serial_setup was removed from serial.h, but forgot to put in
serial_8250.h
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kill UBI's homegrown endianess handling and replace it with
the standard kernel endianess handling.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
This patch adds support to ext4 for allowing more than 65000
subdirectories. Currently the maximum number of subdirectories is capped
at 32000.
If we exceed 65000 subdirectories in an htree directory it sets the
inode link count to 1 and no longer counts subdirectories. The
directory link count is not actually used when determining if a
directory is empty, as that only counts subdirectories and not regular
files that might be in there.
A EXT4_FEATURE_RO_COMPAT_DIR_NLINK flag has been added and it is set if
the subdir count for any directory crosses 65000. A later fsck will clear
EXT4_FEATURE_RO_COMPAT_DIR_NLINK if there are no longer any directory
with >65000 subdirs.
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We need to make sure that existing ext3 filesystems can also avail the
new fields that have been added to the ext4 inode. We use
s_want_extra_isize and s_min_extra_isize to decide by how much we should
expand the inode. If EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature is set
then we expand the inode by max(s_want_extra_isize, s_min_extra_isize ,
sizeof(ext4_inode) - EXT4_GOOD_OLD_INODE_SIZE) bytes. Actually it is
still an open question about whether users should be able to set
s_*_extra_isize smaller than the known fields or not.
This patch also adds the functionality to expand inodes to include the
newly added fields. We start by trying to expand by s_want_extra_isize
bytes and if its fails we try to expand by s_min_extra_isize bytes. This
is done by changing the i_extra_isize if enough space is available in
the inode and no EAs are present. If EAs are present and there is enough
space in the inode then the EAs in the inode are shifted to make space.
If enough space is not available in the inode due to the EAs then 1 or
more EAs are shifted to the external EA block. In the worst case when
even the external EA block does not have enough space we inform the user
that some EA would need to be deleted or s_min_extra_isize would have to
be reduced.
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This patch adds nanosecond timestamps for ext4. This involves adding
*time_extra fields to the ext4_inode to extend the timestamps to
64-bits. Creation time is also added by this patch.
These extended fields will fit into an inode if the filesystem was
formatted with large inodes (-I 256 or larger) and there are currently
no EAs consuming all of the available space. For new inodes we always
reserve enough space for the kernel's known extended fields, but for
inodes created with an old kernel this might not have been the case. So
this patch also adds the EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE feature
flag(ro-compat so that older kernels can't create inodes with a smaller
extra_isize). which indicates if the fields fitting inside
s_min_extra_isize are available or not. If the expansion of inodes if
unsuccessful then this feature will be disabled. This feature is only
enabled if requested by the sysadmin.
None of the extended inode fields is critical for correct filesystem
operation.
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The jbd2-debug file used to be located in /proc/sys/fs/jbd2-debug, but it
incorrectly used create_proc_entry() instead of the sysctl routines, and
no proc entry was ever created.
Instead of fixing this we might as well move the jbd2-debug file to
debugfs which would be the preferred location for this kind of tunable.
The new location is now /sys/kernel/debug/jbd2/jbd2-debug.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When the JBD code was forked to create the new JBD2 code base, the
references to CONFIG_JBD_DEBUG where never changed to
CONFIG_JBD2_DEBUG. This patch fixes that.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from i_flags into
ext4-specific i_flags. Quota code changes these flags on quota files
(to make it harder for sysadmin to screw himself) and these changes were
not correctly propagated into the filesystem.
(This is a forward port patch from ext3)
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
This change was suggested by Andreas Dilger.
This patch changes the EXT_MAX_LEN value and extent code which marks/checks
uninitialized extents. With this change it will be possible to have
initialized extents with 2^15 blocks (earlier the max blocks we could have
was 2^15 - 1). This way we can have better extent-to-block alignment.
Now, maximum number of blocks we can have in an initialized extent is 2^15
and in an uninitialized extent is 2^15 - 1.
Signed-off-by: Amit Arora <aarora@in.ibm.com>
ipt_iprange.h must #include <linux/types.h> since it uses __be32.
This patch fixes kernel Bugzilla #7604.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because this function is only called by unregister_netdevice,
this moving could make this non-global function static,
and also remove its declaration in netdevice.h;
Any further, function __dev_addr_discard is also just called by
dev_mc_discard and dev_unicast_discard, keeping this two functions
both in one c file could make __dev_addr_discard also static
and remove its declaration in netdevice.h;
Futhermore, the sequential call to dev_unicast_discard and then
dev_mc_discard in unregister_netdevice have a similar mechanism that:
(netif_tx_lock_bh / __dev_addr_discard / netif_tx_unlock_bh),
they should merged into one to eliminate duplicates in acquiring and
releasing the dev->_xmit_lock, this would be done in my following patch.
Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
XFRM expects xfrm_dst->u.next to be same pointer as dst->next, which
was broken by the dst_entry reordering in commit 1e19e02c~, causing
an oops in xfrm_bundle_ok when walking the bundle upwards.
Kill xfrm_dst->u.next and change the only user to use dst->next instead.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
None of the existing TCP congestion controls use the rtt value pased
in the ca_ops->cong_avoid interface. Which is lucky because seq_rtt
could have been -1 when handling a duplicate ack.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create and destroy VIO devices in response to MD update events. These
run synchronously inside of the MD update mutex so the VIO layer
doesn't need to do internal locking of any sort.
Signed-off-by: David S. Miller <davem@davemloft.net>
And add dummy handlers for the VIO device layer. These will be filled
in with real code after the vdc, vnet, and ds drivers are reworked to
have simpler dependencies on the VIO device tree.
Signed-off-by: David S. Miller <davem@davemloft.net>
These serial touchscreens are found on some Fujitsu lifebook
P-series laptops, and the B6210. Using this requires a new
version of inputattach and doing:
inputattach -fjt /dev/ttyS0
Big thanks to Stephen Hemminger for testing it and making it
work on his B6210 laptop.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Pendown status from the PENIRQ pin is currently read only at the beginning
of a sample set. If the pen is lifted just after sampling has began then
sampled values become wrong.
This patch adds an optional platform penirq_recheck_delay attribute. If
non-zero, samples are only reported to the input subsystem if PENIRQ is
still active that long after the samples taken.
Signed-off-by: Semih Hazar <semih.hazar@indefia.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
The ads7846 driver has support for filtering, but when the chip gets
deselected between samples this causes noise. This patch adds support
for an optional settling delay time, so that two consecutive samples
will be taken with the specified delay time apart. This ensures that
the chip won't be deselected, so the noise won't appear.
Filtering can still be done, but will have less work to do since each
time a new sample is taken the same delay applies.
Signed-off-by: Semih Hazar <semih.hazar@indefia.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
This patch adds write support to the uninitialized extents that get
created when a preallocation is done using fallocate(). It takes care of
splitting the extents into multiple (upto three) extents and merging the
new split extents with neighbouring ones, if possible.
Signed-off-by: Amit Arora <aarora@in.ibm.com>
This patch implements ->fallocate() inode operation in ext4. With this
patch users of ext4 file systems will be able to use fallocate() system
call for persistent preallocation. Current implementation only supports
preallocation for regular files (directories not supported as of date)
with extent maps. This patch does not support block-mapped files currently.
Only FALLOC_ALLOCATE and FALLOC_RESV_SPACE modes are being supported as of
now.
Signed-off-by: Amit Arora <aarora@in.ibm.com>
fallocate() is a new system call being proposed here which will allow
applications to preallocate space to any file(s) in a file system.
Each file system implementation that wants to use this feature will need
to support an inode operation called ->fallocate().
Applications can use this feature to avoid fragmentation to certain
level and thus get faster access speed. With preallocation, applications
also get a guarantee of space for particular file(s) - even if later the
the system becomes full.
Currently, glibc provides an interface called posix_fallocate() which
can be used for similar cause. Though this has the advantage of working
on all file systems, but it is quite slow (since it writes zeroes to
each block that has to be preallocated). Without a doubt, file systems
can do this more efficiently within the kernel, by implementing
the proposed fallocate() system call. It is expected that
posix_fallocate() will be modified to call this new system call first
and incase the kernel/filesystem does not implement it, it should fall
back to the current implementation of writing zeroes to the new blocks.
ToDos:
1. Implementation on other architectures (other than i386, x86_64,
and ppc). Patches for s390(x) and ia64 are already available from
previous posts, but it was decided that they should be added later
once fallocate is in the mainline. Hence not including those patches
in this take.
2. Changes to glibc,
a) to support fallocate() system call
b) to make posix_fallocate() and posix_fallocate64() call fallocate()
Signed-off-by: Amit Arora <aarora@in.ibm.com>
With the slab zeroing allocations cleanups Christoph stubbed in a generic
kzalloc(), which was missed on SLOB. Follow the SLAB/SLUB changes and
kill off the __kzalloc() wrapper that SLOB was using.
Reported-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'bsg' of git://git.kernel.dk/data/git/linux-2.6-block:
bsg: fix missing space in version print
Don't define empty struct bsg_class_device if !CONFIG_BLK_DEV_BSG
bsg: Kconfig updates
bsg: minor cleanup
bsg: device hash table cleanup
bsg: fix initialization error handling bugs
bsg: mark FUJITA Tomonori as bsg maintainer
bsg: convert to dynamic major
bsg: address various review comments
... or we end up with header include order problems from hell.
E.g. on m68k this is 100% fatal - local_irq_enable() there
wants preempt_count(), which wants task_struct fields, which
we won't have when we are in smp.h pulled from sched.h.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce is_owner_or_cap() macro in fs.h, and convert over relevant
users to it. This is done because we want to avoid bugs in the future
where we check for only effective fsuid of the current task against a
file's owning uid, without simultaneously checking for CAP_FOWNER as
well, thus violating its semantics.
[ XFS uses special macros and structures, and in general looked ...
untouchable, so we leave it alone -- but it has been looked over. ]
The (current->fsuid != inode->i_uid) check in generic_permission() and
exec_permission_lite() is left alone, because those operations are
covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations
falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone.
Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Cc: Al Viro <viro@ftp.linux.org.uk>
Acked-by: Serge E. Hallyn <serge@hallyn.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (80 commits)
KVM: Use CPU_DYING for disabling virtualization
KVM: Tune hotplug/suspend IPIs
KVM: Keep track of which cpus have virtualization enabled
SMP: Allow smp_call_function_single() to current cpu
i386: Allow smp_call_function_single() to current cpu
x86_64: Allow smp_call_function_single() to current cpu
HOTPLUG: Adapt thermal throttle to CPU_DYING
HOTPLUG: Adapt cpuset hotplug callback to CPU_DYING
HOTPLUG: Add CPU_DYING notifier
KVM: Clean up #includes
KVM: Remove kvmfs in favor of the anonymous inodes source
KVM: SVM: Reliably detect if SVM was disabled by BIOS
KVM: VMX: Remove unnecessary code in vmx_tlb_flush()
KVM: MMU: Fix Wrong tlb flush order
KVM: VMX: Reinitialize the real-mode tss when entering real mode
KVM: Avoid useless memory write when possible
KVM: Fix x86 emulator writeback
KVM: Add support for in-kernel pio handlers
KVM: VMX: Fix interrupt checking on lightweight exit
KVM: Adds support for in-kernel mmio handlers
...
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Clean away some code inside some non-existent CONFIG ifdefs
[IA64] ar.itc access must really be after xtime_lock.sequence has been read
[IA64] correctly count CPU objects in the ia64/sn hwperf interface
[IA64] arbitary speed tty ioctl support
[IA64] use machvec=dig on hpzx1 platforms
Verify that types would match for assignment (under sizeof, so we are safe from
side effects or any code actually getting generated), then explicitly cast
everywhere to the fixed-sized types. Kills a bunch of bogus warnings about
constants being truncated (gcc, sparse), finds a pile of endianness problems
hidden by old noise (sparse).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... fortunately, termios and ktermios there are identical, so no
run-time breakage happened.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
bitmap_unplug only ever returns 0, so it may as well be void. Two callers try
to print a message if it returns non-zero, but that message is already printed
by bitmap_file_kick.
write_page returns an error which is not consistently checked. It always
causes BITMAP_WRITE_ERROR to be set on an error, and that can more
conveniently be checked.
When the return of write_page is checked, an error causes bitmap_file_kick to
be called - so move that call into write_page - and protect against recursive
calls into bitmap_file_kick.
bitmap_update_sb returns an error that is never checked.
So make these 'void' and be consistent about checking the bit.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Don't use 'unsigned' variable to track sync vs non-sync IO, as the only thing
we want to do with them is a signed comparison, and fix up the comment which
had become quite wrong.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add fb_append_extra_logo(), to append extra lines of logos below the standard
Linux logo.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-By: James Simmons <jsimmons@infradead.org>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No memory allocation was done for the pseudo_palette. Allocate one for it.
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Acked-by: "Maciej W. Rozycki" <macro@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allows for proper console unregistration via the VT layer, and updates
the FB layer to use it. This makes debugging new console drivers much easier,
since you can properly clean them up before unloading.
[adaplas]
unregister_framebuffer() is typically called as part of the driver's
module_exit(). Doing so otherwise will freeze the machine as the VT layer is
holding reference counts on fbcon, and fbcon on the driver. With this change,
it allows unregister_framebuffer() to be called safely anywhere as needed.
Additions from the original: If multiple drivers are used by fbcon, and if
one of them unregisters, a driver will take over the consoles vacated by the
outgoing one (via set_con2fb_map). Once only the outgoing driver remains,
then fbcon will unbind from the VT layer (if CONFIG_HW_CONSOLE_UNBINDING is
set to y).
It is important that these drivers implement fb_open() and fb_release()
just to ensure that no other process is using the driver. Likewise, these
drivers _must_ check the return value of unregister_framebuffer().
[akpm@linux-foundation.org: make fbcon_unbind() stub inline]
Signed-off-by: Jesse Barnes <jesse.barnes@intel.com>
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allow fbcon to select the primary display adapter using the
fb_is_primary_device() arch-specific helper. If a a primary adapter is
detected, fbcon will unbind the old adapter from the VT layer, then rebind
using the new adapter. This requires that bind_/unbind_con_driver() be made
public.
Because this feature may produce unexpected behavior (from the user's POV),
this must be explicitly enabled in Kconfig.
[akpm@linux-foundation.org: export unbind_con_driver]
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add function helper, fb_is_primary_device(). Given struct fb_info, it will
return a nonzero value if the device is the primary display.
Currently, only the i386 is supported where the function checks for the
IORESOURCE_ROM_SHADOW flag.
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move arch-specific bits of fb_mmap() to their respective subdirectories
[bob.picco@hp.com: efi_range_is_wc is referenced but not declared]
[bunk@stusta.de: fix include/asm-m68k/fb.h]
Signed-off-by: Antonino Daplas <adaplas@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allow root squashing to vary per-pseudoflavor, so that you can (for example)
allow root access only when sufficiently strong security is in use.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We could return some sort of error in the case where someone asks for secinfo
on an export without the secinfo= option set--that'd be no worse than what
we've been doing. But it's not really correct. So, hack up an approximate
secinfo response in that case--it may not be complete, but it'll tell the
client at least one acceptable security flavor.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement the secinfo operation.
(Thanks to Usha Ketineni wrote an earlier version of this support.)
Cc: Usha Ketineni <uketinen@us.ibm.com>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allow readonly access to vary depending on the pseudoflavor, using the flag
passed with each pseudoflavor in the export downcall. The rest of the flags
are ignored for now, though some day we might also allow id squashing to vary
based on the flavor.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make the first actual use of the secinfo information by using it to return
nfserr_wrongsec when an export is found that doesn't allow the flavor used on
this request.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We want it to be possible for users to restrict exports both by IP address and
by pseudoflavor. The pseudoflavor information has previously been passed
using special auth_domains stored in the rq_client field. After the preceding
patch that stored the pseudoflavor in rq_pflavor, that's now superfluous; so
now we use rq_client for the ip information, as auth_null and auth_unix do.
However, we keep around the special auth_domain in the rq_gssclient field for
backwards compatibility purposes, so we can still do upcalls using the old
"gss/pseudoflavor" auth_domain if upcalls using the unix domain to give us an
appropriate export. This allows us to continue supporting old mountd.
In fact, for this first patch, we always use the "gss/pseudoflavor"
auth_domain (and only it) if it is available; thus rq_client is ignored in the
auth_gss case, and this patch on its own makes no change in behavior; that
will be left to later patches.
Note on idmap: I'm almost tempted to just replace the auth_domain in the idmap
upcall by a dummy value--no version of idmapd has ever used it, and it's
unlikely anyone really wants to perform idmapping differently depending on the
where the client is (they may want to perform *credential* mapping
differently, but that's a different matter--the idmapper just handles id's
used in getattr and setattr). But I'm updating the idmapd code anyway, just
out of general backwards-compatibility paranoia.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Split the callers of exp_get_by_name(), exp_find(), and exp_parent() into
those that are processing requests and those that are doing other stuff (like
looking up filehandles for mountd).
No change in behavior, just a (fairly pointless, on its own) cleanup.
(Note this has the effect of making nfsd_cross_mnt() pass rqstp->rq_client
instead of exp->ex_client into exp_find_by_name(). However, the two should
have the same value at this point.)
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We're passing three arguments to exp_pseudoroot, two of which are just fields
of the svc_rqst. Soon we'll want to pass in a third field as well. So let's
just give up and pass in the whole struct svc_rqst.
Also sneak in some minor style cleanups while we're at it.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We add a list of pseudoflavors to each export downcall, which will be used
both as a list of security flavors allowed on that export, and (in the order
given) as the list of pseudoflavors to return on secinfo calls.
This patch parses the new downcall information and adds it to the export
structure, but doesn't use it for anything yet.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new field to the svc_rqst structure to record the pseudoflavor that the
request was made with. For now we record the pseudoflavor but don't use it
for anything.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One more incremental delegation policy improvement: don't give out a
delegation on a file if conflicting access has previously required that a
delegation be revoked on that file. (In practice we'll forget about the
conflict when the struct nfs4_file is removed on close, so this is of limited
use for now, though it should at least solve a temporary problem with
self-conflicts on write opens from the same client.)
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Our original NFSv4 delegation policy was to give out a read delegation on any
open when it was possible to.
Since the lifetime of a delegation isn't limited to that of an open, a client
may quite reasonably hang on to a delegation as long as it has the inode
cached. This becomes an obvious problem the first time a client's inode cache
approaches the size of the server's total memory.
Our first quick solution was to add a hard-coded limit. This patch makes a
mild incremental improvement by varying that limit according to the server's
total memory size, allowing at most 4 delegations per megabyte of RAM.
My quick back-of-the-envelope calculation finds that in the worst case (where
every delegation is for a different inode), a delegation could take about
1.5K, which would make the worst case usage about 6% of memory. The new limit
works out to be about the same as the old on a 1-gig server.
[akpm@linux-foundation.org: Don't needlessly bloat vmlinux]
[akpm@linux-foundation.org: Make it right for highmem machines]
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It looks like Al Viro gutted this header file five years ago and it hasn't
been touched since.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NFS4_FHSIZE is measured in bytes, not 4-byte words, so much more space than
necessary is being allocated for struct nfs4_cb_recall.
I should have wondered why this structure was so much larger than it needed to
be!
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Both lockd and (in the nfsv4 case) nfsd enforce a "grace period" after reboot,
during which clients may reclaim locks from the previous server instance, but
may not acquire new locks.
Currently the lockd and nfsd enforce grace periods of different lengths. This
may cause problems when we reboot a server with both v2/v3 and v4 clients.
For example, if the lockd grace period is shorter (as is likely the case),
then a v3 client might acquire a new lock that conflicts with a lock already
held (but not yet reclaimed) by a v4 client.
This patch calculates a lease time that lockd and nfsd can both use.
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently NFSD calls directly into filesystems through the export_operations
structure. I plan to change this interface in various ways in later patches,
and want to avoid the export of the default operations to NFSD, so this patch
adds two simple exportfs_encode_fh/exportfs_decode_fh helpers for NFSD to call
instead of poking into exportfs guts.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the exportfs interface was added the expectation was that filesystems
provide an operation to convert from a file handle to an inode/dentry, but it
kept a backwards compat option that still calls into iget.
Calling into iget from non-filesystem code is very bad, because it gives too
little information to filesystem, and simply crashes if the filesystem doesn't
implement the ->read_inode routine.
Fortunately there are only two filesystems left using this fallback: efs and
jfs. This patch moves a copy of export_iget to each of those to implement the
get_dentry method.
While this is a temporary increase of lines of code in the kernel it allows
for a much cleaner interface and important code restructuring in later
patches.
[akpm@linux-foundation.org: add jfs_get_inode_flags() declaration]
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
currently the export_operation structure and helpers related to it are in
fs.h. fs.h is already far too large and there are very few places needing the
export bits, so split them off into a separate header.
[akpm@linux-foundation.org: fix cifs build]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The CAPI 2.0 driver uses a semaphore as mutex. Use the mutex API instead of
the (binary) semaphore.
Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Acked-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Quicc Engine enabled mpc83xx CPU's has a somewhat different HW interface to
the SPI controller. This patch adds a qe_mode knob that sees to that
needed adaptions are performed.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add support for the Infineon TLE62x0 series of low-side driver chips, such
as the TLE6220 or TLE6230. These can be viewed as output GPIOs specialized
for power switching applications. The driver provides a userspace
interface to those GPIOs, and to the switch status they provide.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add CRC7 routines, used for example in MMC over SPI communication.
Kerneldoc updates
[akpm@linux-foundation.org: fix funny mix of const and non-const]
Signed-off-by: Jan Nikitenko <jan.nikitenko@gmail.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new spi->mode bit: SPI_3WIRE, for chips where the SI and SO signals
are shared (and which are thus only half duplex). Update the LM70 driver
to require support for that hardware mode from the controller.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minor SPI controller driver updates: make the setup() methods reject
spi->mode bits they don't support, by masking aginst the inverse of bits
they *do* support. This insures against misbehavior later when new mode
bits get added.
Most controllers can't support SPI_LSB_FIRST; more handle SPI_CS_HIGH.
Support for all four SPI clock/transfer modes is routine.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sparse now warns if one compares pointers with integers. However, there are
false positives, like:
fs/filesystems.c:72:2: warning: Using plain integer as NULL pointer
Every time BUG_ON(ptr) is used, ptr is checked against integer zero. Avoid
that and save ~70 false positives from allyesconfig run.
mentioned by Al.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Josh Triplett <josh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make arguments of timespec_equal() const struct timespec.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KSYM_NAME_LEN is peculiar in that it does not include the space for the
trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating
buffer. This is nonsense and error-prone. Moreover, when the caller
forgets that it's very likely to subtly bite back by corrupting the stack
because the last position of the buffer is always cleared to zero.
This patch increments KSYM_NAME_LEN by one and updates code accordingly.
* off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro
is fixed.
* Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together,
MODULE_NAME_LEN was treated as if it didn't include space for the
trailing '\0'. Fix it.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Paulo Marques <pmarques@grupopie.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a driver for the SB1250 DUART, a dual serial port implementation
included in the Broadcom family of SOCs descending from the SiByte SB1250
MIPS64 chip multiprocessor. It is a new implementation replacing the
old-fashioned driver currently present in the linux-mips.org tree. It
supports all the usual features one would expect from a(n asynchronous)
serial driver, including modem line control (as far as hardware supports it
-- there is edge detection logic missing from the DCD and RI lines and the
driver does not implement polling of these lines at the moment), the serial
console, BREAK transmission and reception, including the magic SysRq. The
receive FIFO threshold is not maintained though.
The driver was tested with a SWARM board which uses a BCM1250 SOC (which is
dual MIPS64 CMP) and has both ports of the single DUART implemented wired
externally. Both were tested. Testing included using the ports as
terminal lines at 1200bps (which is the ports minimum), 115200bps and a
couple of random speeds inbetween. The modem lines were verified to
operate correctly. No testing was performed with a use as a network
interface, like with SLIP or PPP.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The CHILD_MAX macro in limits.h should not be there. It claims to be the
limit on processes a user can own, but its value is wrong for that.
There is no constant value, but a variable resource limit (RLIMIT_NPROC).
Nothing in the kernel uses CHILD_MAX.
The proper thing to do according to POSIX is not to define CHILD_MAX at all.
The sysconf (_SC_CHILD_MAX) implementation works by calling getrlimit.
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The OPEN_MAX macro in limits.h should not be there. It claims to be the
limit on file descriptors in a process, but its value is wrong for that.
There is no constant value, but a variable resource limit (RLIMIT_NOFILE).
Nothing in the kernel uses OPEN_MAX except things that are wrong to do so.
I've submitted other patches to remove those uses.
The proper thing to do according to POSIX is not to define OPEN_MAX at all.
The sysconf (_SC_OPEN_MAX) implementation works by calling getrlimit.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The OPEN_MAX constant is an arbitrary number with no useful relation to
anything. Nothing should be using it. SCM_MAX_FD is just an arbitrary
constant and it should be clear that its value is chosen in net/scm.h
and not actually derived from anything else meaningful in the system.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Put WARN_ON and fixed all callers of unregister_blkdev(). Now we can make
unregister_blkdev return void.
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a proper prototype for proc_nr_files() in include/linux/fs.h
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Identical implementations of PTRACE_POKEDATA go into generic_ptrace_pokedata()
function.
AFAICS, fix bug on xtensa where successful PTRACE_POKEDATA will nevertheless
return EPERM.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the kernel OOPSed or BUGed then it probably should be considered as
tainted. Thus, all subsequent OOPSes and SysRq dumps will report the
tainted kernel. This saves a lot of time explaining oddities in the
calltraces.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Added parisc patch from Matthew Wilson -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The bounce buffer logic is included on systems that do not need it. If a
system does not have zones like ZONE_DMA and ZONE_HIGHMEM that can lead to
the use of bounce buffers then there is no need to reserve memory pools etc
etc. This is true f.e. for SGI Altix.
Also nicifies the Makefile and gets rid of the tricky "and" there.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves. This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.
It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.
The patch causes all kernel threads to be nonfreezable by default (ie. to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE. It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear. Additionally, it updates documentation to
describe the freezing of tasks more accurately.
[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Detect slab objects being passed to the page oriented functions of the VM.
It is not sufficient to simply return NULL because the functions calling
page_mapping may depend on other items of the page_struct also to be setup
properly. Moreover slab object may not be properly aligned. The page
oriented functions of the VM expect to operate on page aligned, page sized
objects. Operations on object straddling page boundaries may only affect the
objects partially which may lead to surprising results.
It is better to detect eventually remaining uses and eliminate them.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It becomes now easy to support the zeroing allocs with generic inline
functions in slab.h. Provide inline definitions to allow the continued use of
kzalloc, kmem_cache_zalloc etc but remove other definitions of zeroing
functions from the slab allocators and util.c.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add #ifdefs around data structures only needed if debugging is compiled into
SLUB.
Add inlines to small functions to reduce code size.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Define ZERO_OR_NULL_PTR macro to be able to remove the checks from the
allocators. Move ZERO_SIZE_PTR related stuff into slab.h.
Make ZERO_SIZE_PTR work for all slab allocators and get rid of the
WARN_ON_ONCE(size == 0) that is still remaining in SLAB.
Make slub return NULL like the other allocators if a too large memory segment
is requested via __kmalloc.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I can never remember what the function to register to receive VM pressure
is called. I have to trace down from __alloc_pages() to find it.
It's called "set_shrinker()", and it needs Your Help.
1) Don't hide struct shrinker. It contains no magic.
2) Don't allocate "struct shrinker". It's not helpful.
3) Call them "register_shrinker" and "unregister_shrinker".
4) Call the function "shrink" not "shrinker".
5) Reduce the 17 lines of waffly comments to 13, but document it properly.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: David Chinner <dgc@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we are out of memory of a suitable size we enter reclaim. The current
reclaim algorithm targets pages in LRU order, which is great for fairness at
order-0 but highly unsuitable if you desire pages at higher orders. To get
pages of higher order we must shoot down a very high proportion of memory;
>95% in a lot of cases.
This patch set adds a lumpy reclaim algorithm to the allocator. It targets
groups of pages at the specified order anchored at the end of the active and
inactive lists. This encourages groups of pages at the requested orders to
move from active to inactive, and active to free lists. This behaviour is
only triggered out of direct reclaim when higher order pages have been
requested.
This patch set is particularly effective when utilised with an
anti-fragmentation scheme which groups pages of similar reclaimability
together.
This patch set is based on Peter Zijlstra's lumpy reclaim V2 patch which forms
the foundation. Credit to Mel Gorman for sanitity checking.
Mel said:
The patches have an application with hugepage pool resizing.
When lumpy-reclaim is used used with ZONE_MOVABLE, the hugepages pool can
be resized with greater reliability. Testing on a desktop machine with 2GB
of RAM showed that growing the hugepage pool with ZONE_MOVABLE on it's own
was very slow as the success rate was quite low. Without lumpy-reclaim,
each attempt to grow the pool by 100 pages would yield 1 or 2 hugepages.
With lumpy-reclaim, getting 40 to 70 hugepages on each attempt was typical.
[akpm@osdl.org: ia64 pfn_to_nid fixes and loop cleanup]
[bunk@stusta.de: static declarations for internal functions]
[a.p.zijlstra@chello.nl: initial lumpy V2 implementation]
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds the kernelcore= parameter for x86.
Once all patches are applied, a new command-line parameter exist and a new
sysctl. This patch adds the necessary documentation.
From: Yasunori Goto <y-goto@jp.fujitsu.com>
When "kernelcore" boot option is specified, kernel can't boot up on ia64
because of an infinite loop. In addition, the parsing code can be handled
in an architecture-independent manner.
This patch uses common code to handle the kernelcore= parameter. It is
only available to architectures that support arch-independent zone-sizing
(i.e. define CONFIG_ARCH_POPULATES_NODE_MAP). Other architectures will
ignore the boot parameter.
[bunk@stusta.de: make cmdline_parse_kernelcore() static]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Huge pages are not movable so are not allocated from ZONE_MOVABLE. However,
as ZONE_MOVABLE will always have pages that can be migrated or reclaimed, it
can be used to satisfy hugepage allocations even when the system has been
running a long time. This allows an administrator to resize the hugepage pool
at runtime depending on the size of ZONE_MOVABLE.
This patch adds a new sysctl called hugepages_treat_as_movable. When a
non-zero value is written to it, future allocations for the huge page pool
will use ZONE_MOVABLE. Despite huge pages being non-movable, we do not
introduce additional external fragmentation of note as huge pages are always
the largest contiguous block we care about.
[akpm@linux-foundation.org: various fixes]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following 8 patches against 2.6.20-mm2 create a zone called ZONE_MOVABLE
that is only usable by allocations that specify both __GFP_HIGHMEM and
__GFP_MOVABLE. This has the effect of keeping all non-movable pages within a
single memory partition while allowing movable allocations to be satisfied
from either partition. The patches may be applied with the list-based
anti-fragmentation patches that groups pages together based on mobility.
The size of the zone is determined by a kernelcore= parameter specified at
boot-time. This specifies how much memory is usable by non-movable
allocations and the remainder is used for ZONE_MOVABLE. Any range of pages
within ZONE_MOVABLE can be released by migrating the pages or by reclaiming.
When selecting a zone to take pages from for ZONE_MOVABLE, there are two
things to consider. First, only memory from the highest populated zone is
used for ZONE_MOVABLE. On the x86, this is probably going to be ZONE_HIGHMEM
but it would be ZONE_DMA on ppc64 or possibly ZONE_DMA32 on x86_64. Second,
the amount of memory usable by the kernel will be spread evenly throughout
NUMA nodes where possible. If the nodes are not of equal size, the amount of
memory usable by the kernel on some nodes may be greater than others.
By default, the zone is not as useful for hugetlb allocations because they are
pinned and non-migratable (currently at least). A sysctl is provided that
allows huge pages to be allocated from that zone. This means that the huge
page pool can be resized to the size of ZONE_MOVABLE during the lifetime of
the system assuming that pages are not mlocked. Despite huge pages being
non-movable, we do not introduce additional external fragmentation of note as
huge pages are always the largest contiguous block we care about.
Credit goes to Andy Whitcroft for catching a large variety of problems during
review of the patches.
This patch creates an additional zone, ZONE_MOVABLE. This zone is only usable
by allocations which specify both __GFP_HIGHMEM and __GFP_MOVABLE. Hot-added
memory continues to be placed in their existing destination as there is no
mechanism to redirect them to a specific zone.
[y-goto@jp.fujitsu.com: Fix section mismatch of memory hotplug related code]
[akpm@linux-foundation.org: various fixes]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is often known at allocation time whether a page may be migrated or not.
This patch adds a flag called __GFP_MOVABLE and a new mask called
GFP_HIGH_MOVABLE. Allocations using the __GFP_MOVABLE can be either migrated
using the page migration mechanism or reclaimed by syncing with backing
storage and discarding.
An API function very similar to alloc_zeroed_user_highpage() is added for
__GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable(). The
flags used by alloc_zeroed_user_highpage() are not changed because it would
change the semantics of an existing API. After this patch is applied there
are no in-kernel users of alloc_zeroed_user_highpage() so it probably should
be marked deprecated if this patch is merged.
Note that this patch includes a minor cleanup to the use of __GFP_ZERO in
shmem.c to keep all flag modifications to inode->mapping in the
shmem_dir_alloc() helper function. This clean-up suggestion is courtesy of
Hugh Dickens.
Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the
concept. Credit to Hugh Dickens for catching issues with shmem swap vector
and ramfs allocations.
[akpm@linux-foundation.org: build fix]
[hugh@veritas.com: __GFP_ZERO cleanup]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nobody is using ptep_test_and_clear_dirty and ptep_clear_flush_dirty. Remove
the functions from all architectures.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The last user of ptep_establish in mm/ is long gone. Remove the architecture
primitive as well.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add support for IRQ migration across vector domain.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add fundamental support for multiple vector domain. There still exists
only one vector domain even with this patch. IRQ migration across
domain is not supported yet by this patch.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add mapping tables between irqs and vectors, and its management code.
This is necessary for supporting multiple vector domain because 1:1
mapping between irq and vector will be changed to n:1.
The irq == vector relationship between irqs and vectors is explicitly
remained for percpu interrupts, platform interrupts, isa IRQs and
vectors assigned using assign_irq_vector() because some programs might
depend on it.
And I should consider the following problem.
When pci drivers enabled/disabled devices dynamically, its irq number
is changed to the different one. Therefore, suspend/resume code may
happen problem.
To fix this problem, I bound gsi to irq.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Don't define an empty struct bsg_class_device if !CONFIG_BLK_DEV_BSG.
It's embedded in struct request_queue, but there we have
#if defined(CONFIG_BLK_DEV_BSG)
struct bsg_class_device bsg_dev;
#endif
anyway.
Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (209 commits)
[POWERPC] Create add_rtc() function to enable the RTC CMOS driver
[POWERPC] Add H_ILLAN_ATTRIBUTES hcall number
[POWERPC] xilinxfb: Parameterize xilinxfb platform device registration
[POWERPC] Oprofile support for Power 5++
[POWERPC] Enable arbitary speed tty ioctls and split input/output speed
[POWERPC] Make drivers/char/hvc_console.c:khvcd() static
[POWERPC] Remove dead code for preventing pread() and pwrite() calls
[POWERPC] Remove unnecessary #undef printk from prom.c
[POWERPC] Fix typo in Ebony default DTS
[POWERPC] Check for NULL ppc_md.init_IRQ() before calling
[POWERPC] Remove extra return statement
[POWERPC] pasemi: Don't auto-select CONFIG_EMBEDDED
[POWERPC] pasemi: Rename platform
[POWERPC] arch/powerpc/kernel/sysfs.c: Move NUMA exports
[POWERPC] Add __read_mostly support for powerpc
[POWERPC] Modify sched_clock() to make CONFIG_PRINTK_TIME more sane
[POWERPC] Create a dummy zImage if no valid platform has been selected
[POWERPC] PS3: Bootwrapper support.
[POWERPC] powermac i2c: Use mutex
[POWERPC] Schedule removal of arch/ppc
...
Fixed up conflicts manually in:
Documentation/feature-removal-schedule.txt
arch/powerpc/kernel/pci_32.c
arch/powerpc/kernel/pci_64.c
include/asm-powerpc/pci.h
and asked the powerpc people to double-check the result..
Convert the macb driver to use the generic PHY layer in
drivers/net/phy.
Signed-off-by: Frederic RODO <f.rodo@til-technologies.fr>
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This reverts commit 29578624e3.
Ingo Molnar reports complete breakage with his e1000 card (no
networking, card reports transmit timeouts), and bisected it down to
this commit. Let's figure out what went wrong, but not keep breaking
machines until we do.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Olaf Kirch <olaf.kirch@oracle.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (32 commits)
[PATCH] ocfs2: zero_user_page conversion
ocfs2: Support xfs style space reservation ioctls
ocfs2: support for removing file regions
ocfs2: update truncate handling of partial clusters
ocfs2: btree support for removal of arbirtrary extents
ocfs2: Support creation of unwritten extents
ocfs2: support writing of unwritten extents
ocfs2: small cleanup of ocfs2_write_begin_nolock()
ocfs2: btree changes for unwritten extents
ocfs2: abstract btree growing calls
ocfs2: use all extent block suballocators
ocfs2: plug truncate into cached dealloc routines
ocfs2: simplify deallocation locking
ocfs2: harden buffer check during mapping of page blocks
ocfs2: shared writeable mmap
ocfs2: factor out write aops into nolock variants
ocfs2: rework ocfs2_buffered_write_cluster()
ocfs2: take ip_alloc_sem during entire truncate
ocfs2: Add "preferred slot" mount option
[KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c
...
* 'bsg' of git://git.kernel.dk/data/git/linux-2.6-block: (25 commits)
bsg: Kconfig updates
bsg: add SCSI transport-level request support
bsg: add bidi support
add a struct request pointer to the request structure
bsg: fix the deadlock on discarding done commands
bsg: fix a blocking read bug
bsg: minor bug fixes
improve bsg device allocation
bind bsg to all SCSI devices
bsg: bind bsg to request_queue instead of gendisk
bsg: add a request_queue argument to scsi_cmd_ioctl()
bsg: simplify __bsg_alloc_command failpath
bsg: add cheasy error checks for sysfs stuff
Add queue resizing support
Replace s32, u32 and u64 with __s32, __u32 and __u64 in bsg.h for userspace
bsg: silence a bogus gcc warning
bsg: style cleanup
bsg: use u32 etc instead of uint32_t
bsg: add SG_IO to SG v4
bsg: replace SG v3 with SG v4
...
* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block:
splice: direct splicing updates ppos twice
more ACSI removal
umem: Fix match of pci_ids in umem driver
umem: Remove references to dead CONFIG_MM_MAP_MEMORY variable
remove the documentation for the legacy CDROM drivers
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: (26 commits)
[SPARC64]: Fix UP build.
[SPARC64]: dr-cpu unconfigure support.
[SERIAL]: Fix console write locking in sparc drivers.
[SPARC64]: Give more accurate errors in dr_cpu_configure().
[SPARC64]: Clear cpu_{core,sibling}_map[] in smp_fill_in_sib_core_maps()
[SPARC64]: Fix leak when DR added cpu does not bootup.
[SPARC64]: Add ->set_affinity IRQ handlers.
[SPARC64]: Process dr-cpu events in a kthread instead of workqueue.
[SPARC64]: More sensible udelay implementation.
[SPARC64]: SMP build fixes.
[SPARC64]: mdesc.c needs linux/mm.h
[SPARC64]: Fix build regressions added by dr-cpu changes.
[SPARC64]: Unconditionally register vio_bus_type.
[SPARC64]: Initial LDOM cpu hotplug support.
[SPARC64]: Fix setting of variables in LDOM guest.
[SPARC64]: Fix MD property lifetime bugs.
[SPARC64]: Abstract out mdesc accesses for better MD update handling.
[SPARC64]: Use more mearningful names for IRQ registry.
[SPARC64]: Initial domain-services driver.
[SPARC64]: Export powerd facilities for external entities.
...
* master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6: (68 commits)
sh: sh-rtc support for SH7709.
sh: Revert __xdiv64_32 size change.
sh: Update r7785rp defconfig.
sh: Export div symbols for GCC 4.2 and ST GCC.
sh: fix race in parallel out-of-tree build
sh: Kill off dead mach.c for hp6xx.
sh: hd64461.h cleanup and added comments.
sh: Update the alignment when 4K stacks are used.
sh: Add a .bss.page_aligned section for 4K stacks.
sh: Don't let SH-4A clobber SH-4 CFLAGS.
sh: Add parport stub for SuperIO ports.
sh: Drop -Wa,-dsp for DSP tuning.
sh: Update dreamcast defconfig.
fb: pvr2fb: A few more __devinit annotations for PCI.
fb: pvr2fb: Fix up section mismatch warnings.
sh: Select IPR-IRQ for SH7091.
sh: Correct __xdiv64_32/div64_32 return value size.
sh: Fix timer-tmu build for SH-3.
sh: Add cpu and mach links to CLEAN_FILES.
sh: Preliminary support for the SH-X3 CPU.
...
This is a patch that speeds up statfs. It is very simple - the "overhead"
calculation, which takes a huge amount of time for large filesystems, never
changes unless the size of the filesystem itself changes. That means we can
store it in memory and only recalculate if the filesystem has been resized
(almost never).
It also fixes a minor problem that we never update the on-disk superblock free
blocks/inodes counts until the filesystem is unmounted. While not fatal, we
may as well update that on disk when we have the information, and it makes
things like debugfs and dumpe2fs report a bit more accurate info.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a patch that speeds up statfs. It is very simple - the "overhead"
calculation, which takes a huge amount of time for large filesystems, never
changes unless the size of the filesystem itself changes. That means we can
store it in memory and only recalculate if the filesystem has been resized
(almost never).
It also fixes a minor problem that we never update the on-disk superblock free
blocks/inodes counts until the filesystem is unmounted. While not fatal, we
may as well update that on disk when we have the information, and it makes
things like debugfs and dumpe2fs report a bit more accurate info.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a patch that speeds up statfs. It is very simple - the "overhead"
calculation, which takes a huge amount of time for large filesystems, never
changes unless the size of the filesystem itself changes. That means we can
store it in memory and only recalculate if the filesystem has been resized
(almost never).
It also fixes a minor problem that we never update the on-disk superblock free
blocks/inodes counts until the filesystem is unmounted. While not fatal, we
may as well update that on disk when we have the information, and it makes
things like debugfs and dumpe2fs report a bit more accurate info.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change cancel_work_sync() and cancel_delayed_work_sync() to return a boolean
indicating whether the work was actually cancelled. A zero return value means
that the work was not pending/queued.
Without that kind of change it is not possible to avoid flush_workqueue()
sometimes, see the next patch as an example.
Also, this patch unifies both functions and kills the (unlikely) busy-wait
loop.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Jarek Poplawski <jarkao2@o2.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Imho, the current naming of cancel_xxx workqueue functions is very confusing.
cancel_delayed_work()
cancel_rearming_delayed_work()
cancel_rearming_delayed_workqueue() // obsolete
cancel_work_sync()
This looks as if the first 2 functions differ in "type" of their argument
which is not true any longer, nowadays the difference is the behaviour.
The semantics of cancel_rearming_delayed_work(dwork) was changed
significantly, it doesn't require that dwork rearms itself, and cancels dwork
synchronously.
Rename it to cancel_delayed_work_sync(). This matches cancel_delayed_work()
and cancel_work_sync(). Re-create cancel_rearming_delayed_work() as a simple
inline obsolete wrapper, like cancel_rearming_delayed_workqueue().
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Jarek Poplawski <jarkao2@o2.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The UMSDOS filesystem was removed back in 2.6.11, but some tiny bits stuck
around. This patch removes the few remaining leftovers. The only things
left behind after this are the entries in the CREDITS file and the ioctl
number in Documentation/ioctl-number.txt as documentation.
This third (hopefully final) version of the patch doesn't edit the
arch/um/config.release file, since Jeff Dike pointed out to me that it
should die completely, and asked me to remove it from my patch as he'll
send in a seperate patch removing the file completely.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current generic bug implementation has a call to dump_stack() in case a
WARN_ON(whatever) gets hit. Since report_bug(), which calls dump_stack(),
gets called from an exception handler we can do better: just pass the
pt_regs structure to report_bug() and pass it to show_regs() in case of a
warning. This will give more debug informations like register contents,
etc... In addition this avoids some pointless lines that dump_stack()
emits, since it includes a stack backtrace of the exception handler which
is of no interest in case of a warning. E.g. on s390 the following lines
are currently always present in a stack backtrace if dump_stack() gets
called from report_bug():
[<000000000001517a>] show_trace+0x92/0xe8)
[<0000000000015270>] show_stack+0xa0/0xd0
[<00000000000152ce>] dump_stack+0x2e/0x3c
[<0000000000195450>] report_bug+0x98/0xf8
[<0000000000016cc8>] illegal_op+0x1fc/0x21c
[<00000000000227d6>] sysc_return+0x0/0x10
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a rather bizarre thing to have inlined in io.h. Stick it in lib/
instead.
While we're there, despaghetti it a bit, and fix its off-by-one behaviour when
passed a zero length.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This follows a suggestion from Chuck Ebbert on how to make seccomp
absolutely zerocost in schedule too. The only remaining footprint of
seccomp is in terms of the bzImage size that becomes a few bytes (perhaps
even a few kbytes) larger, measure it if you care in the embedded.
Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reduces the memory footprint and it enforces that only the current
task can enable seccomp on itself (this is a requirement for a
strightforward [modulo preempt ;) ] TIF_NOTSC implementation).
Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While working on unshare support for the network namespace I noticed we
were putting clone flags in an int. Which is weird because the syscall
uses unsigned long and we at least need an unsigned to properly hold all of
the unshare flags.
So to make the code consistent, this patch updates the code to use
unsigned long instead of int for the clone flags in those places
where we get it wrong today.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One common problem with 32 bit system call and ioctl emulation is the
different alignment rules between i386 and 64 bit machines. A number of
drivers work around this by marking the compat structures as
'attribute((packed))', which is not the right solution because it breaks
all the non-x86 architectures that want to use the same compat code.
Hopefully, this patch improves the situation, it introduces two new types,
compat_u64 and compat_s64. These are defined on all architectures to have
the same size and alignment as the 32 bit version of u64 and s64.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Vasily Tarasov <vtaras@openvz.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
aa95387774 removed the implementation of
lock_cpu_hotplug_interruptible and all users of it. This stub definition
for !CONFIG_HOTPLUG_CPU was left over -- kill it now.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove not only the references to Cobalt NVRAM, but the header file as
well.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: Tim Hockin <thockin@hockin.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch enables the unshare of user namespaces.
It adds a new clone flag CLONE_NEWUSER and implements copy_user_ns() which
resets the current user_struct and adds a new root user (uid == 0)
For now, unsharing the user namespace allows a process to reset its
user_struct accounting and uid 0 in the new user namespace should be contained
using appropriate means, for instance selinux
The plan, when the full support is complete (all uid checks covered), is to
keep the original user's rights in the original namespace, and let a process
become uid 0 in the new namespace, with full capabilities to the new
namespace.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Andrew Morgan <agm@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Basically, it will allow a process to unshare its user_struct table,
resetting at the same time its own user_struct and all the associated
accounting.
A new root user (uid == 0) is added to the user namespace upon creation.
Such root users have full privileges and it seems that theses privileges
should be controlled through some means (process capabilities ?)
The unshare is not included in this patch.
Changes since [try #4]:
- Updated get_user_ns and put_user_ns to accept NULL, and
get_user_ns to return the namespace.
Changes since [try #3]:
- moved struct user_namespace to files user_namespace.{c,h}
Changes since [try #2]:
- removed struct user_namespace* argument from find_user()
Changes since [try #1]:
- removed struct user_namespace* argument from find_user()
- added a root_user per user namespace
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Andrew Morgan <agm@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CONFIG_UTS_NS and CONFIG_IPC_NS have very little value as they only
deactivate the unshare of the uts and ipc namespaces and do not improve
performance.
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Acked-by: "Serge E. Hallyn" <serue@us.ibm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add TTY input auditing, used to audit system administrator's actions. This is
required by various security standards such as DCID 6/3 and PCI to provide
non-repudiation of administrator's actions and to allow a review of past
actions if the administrator seems to overstep their duties or if the system
becomes misconfigured for unknown reasons. These requirements do not make it
necessary to audit TTY output as well.
Compared to an user-space keylogger, this approach records TTY input using the
audit subsystem, correlated with other audit events, and it is completely
transparent to the user-space application (e.g. the console ioctls still
work).
TTY input auditing works on a higher level than auditing all system calls
within the session, which would produce an overwhelming amount of mostly
useless audit events.
Add an "audit_tty" attribute, inherited across fork (). Data read from TTYs
by process with the attribute is sent to the audit subsystem by the kernel.
The audit netlink interface is extended to allow modifying the audit_tty
attribute, and to allow sending explanatory audit events from user-space (for
example, a shell might send an event containing the final command, after the
interactive command-line editing and history expansion is performed, which
might be difficult to decipher from the TTY input alone).
Because the "audit_tty" attribute is inherited across fork (), it would be set
e.g. for sshd restarted within an audited session. To prevent this, the
audit_tty attribute is cleared when a process with no open TTY file
descriptors (e.g. after daemon startup) opens a TTY.
See https://www.redhat.com/archives/linux-audit/2007-June/msg00000.html for a
more detailed rationale document for an older version of this patch.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Miloslav Trmac <mitr@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Paul Fulghum <paulkf@microgate.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently we handle spurious IRQ activity based upon seeing a lot of
invalid interrupts, and we clear things back on the base of lots of valid
interrupts.
Unfortunately in some cases you get legitimate invalid interrupts caused by
timing asynchronicity between the PCI bus and the APIC bus when disabling
interrupts and pulling other tricks. In this case although the spurious
IRQs are not a problem our unhandled counters didn't clear and they act as
a slow running timebomb. (This is effectively what the serial port/tty
problem that was fixed by clearing counters when registering a handler
showed up)
It's easy enough to add a second parameter - time. This means that if we
see a regular stream of harmless spurious interrupts which are not harming
processing we don't go off and do something stupid like disable the IRQ
after a month of running. OTOH lockups and performance killers show up a
lot more than 10/second
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make available to the user the following task and process performance
statistics:
* Involuntary Context Switches (task_struct->nivcsw)
* Voluntary Context Switches (task_struct->nvcsw)
Statistics information is available from:
1. taskstats interface (Documentation/accounting/)
2. /proc/PID/status (task only).
This data is useful for detecting hyperactivity patterns between processes.
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Maxim Uvarov <muvarov@ru.mvista.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: Jonathan Lim <jlim@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Drop <linux/isicom.h> from being exported to user space since it would
be only an empty file.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Acked-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the no longer used sonypi_camera_command().
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Mattia Dongili <malattia@linux.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes dead keys and copy/paste of non-ASCII characters in UTF-8
mode on Linux console. See more details about the original patch at:
http://chris.heathens.co.nz/linux/utf8.html
Already posted on
(Oldest) http://lkml.org/lkml/2003/5/31/148http://lkml.org/lkml/2005/12/24/69
(Recent) http://lkml.org/lkml/2006/8/7/75
[bunk@stusta.de: make drivers/char/selection.c:store_utf8() static]
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Cc: Alexander E. Patrakov <patrakov@ums.usu.ru>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I forgot to remove capability.h from mm.h while removing sched.h! This
patch remedies that, because the only inline function which was using
CAP_something was made out of line.
Cross-compile tested without regressions on:
all powerpc defconfigs
all mips defconfigs
all m68k defconfigs
all arm defconfigs
all ia64 defconfigs
alpha alpha-allnoconfig alpha-defconfig alpha-up
arm
i386 i386-allnoconfig i386-defconfig i386-up
ia64 ia64-allnoconfig ia64-defconfig ia64-up
m68k
mips
parisc parisc-allnoconfig parisc-defconfig parisc-up
powerpc powerpc-up
s390 s390-allnoconfig s390-defconfig s390-up
sparc sparc-allnoconfig sparc-defconfig sparc-up
sparc64 sparc64-allnoconfig sparc64-defconfig sparc64-up
um-x86_64
x86_64 x86_64-allnoconfig x86_64-defconfig x86_64-up
as well as my two usual configs.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Part two in the O_CLOEXEC saga: adding support for file descriptors received
through Unix domain sockets.
The patch is once again pretty minimal, it introduces a new flag for recvmsg
and passes it just like the existing MSG_CMSG_COMPAT flag. I think this bit
is not used otherwise but the networking people will know better.
This new flag is not recognized by recvfrom and recv. These functions cannot
be used for that purpose and the asymmetry this introduces is not worse than
the already existing MSG_CMSG_COMPAT situations.
The patch must be applied on the patch which introduced O_CLOEXEC. It has to
remove static from the new get_unused_fd_flags function but since scm.c cannot
live in a module the function still hasn't to be exported.
Here's a test program to make sure the code works. It's so much longer than
the actual patch...
#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>
#ifndef O_CLOEXEC
# define O_CLOEXEC 02000000
#endif
#ifndef MSG_CMSG_CLOEXEC
# define MSG_CMSG_CLOEXEC 0x40000000
#endif
int
main (int argc, char *argv[])
{
if (argc > 1)
{
int fd = atol (argv[1]);
printf ("child: fd = %d\n", fd);
if (fcntl (fd, F_GETFD) == 0 || errno != EBADF)
{
puts ("file descriptor valid in child");
return 1;
}
return 0;
}
struct sockaddr_un sun;
strcpy (sun.sun_path, "./testsocket");
sun.sun_family = AF_UNIX;
char databuf[] = "hello";
struct iovec iov[1];
iov[0].iov_base = databuf;
iov[0].iov_len = sizeof (databuf);
union
{
struct cmsghdr hdr;
char bytes[CMSG_SPACE (sizeof (int))];
} buf;
struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 1,
.msg_control = buf.bytes,
.msg_controllen = sizeof (buf) };
struct cmsghdr *cmsg = CMSG_FIRSTHDR (&msg);
cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_RIGHTS;
cmsg->cmsg_len = CMSG_LEN (sizeof (int));
msg.msg_controllen = cmsg->cmsg_len;
pid_t child = fork ();
if (child == -1)
error (1, errno, "fork");
if (child == 0)
{
int sock = socket (PF_UNIX, SOCK_STREAM, 0);
if (sock < 0)
error (1, errno, "socket");
if (bind (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
error (1, errno, "bind");
if (listen (sock, SOMAXCONN) < 0)
error (1, errno, "listen");
int conn = accept (sock, NULL, NULL);
if (conn == -1)
error (1, errno, "accept");
*(int *) CMSG_DATA (cmsg) = sock;
if (sendmsg (conn, &msg, MSG_NOSIGNAL) < 0)
error (1, errno, "sendmsg");
return 0;
}
/* For a test suite this should be more robust like a
barrier in shared memory. */
sleep (1);
int sock = socket (PF_UNIX, SOCK_STREAM, 0);
if (sock < 0)
error (1, errno, "socket");
if (connect (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
error (1, errno, "connect");
unlink (sun.sun_path);
*(int *) CMSG_DATA (cmsg) = -1;
if (recvmsg (sock, &msg, MSG_CMSG_CLOEXEC) < 0)
error (1, errno, "recvmsg");
int fd = *(int *) CMSG_DATA (cmsg);
if (fd == -1)
error (1, 0, "no descriptor received");
char fdname[20];
snprintf (fdname, sizeof (fdname), "%d", fd);
execl ("/proc/self/exe", argv[0], fdname, NULL);
puts ("execl failed");
return 1;
}
[akpm@linux-foundation.org: Fix fastcall inconsistency noted by Michael Buesch]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Michael Buesch <mb@bu3sch.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The problem is as follows: in multi-threaded code (or more correctly: all
code using clone() with CLONE_FILES) we have a race when exec'ing.
thread #1 thread #2
fd=open()
fork + exec
fcntl(fd,F_SETFD,FD_CLOEXEC)
In some applications this can happen frequently. Take a web browser. One
thread opens a file and another thread starts, say, an external PDF viewer.
The result can even be a security issue if that open file descriptor
refers to a sensitive file and the external program can somehow be tricked
into using that descriptor.
Just adding O_CLOEXEC support to open() doesn't solve the whole set of
problems. There are other ways to create file descriptors (socket,
epoll_create, Unix domain socket transfer, etc). These can and should be
addressed separately though. open() is such an easy case that it makes not
much sense putting the fix off.
The test program:
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#ifndef O_CLOEXEC
# define O_CLOEXEC 02000000
#endif
int
main (int argc, char *argv[])
{
int fd;
if (argc > 1)
{
fd = atol (argv[1]);
printf ("child: fd = %d\n", fd);
if (fcntl (fd, F_GETFD) == 0 || errno != EBADF)
{
puts ("file descriptor valid in child");
return 1;
}
return 0;
}
fd = open ("/proc/self/exe", O_RDONLY | O_CLOEXEC);
printf ("in parent: new fd = %d\n", fd);
char buf[20];
snprintf (buf, sizeof (buf), "%d", fd);
execl ("/proc/self/exe", argv[0], buf, NULL);
puts ("execl failed");
return 1;
}
[kyle@parisc-linux.org: parisc fix]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a flag in /proc/timer_stats to indicate deferrable timers. This will
let developers/users to differentiate between types of tiemrs in
/proc/timer_stats.
Deferrable timer and normal timer will appear in /proc/timer_stats as below.
10D, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
10, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
Also version of timer_stats changes from v0.1 to v0.2
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Continuing the work started in 411f0f3edc ...
This enables code with a dma path, that compiles away, to build without
requiring additional code factoring. It also prevents code that calls
dma_alloc_coherent and dma_free_coherent from linking whereas previously
the code would hit a BUG() at run time. Finally, it allows archs that set
!HAS_DMA to delete their asm/dma-mapping.h file.
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: James Bottomley <James.Bottomley@SteelEye.com>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: <geert@linux-m68k.org>
Cc: <zippel@linux-m68k.org>
Cc: <spyro@f2s.com>
Cc: <ysato@users.sourceforge.jp>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the obviously unnecessary includes of <linux/spinlock.h> under the
include/linux/ directory, and fix the couple errors that are introduced as
a result of that.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes the following warnings.
fs/fat/dir.c: In function 'fat_parse_long':
include/linux/msdos_fs.h:294: warning: array subscript is above array bounds
include/linux/msdos_fs.h:295: warning: array subscript is above array bounds
include/linux/msdos_fs.h:295: warning: array subscript is above array bounds
The ->name is defined as "name[8], ext[3]", but fat_checksum() uses
those as name[11]. There is no actual problem, but it's not a good manner.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
per-cpu counters presently must iterate over all possible CPUs in the
exhaustive percpu_counter_sum().
But it can be much better to only iterate over the presently-online CPUs. To
do this, we must arrange for an offlined CPU's count to be spilled into the
counter's central count.
We can do this for all percpu_counters in the machine by linking them into a
single global list and walking that list at CPU_DEAD time.
(I hope. Might have race windows in which the percpu_counter_sum() count is
inaccurate?)
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gcc-4.3:
fs/fuse/dir.c: In function 'parse_dirfile':
fs/fuse/dir.c:833: warning: cast from pointer to integer of different size
fs/fuse/dir.c:835: warning: cast from pointer to integer of different size
[miklos@szeredi.hu: use offsetof]
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The I2O driver uses two semaphores as mutexes. Use the mutex API instead of
the (binary) semaphores.
Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Without this a tty write could block if a previous blocking tty write was
in progress on the same tty and blocked by a line discipline or hardware
event. Originally found and reported by Dave Johnson.
Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 411187fb05 caused boot time to move and
process start times to become invalid after suspend. Using boot based time
for those restores the old behaviour and fixes the issue.
[akpm@linux-foundation.org: little cleanup]
Signed-off-by: Tomas Janousek <tjanouse@redhat.com>
Cc: Tomas Smetana <tsmetana@redhat.com>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The commits
411187fb05 (GTOD: persistent clock support)
c1d370e167 (i386: use GTOD persistent clock
support)
changed the monotonic time so that it no longer jumps after resume, but it's
not possible to use it for boot time and process start time calculations then.
Also, the uptime no longer increases during suspend.
I add a variable to track the wall_to_monotonic changes, a function to get the
real boot time and a function to get the boot based time from the monotonic
one.
[akpm@linux-foundation.org: remove exports, add comment]
Signed-off-by: Tomas Janousek <tjanouse@redhat.com>
Cc: Tomas Smetana <tsmetana@redhat.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Before calling init_hwif_default, ide_unregister gets lock ide_lock and
disables irq. init_hwif_default calls ide_default_io_base which calls
pci_get_device and later pci_get_subsys tries to apply for semaphore
pci_bus_sem and goes to sleep.
Mostly, pci_get_device should be called when irq is turned on.
ide_default_io_base just needs find if list pci_devices is empty.
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce a write_trylock_irqsave() implementation. Similar in style to
the implementation of spin_trylock_irqsave() in mainline.
Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Cc: Sripathi Kodi <sripathik@in.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix following races:
===========================================
1. Write via ->write_proc sleeps in copy_from_user(). Module disappears
meanwhile. Or, more generically, system call done on /proc file, method
supplied by module is called, module dissapeares meanwhile.
pde = create_proc_entry()
if (!pde)
return -ENOMEM;
pde->write_proc = ...
open
write
copy_from_user
pde = create_proc_entry();
if (!pde) {
remove_proc_entry();
return -ENOMEM;
/* module unloaded */
}
*boom*
==========================================
2. bogo-revoke aka proc_kill_inodes()
remove_proc_entry vfs_read
proc_kill_inodes [check ->f_op validness]
[check ->f_op->read validness]
[verify_area, security permissions checks]
->f_op = NULL;
if (file->f_op->read)
/* ->f_op dereference, boom */
NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's
see how this scheme behaves, then extend if needed for directories.
Directories creators in /proc only set ->owner for them, so proxying for
directories may be unneeded.
NOTE, NOTE, NOTE: methods being proxied are ->llseek, ->read, ->write,
->poll, ->unlocked_ioctl, ->ioctl, ->compat_ioctl, ->open, ->release.
If your in-tree module uses something else, yell on me. Full audit pending.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adding the defines/macros activates the existing code in the tty layer
and allows this platform to use the arbitary speed ioctl setting layer
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For some reason, I was using kmalloc instead of get_free_pages for kernel
stacks.
Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the needed constants and bits. The actual code is already in the tty
layer and turned on by the definitions
Signed-off-by: Alan Cox <alan@redhat.com>
Cc: Mikael Starvik <starvik@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>