Name, phys and uniq are quite often constant strings in moules implementing
particular input device. If a module unregisters input device and then gets
unloaded, the device could still be present in memory (pinned via sysfs),
but aforementioned members would point to some random memory. Set them all
to NULL when unregistering so sysfs handlers won't try dereferencing them.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
This reverts commits
3e3318dee0 [PATCH] swsusp: x86_64 mark special saveable/unsaveable pages
b6370d96e0 [PATCH] swsusp: i386 mark special saveable/unsaveable pages
ce4ab0012b [PATCH] swsusp: add architecture special saveable pages support
because not only do they apparently cause page faults on x86, the
infrastructure doesn't compile on powerpc.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* git://git.linux-nfs.org/pub/linux/nfs-2.6: (51 commits)
nfs: remove nfs_put_link()
nfs-build-fix-99
git-nfs-build-fixes
Merge branch 'odirect'
NFS: alloc nfs_read/write_data as direct I/O is scheduled
NFS: Eliminate nfs_get_user_pages()
NFS: refactor nfs_direct_free_user_pages
NFS: remove user_addr, user_count, and pos from nfs_direct_req
NFS: "open code" the NFS direct write rescheduler
NFS: Separate functions for counting outstanding NFS direct I/Os
NLM: Fix reclaim races
NLM: sem to mutex conversion
locks.c: add the fl_owner to nlm_compare_locks
NFS: Display the chosen RPCSEC_GSS security flavour in /proc/mounts
NFS: Split fs/nfs/inode.c
NFS: Fix typo in nfs_do_clone_mount()
NFS: Fix compile errors introduced by referrals patches
NFSv4: Ensure that referral mounts bind to a reserved port
NFSv4: A root pathname is sent as a zero component4
NFSv4: Follow a referral
...
* master.kernel.org:/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (244 commits)
V4L/DVB (4210b): git-dvb: tea575x-tuner build fix
V4L/DVB (4210a): git-dvb versus matroxfb
V4L/DVB (4209): Added some BTTV PCI IDs for newer boards
Fixes some sync issues between V4L/DVB development and GIT
V4L/DVB (4206): Cx88-blackbird: always set encoder height based on tvnorm->id
V4L/DVB (4205): Merge tda9887 module into tuner.
V4L/DVB (4203): Explicitly set the enum values.
V4L/DVB (4202): allow selecting CX2341x port mode
V4L/DVB (4200): Disable bitrate_mode when encoding mpeg-1.
V4L/DVB (4199): Add cx2341x-specific control array to cx2341x.c
V4L/DVB (4198): Avoid newer usages of obsoleted experimental MPEGCOMP API
V4L/DVB (4197): Port new MPEG API to saa7134-empress with saa6752hs
V4L/DVB (4196): Port cx88-blackbird to the new MPEG API.
V4L/DVB (4193): Update cx2341x fw encoding API doc.
V4L/DVB (4192): Use control helpers for saa7115, cx25840, msp3400.
V4L/DVB (4191): Add CX2341X MPEG encoder module.
V4L/DVB (4190): Add helper functions for control processing to v4l2-common.
V4L/DVB (4189): Add videodev support for VIDIOC_S/G/TRY_EXT_CTRLS.
V4L/DVB (4188): Add new MPEG control/ioctl definitions to videodev2.h
V4L/DVB (4186): Add support for the DNTV Live! mini DVB-T card.
...
Stringify does what it was told to do.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In current 2.6.17 implementation, signal_struct refered from task_struct is
used for per-process data structure. The pacct facility also uses it as a
per-process data structure to store stime, utime, minflt, majflt. But those
members are saved in __exit_signal(). It's too late.
For example, if some threads exits at same time, pacct facility has a
possibility to drop accountings for a part of those threads. (see, the
following 'The results of original 2.6.17 kernel') I think accounting
information should be completely collected into the per-process data structure
before writing out an accounting record.
This patch fixes this matter. Accumulation of stime, utime, minflt and majflt
are done before generating accounting record.
[mingo@elte.hu: fix acct_collect() siglock bug found by lockdep]
Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When pacct facility generate an 'ac_flag' field in accounting record, it
refers a task_struct of the thread which died last in the process. But any
other task_structs are ignored.
Therefore, pacct facility drops ASU flag even if root-privilege operations are
used by any other threads except the last one. In addition, AFORK flag is
always set when the thread of group-leader didn't die last, although this
process has called execve() after fork().
We have a same matter in ac_exitcode. The recorded ac_exitcode is an exit
code of the last thread in the process. There is a possibility this exitcode
is not the group leader's one.
The pacct facility need an i/o operation when an accounting record is
generated. There is a possibility to wake OOM killer up. If OOM killer is
activated, it kills some processes to make them release process memory
regions.
But acct_process() is called in the killed processes context before calling
exit_mm(), so those processes cannot release own memory. In the results, any
processes stop in this point and it finally cause a system stall.
Add support for SyncLink GT2 adapter to driver.
Signed-off-by: Paul Fulghum <paulkf@microgate.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add custom HDLC idle pattern feature.
It allows the user to specify an arbitrary 8 or 16 bit repeating pattern on
the transmit data pin between HDLC frames.
In most cases the idle pattern is continuous ones or flags as supported by off
the shelf synchronous controllers and defined in the ISO3309 standard. Some
applications (radio/satellite modems, connections to legacy military hardware)
require non-standard patterns.
Signed-off-by: Paul Fulghum <paulkf@microgate.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move kthread API kernel-doc from kthread.h to kthread.c & fix it.
Add kthread API to kernel-api DocBook.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement kasprintf, a kernel version of asprintf. This allocates the
memory required for the formatted string, including the trailing '\0'.
Returns NULL on allocation failure.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix kernel-doc formatting in ktime.h and hrtimer.[ch] files.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When the linkat() syscall was added the flag parameter was added in the
last minute but it wasn't used so far. The following patch should change
that. My tests show that this is all that's needed.
If OLDNAME is a symlink setting the flag causes linkat to follow the
symlink and create a hardlink with the target. This is actually the
behavior POSIX demands for link() as well but Linux wisely does not do
this. With this flag (which will most likely be in the next POSIX
revision) the programmer can choose the behavior, defaulting to the safe
variant. As a side effect it is now possible to implement a
POSIX-compliant link(2) function for those who are interested.
touch file
ln -s file symlink
linkat(fd, "symlink", fd, "newlink", 0)
-> newlink is hardlink of symlink
linkat(fd, "symlink", fd, "newlink", AT_SYMLINK_FOLLOW)
-> newlink is hardlink of file
The value of AT_SYMLINK_FOLLOW is determined by the definition we already
use in glibc.
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Update loop.c to use a kthread instead of a deprecated kernel_thread for
loop devices.
[akpm@osdl.org: don't change the thread's name]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add synchronous request interruption. This is needed for file locking
operations which have to be interruptible. However filesystem may implement
interruptibility of other operations (e.g. like NFS 'intr' mount option).
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds POSIX file locking support to the fuse interface.
This implementation doesn't keep any locking state in kernel. Unlocking on
close() is handled by the FLUSH message, which now contains the lock owner id.
Mandatory locking is not supported. The filesystem may enfoce mandatory
locking in userspace if needed.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patches add POSIX file locking to the fuse interface.
Additional changes ralated to this are:
- asynchronous interrupt of requests by SIGKILL no longer supported
- separate control filesystem, instead of using sysfs objects
- add support for synchronously interrupting requests
Details are documented in Documentation/filesystems/fuse.txt throughout the
patches.
This patch:
Have fuse.h use MISC_MAJOR rather than a hardcoded '10'.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The table is empty, why does it still exist?
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
RTC: Add exported function rtc_year_days() to calculate the tm_yday value.
Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds support for the v3020 RTC from EM Microelectronic.
The v3020 RTC is designed to be connected on a bus using only one data bit.
Since any data bit may be used, it is necessary to specify this to the
driver by passing a struct v3020_platform_data pointer (see
include/linux/rtc-v3020.h) to the driver.
Part of the following code comes from the kernel patchs produced by
Compulab for their products. The original file (available here:
http://raph.people.8d.com/misc/emv3020.c) was released under the terms of
the GPL license.
[akpm@osdl.org: cleanups]
Signed-off-by: Raphael Assenat <raph@raphnet.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Import genrtc's RTC UIE emulation (CONFIG_GEN_RTC_X) to rtc-dev driver with
slight adjustments/refinements. This makes UIE-less rtc drivers work
better with programs doing read/poll on /dev/rtc, such as hwclock. This
emulation should not harm rtc drivers with UIE support, since
rtc_dev_ioctl() calls underlaying rtc driver's ioctl() first.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
A few days ago Arjan signaled a lockdep red flag on epoll locks, and
precisely between the epoll's device structure lock (->lock) and the wait
queue head lock (->lock).
Like I explained in another email, and directly to Arjan, this can't happen
in reality because of the explicit check at eventpoll.c:592, that does not
allow to drop an epoll fd inside the same epoll fd. Since lockdep is
working on per-structure locks, it will never be able to know of policies
enforced in other parts of the code.
It was decided time ago of having the ability to drop epoll fds inside
other epoll fds, that triggers a very trick wakeup operations (due to
possibly reentrant callback-driven wakeups) handled by the
ep_poll_safewake() function. While looking again at the code though, I
noticed that all the operations done on the epoll's main structure wait
queue head (->wq) are already protected by the epoll lock (->lock), so that
locked-style functions can be used to manipulate the ->wq member. This
makes both a lock-acquire save, and lockdep happy.
Running totalmess on my dual opteron for a while did not reveal any problem
so far:
http://www.xmailserver.org/totalmess.c
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On UP, this:
cpumask_t mask = node_to_cpumask(numa_node_id());
for_each_cpu_mask(cpu, mask)
does this:
mm/readahead.c: In function `node_readahead_aging':
mm/readahead.c:850: warning: unused variable `mask'
which is unpleasantly fixed by this:
Acked-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert the ext3 in-kernel filesystem blocks to ext3_fsblk_t. Convert the
rest of all unsigned long type in-kernel filesystem blocks to ext3_fsblk_t,
and replace the printk format string respondingly.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some of the in-kernel ext3 block variable type are treated as signed 4 bytes
int type, thus limited ext3 filesystem to 8TB (4kblock size based). While
trying to fix them, it seems quite confusing in the ext3 code where some
blocks are filesystem-wide blocks, some are group relative offsets that need
to be signed value (as -1 has special meaning). So it seem saner to define
two types of physical blocks: one is filesystem wide blocks, another is
group-relative blocks. The following patches clarify these two types of
blocks in the ext3 code, and fix the type bugs which limit current 32 bit ext3
filesystem limit to 8TB.
With this series of patches and the percpu counter data type changes in the mm
tree, we are able to extend exts filesystem limit to 16TB.
This work is also a pre-request for the recent >32 bit ext3 work, and makes
the kernel to able to address 48 bit ext3 block a lot easier: Simply redefine
ext3_fsblk_t from unsigned long to sector_t and redefine the format string for
ext3 filesystem block corresponding.
Two RFC with a series patches have been posted to ext2-devel list and have
been reviewed and discussed:
http://marc.theaimsgroup.com/?l=ext2-devel&m=114722190816690&w=2http://marc.theaimsgroup.com/?l=ext2-devel&m=114784919525942&w=2
Patches are tested on both 32 bit machine and 64 bit machine, <8TB ext3 and
>8TB ext3 filesystem(with the latest to be released e2fsprogs-1.39). Tests
includes overnight fsx, tiobench, dbench and fsstress.
This patch:
Defines ext3_fsblk_t and ext3_grpblk_t, and the printk format string for
filesystem wide blocks.
This patch classifies all block group relative blocks, and ext3_fsblk_t blocks
occurs in the same function where used to be confusing before. Also include
kernel bug fixes for filesystem wide in-kernel block variables. There are
some fileystem wide blocks are treated as int/unsigned int type in the kernel
currently, especially in ext3 block allocation and reservation code. This
patch fixed those bugs by converting those variables to ext3_fsblk_t(unsigned
long) type.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Driver for the simple parallel port interface on the Asix AX88796 chip on
an platform_bus.
[akpm@osdl.org: x86_64 build fix]
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a patch from Alan that fixes a real ide-cd.c regression causing
bogus "Media Check" failures for perfectly valid Fedora install ISOs, on
certain CD-ROM drives.
This is a forward port to 2.6.16 (from RHEL) of the minimal changes for the
end of media problem. It may not be sufficient for some controllers
(promise notably) and it does not touch the locking so the error path
locking is as horked as in mainstream.
From: Ingo Molnar <mingo@elte.hu>
I have ported the patch to 2.6.17-rc4 and tested it by provoking
end-of-media IO errors with an unaligned ISO image. Unlike the vanilla
kernel, the patched kernel interpreted the error condition correctly with
512 byte granularity:
hdc: command error: status=0x51 { DriveReady SeekComplete Error }
hdc: command error: error=0x54 { AbortedCommand LastFailedSense=0x05 }
ide: failed opcode was: unknown
ATAPI device hdc:
Error: Illegal request -- (Sense key=0x05)
Illegal mode for this track or incompatible medium -- (asc=0x64, ascq=0x00)
The failed "Read 10" packet command was:
"28 00 00 04 fb 78 00 00 06 00 00 00 00 00 00 00 "
end_request: I/O error, dev hdc, sector 1306080
Buffer I/O error on device hdc, logical block 163260
Buffer I/O error on device hdc, logical block 163261
Buffer I/O error on device hdc, logical block 163262
the unpatched kernel produces an incorrect error dump:
hdc: command error: status=0x51 { DriveReady SeekComplete Error }
hdc: command error: error=0x54 { AbortedCommand LastFailedSense=0x05 }
ide: failed opcode was: unknown
end_request: I/O error, dev hdc, sector 1306080
Buffer I/O error on device hdc, logical block 163260
hdc: command error: status=0x51 { DriveReady SeekComplete Error }
hdc: command error: error=0x54 { AbortedCommand LastFailedSense=0x05 }
ide: failed opcode was: unknown
end_request: I/O error, dev hdc, sector 1306088
Buffer I/O error on device hdc, logical block 163261
hdc: command error: status=0x51 { DriveReady SeekComplete Error }
hdc: command error: error=0x54 { AbortedCommand LastFailedSense=0x05 }
ide: failed opcode was: unknown
end_request: I/O error, dev hdc, sector 1306096
Buffer I/O error on device hdc, logical block 163262
I do not have the right type of CD-ROM drive to reproduce the end-of-media
data corruption bug myself, but this same patch in RHEL solved it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Cc: Jens Axboe <axboe@suse.de>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use loop "cursor" instead of loop "counter" for list iterator descriptions.
They are not counters, they are pointers or positions.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
kernel-doc:
Put all short function descriptions on one line or if they are too long,
omit the short description & add a Description: section for them.
Change some list iterator descriptions to use "current" point instead of
"existing" point.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- proper prototypes for the following functions:
- ctrl_alt_del() (in include/linux/reboot.h)
- getrusage() (in include/linux/resource.h)
- make the following needlessly global functions static:
- kernel_restart_prepare()
- kernel_kexec()
[akpm@osdl.org: compile fix]
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently printk is no use for early debugging because it refuses to
actually print anything to the console unless
cpu_online(smp_processor_id()) is true.
The stated explanation is that console drivers may require per-cpu
resources, or otherwise barf, because the system is not yet setup
correctly. Fair enough.
However some console drivers might be quite happy running early during
boot, in fact we have one, and so it'd be nice if printk understood that.
So I added a flag (which I would have called CON_BOOT, but that's taken)
called CON_ANYTIME, which indicates that a console is happy to be called
anytime, even if the cpu is not yet online.
Tested on a Power 5 machine, with both a CON_ANYTIME driver and a bogus
console driver that BUG()s if called while offline. No problems AFAICT.
Built for i386 UP & SMP.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make two needlessly global functions static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ufs super block contains some statistic about file systems, like amount of
directories, free blocks, inodes and so on.
UFS1 hold this information in one location and uses 32bit integers for such
information, UFS2 hold statistic in another location and uses 64bit integers.
There is transition variant, if UFS1 has type 44BSD and flags field in super
block has some special value this mean that we work with statistic like UFS2
does. and this also means that nobody care about old(UFS1) statistic.
So if start fsck against such file system, after usage linux ufs driver, it
found error: at now only UFS1 like statistic is updated.
This patch should fix this. Also it contains some minor cleanup: CodingSytle
and remove unused variables.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Super block of UFS usually has size >512, because of fragment size may be 512,
this cause some problems.
Currently, there are two methods to work with ufs super block:
1) split structure which describes ufs super blocks into structures with
size <=512
2) use one structure which describes ufs super block, and hope that array
of "buffer_head" which holds "super block", has such construction:
bh[n]->b_data + bh[n]->b_size == bh[n + 1]->b_data
The second variant may cause some problems in the future, and usage of two
variants cause unnecessary code duplication.
This patch remove the second variant. Also patch contains some CodingStyle
fixes.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch make little optimization of ufs_find_entry like "ext2" does. Save
number of page and reuse it again in the next call.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently to turn on debug mode "user" has to edit ~10 files, to turn off he
has to do it again.
This patch introduce such changes:
1)turn on(off) debug messages via ".config"
2)remove unnecessary duplication of code
3)make "UFSD" macros more similar to function
4)fix some compiler warnings
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are two ugly macros in ufs code:
#define UCPI_UBH ((struct ufs_buffer_head *)ucpi)
#define USPI_UBH ((struct ufs_buffer_head *)uspi)
when uspi looks like
struct {
struct ufs_buffer_head ;
}
and USPI_UBH has some sence,
ucpi looks like
struct {
struct not_ufs_buffer_head;
}
To prevent bugs in future, this patch convert macros to inline function and
fix "ucpi" structure.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change function in fs/ufs/dir.c and fs/ufs/namei.c to work with pages
instead of straight work with blocks. It fixed such bugs:
* for i in `seq 1 1000`; do touch $i; done - crash system
* mkdir create directory without "." and ".." entries
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
First of all some necessary notes about UFS by it self: To avoid waste of disk
space the tail of file consists not from blocks (which is ordinary big enough,
16K usually), it consists from fragments(which is ordinary 2K). When file is
growing its tail occupy 1 fragment, 2 fragments... At some stage decision to
allocate whole block is made and all fragments are moved to one block.
How this situation was handled before:
ufs_prepare_write
->block_prepare_write
->ufs_getfrag_block
->...
->ufs_new_fragments:
bh = sb_bread
bh->b_blocknr = result + i;
mark_buffer_dirty (bh);
This is wrong solution, because:
- it didn't take into consideration that there is another cache: "inode page
cache"
- because of sb_getblk uses not b_blocknr, (it uses page->index) to find
certain block, this breaks sb_getblk.
How this situation is handled now: we go though all "page inode cache", if
there are no such page in cache we load it into cache, and change b_blocknr.
Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains a total rewrite of the backlight infrastructure for
portable Apple computers. Backward compatibility is retained. A sysfs
interface allows userland to control the brightness with more steps than
before. Userland is allowed to upload a brightness curve for different
monitors, similar to Mac OS X.
[akpm@osdl.org: add needed exports]
Signed-off-by: Michael Hanselmann <linux-kernel@hansmi.ch>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hooks for calling vma specific migration functions
With this patch a vma may define a vma->vm_ops->migrate function. That
function may perform page migration on its own (some vmas may not contain page
structs and therefore cannot be handled by regular page migration. Pages in a
vma may require special preparatory treatment before migration is possible
etc) . Only mmap_sem is held when the migration function is called. The
migrate() function gets passed two sets of nodemasks describing the source and
the target of the migration. The flags parameter either contains
MPOL_MF_MOVE which means that only pages used exclusively by
the specified mm should be moved
or
MPOL_MF_MOVE_ALL which means that pages shared with other processes
should also be moved.
The migration function returns 0 on success or an error condition. An error
condition will prevent regular page migration from occurring.
On its own this patch cannot be included since there are no users for this
functionality. But it seems that the uncached allocator will need this
functionality at some point.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove VM_LOCKED before remap_pfn range from device drivers and get rid of
VM_SHM.
remap_pfn_range() already sets VM_IO. There is no need to set VM_SHM since
it does nothing. VM_LOCKED is of no use since the remap_pfn_range does not
place pages on the LRU. The pages are therefore never subject to swap
anyways. Remove all the vm_flags settings before calling remap_pfn_range.
After removing all the vm_flag settings no use of VM_SHM is left. Drop it.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert a few stragglers over to for_each_possible_cpu(), remove
for_each_cpu().
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix various problems with nfs4 disabled. And various other things.
In file included from fs/nfs/inode.c:50:
fs/nfs/internal.h:24: error: static declaration of 'nfs_do_refmount' follows non-static declaration
include/linux/nfs_fs.h:320: error: previous declaration of 'nfs_do_refmount' was here
fs/nfs/internal.h:65: warning: 'struct nfs4_fs_locations' declared inside parameter list
fs/nfs/internal.h:65: warning: its scope is only this definition or declaration, which is probably not what you want
fs/nfs/internal.h: In function 'nfs4_path':
fs/nfs/internal.h:97: error: 'struct nfs_server' has no member named 'mnt_path'
fs/nfs/inode.c: In function 'init_once':
fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'open_states'
fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'delegation'
fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'delegation_state'
fs/nfs/inode.c:1116: error: 'struct nfs_inode' has no member named 'rwsem'
distcc[26452] ERROR: compile fs/nfs/inode.c on g5/64 failed
make[1]: *** [fs/nfs/inode.o] Error 1
make: *** [fs/nfs/inode.o] Error 2
make: *** Waiting for unfinished jobs....
In file included from fs/nfs/nfs3xdr.c:26:
fs/nfs/internal.h:24: error: static declaration of 'nfs_do_refmount' follows non-static declaration
include/linux/nfs_fs.h:320: error: previous declaration of 'nfs_do_refmount' was here
fs/nfs/internal.h:65: warning: 'struct nfs4_fs_locations' declared inside parameter list
fs/nfs/internal.h:65: warning: its scope is only this definition or declaration, which is probably not what you want
fs/nfs/internal.h: In function 'nfs4_path':
fs/nfs/internal.h:97: error: 'struct nfs_server' has no member named 'mnt_path'
distcc[26486] ERROR: compile fs/nfs/nfs3xdr.c on g5/64 failed
make[1]: *** [fs/nfs/nfs3xdr.o] Error 1
make: *** [fs/nfs/nfs3xdr.o] Error 2
In file included from fs/nfs/nfs3proc.c:24:
fs/nfs/internal.h:24: error: static declaration of 'nfs_do_refmount' follows non-static declaration
include/linux/nfs_fs.h:320: error: previous declaration of 'nfs_do_refmount' was here
fs/nfs/internal.h:65: warning: 'struct nfs4_fs_locations' declared inside parameter list
fs/nfs/internal.h:65: warning: its scope is only this definition or declaration, which is probably not what you want
fs/nfs/internal.h: In function 'nfs4_path':
fs/nfs/internal.h:97: error: 'struct nfs_server' has no member named 'mnt_path'
distcc[26469] ERROR: compile fs/nfs/nfs3proc.c on bix/32 failed
make[1]: *** [fs/nfs/nfs3proc.o] Error 1
make: *** [fs/nfs/nfs3proc.o] Error 2
**FAILED**
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andreas Gruenbacher <agruen@suse.de>
Cc: Andy Adamson <andros@citi.umich.edu>
Cc: Chuck Lever <cel@netapp.com>
Cc: David Howells <dhowells@redhat.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Manoj Naik <manoj@almaden.ibm.com>
Cc: Marc Eshel <eshel@almaden.ibm.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
It's better to use explicit enums. It reduces the chance of someone
inserting new enums in the middle which would break things.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Put old MPEGCOMP API under #if __KERNEL__ and issue warnings when used.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
The old, experimental, VIDIOC_S/G_CODEC API to pass MPEG parameters is now
obsolete and is replaced by 'extended controls' which offer more flexibility
and are hopefully more future proof.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Make life easier for distro guys, by removing the need of including
at the userspace header.
Also, linux/compiler.h is not needed at userspace.
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
The zoran driver uses strncpy() in an unsafe way. This patch uses the proper
sizeof()-1 size parameter. Since all strncpy() targets are initialised with
memset() the trailing '\0' is already set. Where std->name was the target for
the strncpy() we overwrote 8 Bytes of the std structure with zeros.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
The videodev.h and videodev2.h describe the public API for V4L and V4L2.
It shouldn't have there any kernel-specific stuff. Those were moved to
v4l2-dev.h.
This patch removes some uneeded headers and include v4l2-common.h on all
V4L driver. This header includes device implementation of V4L2 API provided
on v4l2-dev.h as well as V4L2 internal ioctls that provides connections
between master driver and its i2c devices.
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Videodev now is capable of better handling V4L2 api, by
processing V4L2 ioctls and using callbacks to the driver.
The drivers should be migrated to the newer way and the older
one will be obsoleted soon.
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
linux/videodev2.h uses types such as __u8 but it fails to include
<linux/types.h>. Within the kernel, that's not a problem because
<linux/time.h> already includes <linux/types.h>. However, there are
user apps that try to include videodev2.h (e.g., ekiga) and at least
on ia64, it causes compilation failures since <linux/types.h> doesn't
get included for any other reason, leaving __u8 etc. undefined. The
attached patch fixes the problem for me.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Add support for the AverMedia 6 Eyes MJPEG card.
- Updated drivers/media/video/Kconfig with AVS6EYES
options.
- Added CONFIG_VIDEO_ZORAN_AVS6EYES to
drivers/media/video/Makefile.
- Added I2C_DRIVERID_BT866 and I2C_DRIVERID_KS0127 to
include/linux/i2c-id.h
- Added drivers/media/video/ks0127.c, imported and modified from
the Marvel project.
- Added drivers/media/video/ks0127.h, imported and modified from
the Marvel project.
- Added drivers/media/video/bt866.c, ported from a 2.4 version
by Christer Weinigel.
- Added AVS6EYES to drivers/media/video/zoran_card.c
- Added input_mux to all cards in drivers/media/video/zoran_card.c
- Added input mux module parameter to drivers/media/video/zoran_card.c
- Added AVS6EYES to card_type in drivers/media/video/zoran.h
- Added input_mux to card_info in drivers/media/video/zoran.h
- Upped BUZ_MAX_INPUT in drivers/media/video/zoran.h from 8 to 16,
as the AVS6EYES has 10.
- Updated Documentation/video4linux/Zoran with information about AVS6EYES.
Signed-off-by: Martin Samuelsson <sam@home.se>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
The ioctl DMX_GET_EVENT has never been implemented.
I guess no software is using it because of its lack of implementation.
Future software won't use it, too, because this API doesn't make much
sense the way it is: Frontend events have their own different API.
Scrambling events can't be generated in a useful way by the hardware I
know of.
Signed-off-by: Andreas Oberritter <obi@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
* master.kernel.org:/home/rmk/linux-2.6-arm: (25 commits)
[ARM] 3648/1: Update struct ucontext layout for coprocessor registers
[ARM] Add identifying number for non-rt sigframe
[ARM] Gather common sigframe saving code into setup_sigframe()
[ARM] Gather common sigframe restoration code into restore_sigframe()
[ARM] Re-use sigframe within rt_sigframe
[ARM] Merge sigcontext and sigmask members of sigframe
[ARM] Replace extramask with a full copy of the sigmask
[ARM] Remove rt_sigframe puc and pinfo pointers
[ARM] 3647/1: S3C24XX: add Osiris to the list of simtec pm machines
[ARM] 3645/1: S3C2412: irq support for external interrupts
[ARM] 3643/1: S3C2410: Add new usb clocks
[ARM] 3642/1: S3C24XX: Add machine SMDK2413
[ARM] 3641/1: S3C2412: Fixup gpio register naming
[ARM] 3640/1: S3C2412: Use S3C24XX_DCLKCON instead of S3C2410_DCLKCON
[ARM] 3639/1: S3C2412: serial port support
[ARM] 3638/1: S3C2412: core clocks
[ARM] 3637/1: S3C24XX: Add mpll clock, and set as fclk parent
[ARM] 3636/1: S3C2412: Add selection of CPU_ARM926
[ARM] 3635/1: S3C24XX: Add S3C2412 core cpu support
[ARM] 3633/1: S3C24XX: s3c2410 gpio bugfix - wrong pin nos
...
Considering that there isn't a lot of hw we can depend on during resume,
this is about as good as it gets.
This is x86-only for now, although the basic concept (and most of the
code) will certainly work on almost any platform.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch from Ben Dooks
Serial port support for the on-board UART blocks
on the Samsung S3C2412 and S3C2413 UARTs.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Neil Brown observed that the kmalloc() in nfs_get_user_pages() is more
likely to fail if the I/O is large enough to require the allocation of more
than a single page to keep track of all the pinned pages in the user's
buffer.
Instead of tracking one large page array per dreq/iocb, track pages per
nfs_read/write_data, just like the cached I/O path does. An array for
pages is already allocated for us by nfs_readdata_alloc() (and the write
and commit equivalents).
This is also required for adding support for vectored I/O to the NFS direct
I/O path.
The original reason to pin the user buffer and allocate all the NFS data
structures before trying to schedule I/O was to ensure all needed resources
are allocated on the client before starting to send requests. This reduces
the chance that resource exhaustion on the client will cause a short read
or write.
On the other hand, for an application making very large application I/O
requests, this means that it will be nearly impossible for the application
to make forward progress on a resource-limited client.
Thus, moving the buffer pinning functionality into the I/O scheduling
loops should be good for scalability. The next patch will do the same for
NFS data structure allocation.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
They all duplicate macros to check for empty root and/or node, and
clearing a node. So put those in rbtree.h.
Signed-off-by: Jens Axboe <axboe@suse.de>
A process flag to indicate whether we are doing sync io is incredibly
ugly. It also causes performance problems when one does a lot of async
io and then proceeds to sync it. Part of the io will go out as async,
and the other part as sync. This causes a disconnect between the
previously submitted io and the synced io. For io schedulers such as CFQ,
this will cause us lost merges and suboptimal behaviour in scheduling.
Remove PF_SYNCWRITE completely from the fsync/msync paths, and let
the O_DIRECT path just directly indicate that the writes are sync
by using WRITE_SYNC instead.
Signed-off-by: Jens Axboe <axboe@suse.de>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (65 commits)
ACPI: suppress power button event on S3 resume
ACPI: resolve merge conflict between sem2mutex and processor_perflib.c
ACPI: use for_each_possible_cpu() instead of for_each_cpu()
ACPI: delete newly added debugging macros in processor_perflib.c
ACPI: UP build fix for bugzilla-5737
Enable P-state software coordination via _PDC
P-state software coordination for speedstep-centrino
P-state software coordination for acpi-cpufreq
P-state software coordination for ACPI core
ACPI: create acpi_thermal_resume()
ACPI: create acpi_fan_suspend()/acpi_fan_resume()
ACPI: pass pm_message_t from acpi_device_suspend() to root_suspend()
ACPI: create acpi_device_suspend()/acpi_device_resume()
ACPI: replace spin_lock_irq with mutex for ec poll mode
ACPI: Allow a WAN module enable/disable on a Thinkpad X60.
sem2mutex: acpi, acpi_link_lock
ACPI: delete unused acpi_bus_drivers_lock
sem2mutex: drivers/acpi/processor_perflib.c
ACPI add ia64 exports to build acpi_memhotplug as a module
ACPI: asus_acpi_init(): propagate correct return value
...
Manual resolve of conflicts in:
arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
include/acpi/processor.h
Split the checkpoint list of the transaction into two lists. In the first
list we keep the buffers that need to be submitted for IO. In the second
list are kept buffers that were already submitted and we just have to wait
for the IO to complete. This should simplify a handling of checkpoint
lists a bit and can eventually be also a performance gain.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Correct the return type of handle_IRQ_event() (inconsistency noticed during
Xen development), and remove redundant declarations. The return type
adjustment required breaking out the definition of irqreturn_t into a
separate header, in order to satisfy current include order dependencies.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
list_replace() is similar to list_replace_rcu(), but unlike
list_replace_rcu() it
could be used when list_empty(old) == 1
doesn't use barriers
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are three different IO cards which an SGI IOC4 controller may find
itself on. One of these variants does not bring out the IDE and serial
signals, so we need to disable attaching the corresponding IOC4 subdrivers
to such cards.
Cleans up message clutter emitted during device probing.
Signed-off-by: Brent Casavant <bcasavan@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove synchronize_kernel() (deprecated 2-APR-2005 in
http://lkml.org/lkml/2005/4/3/11) and makes the RCU API inaccessible to
non-GPL Linux kernel modules (as was announced more than one year ago in
http://lkml.org/lkml/2005/4/3/8). Tested on x86 and ppc64.
Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new strstrip() function to lib/string.c for removing leading and
trailing whitespace from a string.
Cc: Michael Holzheu <holzheu@de.ibm.com>
Acked-by: Ingo Oeser <ioe-lkml@rameria.de>
Acked-by: Joern Engel <joern@wohnheim.fh-wedel.de>
Cc: Corey Minyard <minyard@acm.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Michael Holzheu <HOLZHEU@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the license on the process event structure passed between kernel and
userspace.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Acked-by: Nguyen Anh Quynh <aquynh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move connector header include to precisely where it's needed.
Remove unused time.h header file as well. This was leftover from previous
iterations of the process events patches.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: Nguyen Anh Quynh <aquynh@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The likely() profiling tools show that __alloc_page() causes a lot of misses:
! 132 119193 __alloc_pages():mm/page_alloc.c@937
Because most __alloc_page() calls are not atomic.
Signed-off-by: Hua Zhong <hzhong@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The percpu counter data type are changed in this set of patches to support
more users like ext3 who need more than 32 bit to store the free blocks
total in the filesystem.
- Generic perpcu counters data type changes. The size of the global counter
and local counter were explictly specified using s64 and s32. The global
counter is changed from long to s64, while the local counter is changed from
long to s32, so we could avoid doing 64 bit update in most cases.
- Users of the percpu counters are updated to make use of the new
percpu_counter_init() routine now taking an additional parameter to allow
users to pass the initial value of the global counter.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
After a lot of reading the code and thinking about how it behaves I have
managed to figure out what the current ptrace locking rules are. The
current code is in much better that it appears at first glance. The
troublesome code paths are actually the code paths that violate the current
rules.
ptrace uses simple exclusive access as it's locking. You can only touch
task->ptrace if the task is stopped and you are the ptracer, or if the task
is running and are the task itself.
Very simple, very easy to maintain. It just needs to be documented so
people know not to touch ptrace from elsewhere.
Currently we do have a few pieces of code that are in violation of this
rule. Particularly the core dump code, and ptrace_attach. But so far the
code looks fixable.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
"Dual MIT/GPL" is also accepted (kernel/module.c), so updated comments.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We can now make posix_locks_deadlock() static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Pass the POSIX lock owner ID to the flush operation.
This is useful for filesystems which don't want to store any locking state
in inode->i_flock but want to handle locking/unlocking POSIX locks
internally. FUSE is one such filesystem but I think it possible that some
network filesystems would need this also.
Also add a flag to indicate that a POSIX locking request was generated by
close(), so filesystems using the above feature won't send an extra locking
request in this case.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add read_mapping_page() which is used for callers that pass
mapping->a_ops->readpage as the filler for read_cache_page. This removes
some duplication from filesystem code.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Create two files in /sys/kernel, kexec_loaded and kexec_crash_loaded. Each
file contains a simple boolean value indicating whether the relevant kernel
has been loaded into memory. The motivation for this is geared around
support.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These definitions have long been superseded by asm-offsets.h
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1. Add architecture specific pages save/restore support. Next two patches
will use this to save/restore 'ACPI NVS' pages.
2. Allow reserved pages 'nosave'. This could avoid save/restore BIOS
reserved pages.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Nigel Cunningham <nigel@suspend2.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On i386, kernel irq balance doesn't work.
1) In function do_irq_balance, after kernel finds the min_loaded cpu but
before calling set_pending_irq to really pin the selected_irq to the
target cpu, kernel does a cpus_and with irq_affinity[selected_irq].
Later on, when the irq is acked, kernel would calls
move_native_irq=>desc->handler->set_affinity to change the irq affinity.
However, every function pointed by
hw_interrupt_type->set_affinity(unsigned int irq, cpumask_t cpumask)
always changes irq_affinity[irq] to cpumask. Next time when recalling
do_irq_balance, it has to do cpu_ands again with
irq_affinity[selected_irq], but irq_affinity[selected_irq] already
becomes one cpu selected by the first irq balance.
2) Function balance_irq in file arch/i386/kernel/io_apic.c has the same
issue.
[akpm@osdl.org: cleanups]
Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use the x86 cache-bypassing copy instructions for copy_from_user().
Some performance data are
Total of GLOBAL_POWER_EVENTS (CPU cycle samples)
2.6.12.4.orig 1921587
2.6.12.4.nt 1599424
1599424/1921587=83.23% (16.77% reduction)
BSQ_CACHE_REFERENCE (L3 cache miss)
2.6.12.4.orig 57427
2.6.12.4.nt 20858
20858/57427=36.32% (63.7% reduction)
L3 cache miss reduction of __copy_from_user_ll
samples %
37408 65.1412 vmlinux __copy_from_user_ll
23 0.1103 vmlinux __copy_user_zeroing_intel_nocache
23/37408=0.061% (99.94% reduction)
Top 5 of 2.6.12.4.nt
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % app name symbol name
128392 8.0274 vmlinux __copy_user_zeroing_intel_nocache
64206 4.0143 vmlinux journal_add_journal_head
59746 3.7355 vmlinux do_get_write_access
47674 2.9807 vmlinux journal_put_journal_head
46021 2.8774 vmlinux journal_dirty_metadata
pattern9-0-cpu4-0-09011728/summary.out
Counted BSQ_CACHE_REFERENCE events (cache references seen by the bus unit) with a unit mask of 0x3f (multiple flags) count 3000
samples % app name symbol name
69755 4.2861 vmlinux __copy_user_zeroing_intel_nocache
55685 3.4215 vmlinux journal_add_journal_head
52371 3.2179 vmlinux __find_get_block
45504 2.7960 vmlinux journal_put_journal_head
36005 2.2123 vmlinux journal_stop
pattern9-0-cpu4-0-09011744/summary.out
Counted BSQ_CACHE_REFERENCE events (cache references seen by the bus unit) with a unit mask of 0x200 (read 3rd level cache miss) count 3000
samples % app name symbol name
1147 5.4994 vmlinux journal_add_journal_head
881 4.2240 vmlinux journal_dirty_data
872 4.1809 vmlinux blk_rq_map_sg
734 3.5192 vmlinux journal_commit_transaction
617 2.9582 vmlinux radix_tree_delete
pattern9-0-cpu4-0-09011731/summary.out
iozone results are
original 2.6.12.4 CPU time = 207.768 sec
cache aware CPU time = 184.783 sec
(three times run)
184.783/207.768=88.94% (11.06% reduction)
original:
pattern9-0-cpu4-0-08191720/iozone.out: CPU Utilization: Wall time 45.997 CPU time 64.527 CPU utilization 140.28 %
pattern9-0-cpu4-0-08191741/iozone.out: CPU Utilization: Wall time 46.878 CPU time 71.933 CPU utilization 153.45 %
pattern9-0-cpu4-0-08191743/iozone.out: CPU Utilization: Wall time 45.152 CPU time 71.308 CPU utilization 157.93 %
cache awre:
pattern9-0-cpu4-0-09011728/iozone.out: CPU Utilization: Wall time 44.842 CPU time 62.465 CPU utilization 139.30 %
pattern9-0-cpu4-0-09011731/iozone.out: CPU Utilization: Wall time 44.718 CPU time 59.273 CPU utilization 132.55 %
pattern9-0-cpu4-0-09011744/iozone.out: CPU Utilization: Wall time 44.367 CPU time 63.045 CPU utilization 142.10 %
Signed-off-by: Hiro Yoshioka <hyoshiok@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds new security hook, task_movememory, to be called when memory
owened by a task is to be moved (e.g. when migrating pages to a this hook is
identical to the setscheduler implementation, but a separate hook introduced
to allow this check to be specialized in the future if necessary.
Since the last posting, the hook has been renamed following feedback from
Christoph Lameter.
Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement an LSM hook for setting a task's IO priority, similar to the hook
for setting a tasks's nice value.
A previous version of this LSM hook was included in an older version of
multiadm by Jan Engelhardt, although I don't recall it being submitted
upstream.
Also included is the corresponding SELinux hook, which re-uses the setsched
permission in the proccess class.
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The definition of the third parameter is a pointer to an array of virtual
addresses which give us some trouble. The existing code calculated the
wrong address in the array since I used void to avoid having to specify a
type.
I now use the correct type "compat_uptr_t __user *" in the definition of
the function in kernel/compat.c.
However, I used __u32 in syscalls.h. Would have to include compat.h there
in order to provide the same definition which would generate an ugly
include situation.
On both ia64 and x86_64 compat_uptr_t is u32. So this works although
parameter declarations differ.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sys_move_pages() support for 32bit (i386 plus x86_64 compat layer)
Add support for move_pages() on i386 and also add the compat functions
necessary to run 32 bit binaries on x86_64.
Add compat_sys_move_pages to the x86_64 32bit binary layer. Note that it is
not up to date so I added the missing pieces. Not sure if this is done the
right way.
[akpm@osdl.org: compile fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
move_pages() is used to move individual pages of a process. The function can
be used to determine the location of pages and to move them onto the desired
node. move_pages() returns status information for each page.
long move_pages(pid, number_of_pages_to_move,
addresses_of_pages[],
nodes[] or NULL,
status[],
flags);
The addresses of pages is an array of void * pointing to the
pages to be moved.
The nodes array contains the node numbers that the pages should be moved
to. If a NULL is passed instead of an array then no pages are moved but
the status array is updated. The status request may be used to determine
the page state before issuing another move_pages() to move pages.
The status array will contain the state of all individual page migration
attempts when the function terminates. The status array is only valid if
move_pages() completed successfullly.
Possible page states in status[]:
0..MAX_NUMNODES The page is now on the indicated node.
-ENOENT Page is not present
-EACCES Page is mapped by multiple processes and can only
be moved if MPOL_MF_MOVE_ALL is specified.
-EPERM The page has been mlocked by a process/driver and
cannot be moved.
-EBUSY Page is busy and cannot be moved. Try again later.
-EFAULT Invalid address (no VMA or zero page).
-ENOMEM Unable to allocate memory on target node.
-EIO Unable to write back page. The page must be written
back in order to move it since the page is dirty and the
filesystem does not provide a migration function that
would allow the moving of dirty pages.
-EINVAL A dirty page cannot be moved. The filesystem does not provide
a migration function and has no ability to write back pages.
The flags parameter indicates what types of pages to move:
MPOL_MF_MOVE Move pages that are only mapped by the process.
MPOL_MF_MOVE_ALL Also move pages that are mapped by multiple processes.
Requires sufficient capabilities.
Possible return codes from move_pages()
-ENOENT No pages found that would require moving. All pages
are either already on the target node, not present, had an
invalid address or could not be moved because they were
mapped by multiple processes.
-EINVAL Flags other than MPOL_MF_MOVE(_ALL) specified or an attempt
to migrate pages in a kernel thread.
-EPERM MPOL_MF_MOVE_ALL specified without sufficient priviledges.
or an attempt to move a process belonging to another user.
-EACCES One of the target nodes is not allowed by the current cpuset.
-ENODEV One of the target nodes is not online.
-ESRCH Process does not exist.
-E2BIG Too many pages to move.
-ENOMEM Not enough memory to allocate control array.
-EFAULT Parameters could not be accessed.
A test program for move_pages() may be found with the patches
on ftp.kernel.org:/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc4-mm3
From: Christoph Lameter <clameter@sgi.com>
Detailed results for sys_move_pages()
Pass a pointer to an integer to get_new_page() that may be used to
indicate where the completion status of a migration operation should be
placed. This allows sys_move_pags() to report back exactly what happened to
each page.
Wish there would be a better way to do this. Looks a bit hacky.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Jes Sorensen <jes@trained-monkey.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Instead of passing a list of new pages, pass a function to allocate a new
page. This allows the correct placement of MPOL_INTERLEAVE pages during page
migration. It also further simplifies the callers of migrate pages.
migrate_pages() becomes similar to migrate_pages_to() so drop
migrate_pages_to(). The batching of new page allocations becomes unnecessary.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Jes Sorensen <jes@trained-monkey.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Do not leave pages on the lists passed to migrate_pages(). Seems that we will
not need any postprocessing of pages. This will simplify the handling of
pages by the callers of migrate_pages().
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Jes Sorensen <jes@trained-monkey.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Move comments for kmalloc to right place, currently it near __do_kmalloc
- Comments for kzalloc
- More detailed comments for kmalloc
- Appearance of "kmalloc" and "kzalloc" man pages after "make mandocs"
[rdunlap@xenotime.net: simplification]
Signed-off-by: Paul Drynoff <pauldrynoff@gmail.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initialise total_memory earlier in boot. Because if for some reason we run
page reclaim early in boot, we don't want total_memory to be zero when we use
it as a divisor.
And rename total_memory to vm_total_pages to avoid naming clashes with
architectures.
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Martin Bligh <mbligh@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new VMA operation to notify a filesystem or other driver about the
MMU generating a fault because userspace attempted to write to a page
mapped through a read-only PTE.
This facility permits the filesystem or driver to:
(*) Implement storage allocation/reservation on attempted write, and so to
deal with problems such as ENOSPC more gracefully (perhaps by generating
SIGBUS).
(*) Delay making the page writable until the contents have been written to a
backing cache. This is useful for NFS/AFS when using FS-Cache/CacheFS.
It permits the filesystem to have some guarantee about the state of the
cache.
(*) Account and limit number of dirty pages. This is one piece of the puzzle
needed to make shared writable mapping work safely in FUSE.
Needed by cachefs (Or is it cachefiles? Or fscache? <head spins>).
At least four other groups have stated an interest in it or a desire to use
the functionality it provides: FUSE, OCFS2, NTFS and JFFS2. Also, things like
EXT3 really ought to use it to deal with the case of shared-writable mmap
encountering ENOSPC before we permit the page to be dirtied.
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
get_user_pages(.write=1, .force=1) can generate COW hits on read-only
shared mappings, this patch traps those as mkpage_write candidates and fails
to handle them the old way.
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Joel Becker <Joel.Becker@oracle.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If CONFIG_SWAP is not defined we get:
mm/vmscan.c: In function âremove_mappingâ:
mm/vmscan.c:387: warning: unused variable âswapâ
Convert defines in swap.h into blank inline functions to fix this warning
and be consistent.
Signed-off-by: Con Kolivas <kernel@kolivas.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Record the node id as we mark sections for instantiation. Use this nid
during instantiation to direct allocations.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Cc: Mike Kravetz <kravetz@us.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Bob Picco <bob.picco@hp.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Martin Bligh <mbligh@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This implements the use of migration entries to preserve ptes of file backed
pages during migration. Processes can therefore be migrated back and forth
without loosing their connection to pagecache pages.
Note that we implement the migration entries only for linear mappings.
Nonlinear mappings still require the unmapping of the ptes for migration.
And another writepage() ugliness shows up. writepage() can drop the page
lock. Therefore we have to remove migration ptes before calling writepages()
in order to avoid having migration entries point to unlocked pages.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Rip the page migration logic out.
Remove all code that has to do with swapping during page migration.
This also guts the ability to migrate pages to swap. No one used that so lets
let it go for good.
Page migration should be a bit broken after this patch.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Implement read/write migration ptes
We take the upper two swapfiles for the two types of migration ptes and define
a series of macros in swapops.h.
The VM is modified to handle the migration entries. migration entries can
only be encountered when the page they are pointing to is locked. This limits
the number of places one has to fix. We also check in copy_pte_range and in
mprotect_pte_range() for migration ptes.
We check for migration ptes in do_swap_cache and call a function that will
then wait on the page lock. This allows us to effectively stop all accesses
to apge.
Migration entries are created by try_to_unmap if called for migration and
removed by local functions in migrate.c
From: Hugh Dickins <hugh@veritas.com>
Several times while testing swapless page migration (I've no NUMA, just
hacking it up to migrate recklessly while running load), I've hit the
BUG_ON(!PageLocked(p)) in migration_entry_to_page.
This comes from an orphaned migration entry, unrelated to the current
correctly locked migration, but hit by remove_anon_migration_ptes as it
checks an address in each vma of the anon_vma list.
Such an orphan may be left behind if an earlier migration raced with fork:
copy_one_pte can duplicate a migration entry from parent to child, after
remove_anon_migration_ptes has checked the child vma, but before it has
removed it from the parent vma. (If the process were later to fault on this
orphaned entry, it would hit the same BUG from migration_entry_wait.)
This could be fixed by locking anon_vma in copy_one_pte, but we'd rather
not. There's no such problem with file pages, because vma_prio_tree_add
adds child vma after parent vma, and the page table locking at each end is
enough to serialize. Follow that example with anon_vma: add new vmas to the
tail instead of the head.
(There's no corresponding problem when inserting migration entries,
because a missed pte will leave the page count and mapcount high, which is
allowed for. And there's no corresponding problem when migrating via swap,
because a leftover swap entry will be correctly faulted. But the swapless
method has no refcounting of its entries.)
From: Ingo Molnar <mingo@elte.hu>
pte_unmap_unlock() takes the pte pointer as an argument.
From: Hugh Dickins <hugh@veritas.com>
Several times while testing swapless page migration, gcc has tried to exec
a pointer instead of a string: smells like COW mappings are not being
properly write-protected on fork.
The protection in copy_one_pte looks very convincing, until at last you
realize that the second arg to make_migration_entry is a boolean "write",
and SWP_MIGRATION_READ is 30.
Anyway, it's better done like in change_pte_range, using
is_write_migration_entry and make_migration_entry_read.
From: Hugh Dickins <hugh@veritas.com>
Remove unnecessary obfuscation from sys_swapon's range check on swap type,
which blew up causing memory corruption once swapless migration made
MAX_SWAPFILES no longer 2 ^ MAX_SWAPFILES_SHIFT.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
From: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change handling of address spaces.
Pass a pointer to the address space in which the page is migrated to all
migration function. This avoids repeatedly having to retrieve the address
space pointer from the page and checking it for validity. The old page
mapping will change once migration has gone to a certain step, so it is less
confusing to have the pointer always available.
Move the setting of the mapping and index for the new page into
migrate_pages().
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the export for migrate_page_remove_references() and migrate_page_copy()
that are unlikely to be used directly by filesystems implementing migration.
The export was useful when buffer_migrate_page() lived in fs/buffer.c but it
has now been moved to migrate.c in the migration reorg.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When a writeback_control's `start' and `end' fields are used to
indicate a one-byte-range starting at file offset zero, the required
values of .start=0,.end=0 mean that the ->writepages() implementation
has no way of telling that it is being asked to perform a range
request. Because we're currently overloading (start == 0 && end == 0)
to mean "this is not a write-a-range request".
To make all this sane, the patch changes range of writeback_control.
So caller does: If it is calling ->writepages() to write pages, it
sets range (range_start/end or range_cyclic) always.
And if range_cyclic is true, ->writepages() thinks the range is
cyclic, otherwise it just uses range_start and range_end.
This patch does,
- Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h
-1 is usually ok for range_end (type is long long). But, if someone did,
range_end += val; range_end is "val - 1"
u64val = range_end >> bits; u64val is "~(0ULL)"
or something, they are wrong. So, this adds LLONG_MAX to avoid nasty
things, and uses LLONG_MAX for range_end.
- All callers of ->writepages() sets range_start/end or range_cyclic.
- Fix updates of ->writeback_index. It seems already bit strange.
If it starts at 0 and ended by check of nr_to_write, this last
index may reduce chance to scan end of file. So, this updates
->writeback_index only if range_cyclic is true or whole-file is
scanned.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Steven French <sfrench@us.ibm.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The ability to have height 0 radix trees (a direct pointer to the data item
rather than going through a full node->slot) quietly disappeared with
old-2.6-bkcvs commit ffee171812d51652f9ba284302d9e5c5cc14bdfd. On 64-bit
machines this causes nearly 600 bytes to be used for every <= 4K file in
pagecache.
Re-introduce this feature, root tags stored in spare ->gfp_mask bits.
Simplify radix_tree_delete's complex tag clearing arrangement (which would
become even more complex) by just falling back to tag clearing functions
(the pagecache radix-tree never uses this path anyway, so the icache
savings will mean it's actually a speedup).
On my 4GB G5, this saves 8MB RAM per kernel kernel source+object tree in
pagecache.
Pagecache lookup, insertion, and removal speed for small files will also be
improved.
This makes RCU radix tree harder, but it's worth it.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Modify the gen_pool allocator (lib/genalloc.c) to utilize a bitmap scheme
instead of the buddy scheme. The purpose of this change is to eliminate
the touching of the actual memory being allocated.
Since the change modifies the interface, a change to the uncached allocator
(arch/ia64/kernel/uncached.c) is also required.
Both Andrey Volkov and Jes Sorenson have expressed a desire that the
gen_pool allocator not write to the memory being managed. See the
following:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113518602713125&w=2http://marc.theaimsgroup.com/?l=linux-kernel&m=113533568827916&w=2
Signed-off-by: Dean Nelson <dcn@sgi.com>
Cc: Andrey Volkov <avolkov@varma-el.com>
Acked-by: Jes Sorensen <jes@trained-monkey.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add remap_vmalloc_range, vmalloc_user, and vmalloc_32_user so that drivers
can have a nice interface for remapping vmalloc memory.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Consolidate the various arch-specific implementations of pxm_to_node() and
node_to_pxm() into a single generic version.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Current hugetlb strict accounting for shared mapping always assume mapping
starts at zero file offset and reserves pages between zero and size of the
file. This assumption often reserves (or lock down) a lot more pages then
necessary if application maps at none zero file offset. libhugetlbfs is
one example that requires proper reservation on shared mapping starts at
none zero offset.
This patch extends the reservation and hugetlb strict accounting to support
any arbitrary pair of (offset, len), resulting a much more robust and
accurate scheme. More importantly, it won't lock down any hugetlb pages
outside file mapping.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Acked-by: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reserve space in the swap disk header for a LABEL and UUID to be specified.
This has been possible with util-linux-2.12b (via e2fsprogs 1.36
libblkid), and is used by at least FC3 and later. The kernel doesn't
really care about this, but the space shouldn't accidentally be used by
something else either.
Also make the on-disk structures be fixed-size types, instead of "int",
though I don't know of any architecture in use where an "int" isn't the
same size as a "__u32" (all current kernel arches have it as "unsigned
int").
Signed-off-by: Andreas Dilger <adilger@shaw.ca>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds panic_on_oom sysctl under sys.vm.
When sysctl vm.panic_on_oom = 1, the kernel panics intead of killing rogue
processes. And if vm.panic_on_oom is 0 the kernel will do oom_kill() in
the same way as it does today. Of course, the default value is 0 and only
root can modifies it.
In general, oom_killer works well and kill rogue processes. So the whole
system can survive. But there are environments where panic is preferable
rather than kill some processes.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When add_zone() is called against empty zone (not populated zone), we have to
initialize the zone which didn't initialize at boot time. But,
init_currently_empty_zone() may fail due to allocation of wait table. So,
this patch is to catch its error code.
Changes against wait_table is in the next patch.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change definitions of some functions and data from __init to __meminit.
These functions and data can be used after bootup by this patch to be used for
hot-add codes.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is just to rename from wait_table_size() to wait_table_hash_nr_entries().
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
As Nick points out, only ia64 uses PG_uncached. So we can push it up into the
higher bits of the lower half of page->flags and make room for another flag on
32-bit machines.
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jesse Barnes <jbarnes@sgi.com>
Cc: Jes Sorensen <jes@trained-monkey.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The buddy allocator has a requirement that boundaries between contigious
zones occur aligned with the the MAX_ORDER ranges. Where they do not we
will incorrectly merge pages cross zone boundaries. This can lead to pages
from the wrong zone being handed out.
Originally the buddy allocator would check that buddies were in the same
zone by referencing the zone start and end page frame numbers. This was
removed as it became very expensive and the buddy allocator already made
the assumption that zones boundaries were aligned.
It is clear that not all configurations and architectures are honouring
this alignment requirement. Therefore it seems safest to reintroduce
support for non-aligned zone boundaries. This patch introduces a new check
when considering a page a buddy it compares the zone_table index for the
two pages and refuses to merge the pages where they do not match. The
zone_table index is unique for each node/zone combination when
FLATMEM/DISCONTIGMEM is enabled and for each section/zone combination when
SPARSEMEM is enabled (a SPARSEMEM section is at least a MAX_ORDER size).
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.
This complements the get_sb() patch. That reduced the significance of
sb->s_root, allowing NFS to place a fake root there. However, NFS does
require a dentry to use as a target for the statfs operation. This permits
the root in the vfsmount to be used instead.
linux/mount.h has been added where necessary to make allyesconfig build
successfully.
Interest has also been expressed for use with the FUSE and XFS filesystems.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Extend the get_sb() filesystem operation to take an extra argument that
permits the VFS to pass in the target vfsmount that defines the mountpoint.
The filesystem is then required to manually set the superblock and root dentry
pointers. For most filesystems, this should be done with simple_set_mnt()
which will set the superblock pointer and then set the root dentry to the
superblock's s_root (as per the old default behaviour).
The get_sb() op now returns an integer as there's now no need to return the
superblock pointer.
This patch permits a superblock to be implicitly shared amongst several mount
points, such as can be done with NFS to avoid potential inode aliasing. In
such a case, simple_set_mnt() would not be called, and instead the mnt_root
and mnt_sb would be set directly.
The patch also makes the following changes:
(*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
pointer argument and return an integer, so most filesystems have to change
very little.
(*) If one of the convenience function is not used, then get_sb() should
normally call simple_set_mnt() to instantiate the vfsmount. This will
always return 0, and so can be tail-called from get_sb().
(*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
dcache upon superblock destruction rather than shrink_dcache_anon().
This is required because the superblock may now have multiple trees that
aren't actually bound to s_root, but that still need to be cleaned up. The
currently called functions assume that the whole tree is rooted at s_root,
and that anonymous dentries are not the roots of trees which results in
dentries being left unculled.
However, with the way NFS superblock sharing are currently set to be
implemented, these assumptions are violated: the root of the filesystem is
simply a dummy dentry and inode (the real inode for '/' may well be
inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
with child trees.
[*] Anonymous until discovered from another tree.
(*) The documentation has been adjusted, including the additional bit of
changing ext2_* into foo_* in the documentation.
[akpm@osdl.org: convert ipath_fs, do other stuff]
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Warning(/var/linsrc/linux-2617-g4//include/linux/skbuff.h:304): No description found for parameter 'dma_cookie'
Warning(/var/linsrc/linux-2617-g4//include/net/sock.h:1274): No description found for parameter 'copied_early'
Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'chan'
Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'event'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new <linux/dmaengine.h> header shouldn't be included from
the !__KERNEL__ portion of tcp.h
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a generic segmentation offload toggle that can be turned
on/off for each net device. For now it only supports in TCPv4.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the GSO implementation for IPv4 TCP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the infrastructure for generic segmentation offload.
The idea is to tap into the potential savings of TSO without hardware
support by postponing the allocation of segmented skb's until just
before the entry point into the NIC driver.
The same structure can be used to support software IPv6 TSO, as well as
UFO and segmentation offload for other relevant protocols, e.g., DCCP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
going to scale if we add any more segmentation methods (e.g., DCCP). So
let's merge them.
They were used to tell the protocol of a packet. This function has been
subsumed by the new gso_type field. This is essentially a set of netdev
feature bits (shifted by 16 bits) that are required to process a specific
skb. As such it's easy to tell whether a given device can process a GSO
skb: you just have to and the gso_type field and the netdev's features
field.
I've made gso_type a conjunction. The idea is that you have a base type
(e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
For example, if we add a hardware TSO type that supports ECN, they would
declare NETIF_F_TSO | NETIF_F_TSO_ECN. All TSO packets with CWR set would
have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
packets would be SKB_GSO_TCPV4. This means that only the CWR packets need
to be emulated in software.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
First of all it is unnecessary to allocate a new skb in skb_pad since
the existing one is not shared. More importantly, our hard_start_xmit
interface does not allow a new skb to be allocated since that breaks
requeueing.
This patch uses pskb_expand_head to expand the existing skb and linearize
it if needed. Actually, someone should sift through every instance of
skb_pad on a non-linear skb as they do not fit the reasons why this was
originally created.
Incidentally, this fixes a minor bug when the skb is cloned (tcpdump,
TCP, etc.). As it is skb_pad will simply write over a cloned skb. Because
of the position of the write it is unlikely to cause problems but still
it's best if we don't do it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (33 commits)
[PATCH] myri10ge - drop workaround pci_save_state() disabling MSI
[PATCH] myri10ge - drop workaround for the missing AER ext cap on nVidia CK804
via-velocity: the link is not correctly detected when the device starts
[PATCH] add b44 to maintainers
[PATCH] WAN: ioremap() failure checks in drivers
[PATCH] WAN: register_hdlc_device() doesn't need dev_alloc_name()
[PATCH] skb_padto()-area fixes in 8390, wavelan
[PATCH] make drivers/net/forcedeth.c:nv_update_pause() static
[PATCH] network driver for Hilscher netx
[PATCH] Dereference in tokenring/olympic.c
[PATCH] Array overrun in drivers/net/wireless/wavelan.c
[PATCH] Remove useless check in drivers/net/pcmcia/xirc2ps_cs.c
[PATCH] 8139cp: add ethtool eeprom support
[PATCH] 8139cp: fix eeprom read command length
[PATCH] b44: update b44 Kconfig entry
[PATCH] b44: update version to 1.01
[PATCH] b44: add wol for old nic
[PATCH] b44: add parameter
[PATCH] b44: add wol
[PATCH] b44: fix manual speed/duplex/autoneg settings
...
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (139 commits)
[POWERPC] re-enable OProfile for iSeries, using timer interrupt
[POWERPC] support ibm,extended-*-frequency properties
[POWERPC] Extra sanity check in EEH code
[POWERPC] Dont look for class-code in pci children
[POWERPC] Fix mdelay badness on shared processor partitions
[POWERPC] disable floating point exceptions for init
[POWERPC] Unify ppc syscall tables
[POWERPC] mpic: add support for serial mode interrupts
[POWERPC] pseries: Print PCI slot location code on failure
[POWERPC] spufs: one more fix for 64k pages
[POWERPC] spufs: fail spu_create with invalid flags
[POWERPC] spufs: clear class2 interrupt status before wakeup
[POWERPC] spufs: fix Makefile for "make clean"
[POWERPC] spufs: remove stop_code from struct spu
[POWERPC] spufs: fix spu irq affinity setting
[POWERPC] spufs: further abstract priv1 register access
[POWERPC] spufs: split the Cell BE support into generic and platform dependant parts
[POWERPC] spufs: dont try to access SPE channel 1 count
[POWERPC] spufs: use kzalloc in create_spu
[POWERPC] spufs: fix initial state of wbox file
...
Manually resolved conflicts in:
drivers/net/phy/Makefile
include/asm-powerpc/spu.h
Prepare for changes required to support SATA devices
attached to SAS HBAs. For these devices we don't want to
use host_set at all, since libata will not be the owner
of struct scsi_host.
Signed-off-by: Brian King <brking@us.ibm.com>
(with slight merge modifications made by...)
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Currently, the only per-dev EH action is REVALIDATE. EH used to
exploit ehi->dev to do selective revalidation on a ATA bus. However,
this is a bit hacky and makes it impossible to request selective
revalidation from outside of EH or add another per-dev EH action.
This patch adds per-dev EH action mask eh_info->dev_action[] and
update EH to use this field for REVALIDATE. Note that per-dev actions
can still be specified at port-level and it has the same effect of
specifying the action for all devices on the port.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
David Boggs noticed that register_hdlc_device() no longer needs
to call dev_alloc_name() as it's called by register_netdev().
register_hdlc_device() is currently equivalent to register_netdev().
hdlc_setup() is now EXPORTed as per David's request.
Signed-off-by: Krzysztof Halasa <khc@pm.waw.pl>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (27 commits)
[PATCH] PCI: nVidia quirk to make AER PCI-E extended capability visible
[PATCH] PCI: fix issues with extended conf space when MMCONFIG disabled because of e820
[PATCH] PCI: Bus Parity Status sysfs interface
[PATCH] PCI: fix memory leak in MMCONFIG error path
[PATCH] PCI: fix error with pci_get_device() call in the mpc85xx driver
[PATCH] PCI: MSI-K8T-Neo2-Fir: run only where needed
[PATCH] PCI: fix race with pci_walk_bus and pci_destroy_dev
[PATCH] PCI: clean up pci documentation to be more specific
[PATCH] PCI: remove unneeded msi code
[PATCH] PCI: don't move ioapics below PCI bridge
[PATCH] PCI: cleanup unused variable about msi driver
[PATCH] PCI: disable msi mode in pci_disable_device
[PATCH] PCI: Allow MSI to work on kexec kernel
[PATCH] PCI: AMD 8131 MSI quirk called too late, bus_flags not inherited ?
[PATCH] PCI: Move various PCI IDs to header file
[PATCH] PCI Bus Parity Status-broken hardware attribute, EDAC foundation
[PATCH] PCI: i386/x86_84: disable PCI resource decode on device disable
[PATCH] PCI ACPI: Rename the functions to avoid multiple instances.
[PATCH] PCI: don't enable device if already enabled
[PATCH] PCI: Add a "enable" sysfs attribute to the pci devices to allow userspace (Xorg) to enable devices without doing foul direct access
...
Upgrade the zlib_inflate implementation in the kernel from a patched
version 1.1.3/4 to a patched 1.2.3.
The code in the kernel is about seven years old and I noticed that the
external zlib library's inflate performance was significantly faster (~50%)
than the code in the kernel on ARM (and faster again on x86_32).
For comparison the newer deflate code is 20% slower on ARM and 50% slower
on x86_32 but gives an approx 1% compression ratio improvement. I don't
consider this to be an improvement for kernel use so have no plans to
change the zlib_deflate code.
Various changes have been made to the zlib code in the kernel, the most
significant being the extra functions/flush option used by ppp_deflate.
This update reimplements the features PPP needs to ensure it continues to
work.
This code has been tested on ARM under both JFFS2 (with zlib compression
enabled) and ppp_deflate and on x86_32. JFFS2 sees an approx. 10% real
world file read speed improvement.
This patch also removes ZLIB_VERSION as it no longer has a correct value.
We don't need version checks anyway as the kernel's module handling will
take care of that for us. This removal is also more in keeping with the
zlib author's wishes (http://www.zlib.net/zlib_faq.html#faq24) and I've
added something to the zlib.h header to note its a modified version.
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Acked-by: Joern Engel <joern@wh.fh-wedel.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The race is that the shrink_dcache_memory shrinker could get called while a
filesystem is being unmounted, and could try to prune a dentry belonging to
that filesystem.
If it does, then it will call in to iput on the inode while the dentry is
no longer able to be found by the umounting process. If iput takes a
while, generic_shutdown_super could get all the way though
shrink_dcache_parent and shrink_dcache_anon and invalidate_inodes without
ever waiting on this particular inode.
Eventually the superblock gets freed anyway and if the iput tried to touch
it (which some filesystems certainly do), it will lose. The promised
"Self-destruct in 5 seconds" doesn't lead to a nice day.
The race is closed by holding s_umount while calling prune_one_dentry on
someone else's dentry. As a down_read_trylock is used,
shrink_dcache_memory will no longer try to prune the dentry of a filesystem
that is being unmounted, and unmount will not be able to start until any
such active prune_one_dentry completes.
This requires that prune_dcache *knows* which filesystem (if any) it is
doing the prune on behalf of so that it can be careful of other
filesystems. shrink_dcache_memory isn't called it on behalf of any
filesystem, and so is careful of everything.
shrink_dcache_anon is now passed a super_block rather than the s_anon list
out of the superblock, so it can get the s_anon list itself, and can pass
the superblock down to prune_dcache.
If prune_dcache finds a dentry that it cannot free, it leaves it where it
is (at the tail of the list) and exits, on the assumption that some other
thread will be removing that dentry soon. To try to make sure that some
work gets done, a limited number of dnetries which are untouchable are
skipped over while choosing the dentry to work on.
I believe this race was first found by Kirill Korotaev.
Cc: Jan Blunck <jblunck@suse.de>
Acked-by: Kirill Korotaev <dev@openvz.org>
Cc: Olaf Hering <olh@suse.de>
Acked-by: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Balbir Singh <balbir@in.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes the steal_locks() function.
steal_locks() doesn't work correctly with any filesystem that does it's own
lock management, including NFS, CIFS, etc.
In addition it has weird semantics on local filesystems in case tasks
sharing file-descriptor tables are doing POSIX locking operations in
parallel to execve().
The steal_locks() function has an effect on applications doing:
clone(CLONE_FILES)
/* in child */
lock
execve
lock
POSIX locks acquired before execve (by "child", "parent" or any further
task sharing files_struct) will after the execve be owned exclusively by
"child".
According to Chris Wright some LSB/LTP kind of suite triggers without the
stealing behavior, but there's no known real-world application that would
also fail.
Apps using NPTL are not affected, since all other threads are killed before
execve.
Apps using LinuxThreads are only affected if they
- have multiple threads during exec (LinuxThreads doesn't kill other
threads, the app may do it with pthread_kill_other_threads_np())
- rely on POSIX locks being inherited across exec
Both conditions are documented, but not their interaction.
Apps using clone() natively are affected if they
- use clone(CLONE_FILES)
- rely on POSIX locks being inherited across exec
The above scenarios are unlikely, but possible.
If the patch is vetoed, there's a plan B, that involves mostly keeping the
weird stealing semantics, but changing the way lock ownership is handled so
that network and local filesystems work consistently.
That would add more complexity though, so this solution seems to be
preferred by most people.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add the vendor-specific extended capability PCI_CAP_ID_VNDR. It is required
by the Myri-10G Ethernet driver.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a revocation notification method to the key type and calls it whilst
the key's semaphore is still write-locked after setting the revocation
flag.
The patch then uses this to maintain a reference on the task_struct of the
process that calls request_key() for as long as the authorisation key
remains unrevoked.
This fixes a potential race between two processes both of which have
assumed the authority to instantiate a key (one may have forked the other
for example). The problem is that there's no locking around the check for
revocation of the auth key and the use of the task_struct it points to, nor
does the auth key keep a reference on the task_struct.
Access to the "context" pointer in the auth key must thenceforth be done
with the auth key semaphore held. The revocation method is called with the
target key semaphore held write-locked and the search of the context
process's keyrings is done with the auth key semaphore read-locked.
The check for the revocation state of the auth key just prior to searching
it is done after the auth key is read-locked for the search. This ensures
that the auth key can't be revoked between the check and the search.
The revocation notification method is added so that the context task_struct
can be released as soon as instantiation happens rather than waiting for
the auth key to be destroyed, thus avoiding the unnecessary pinning of the
requesting process.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce SELinux hooks to support the access key retention subsystem
within the kernel. Incorporate new flask headers from a modified version
of the SELinux reference policy, with support for the new security class
representing retained keys. Extend the "key_alloc" security hook with a
task parameter representing the intended ownership context for the key
being allocated. Attach security information to root's default keyrings
within the SELinux initialization routine.
Has passed David's testsuite.
Signed-off-by: Michael LeMay <mdlemay@epoch.ncsc.mil>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>