Down the road we want to eliminate the use of the global kernel lock entirely
from the NFS client. To do this, we need to protect the fields in the
nfs_inode structure adequately. Start by serializing updates to the
"cache_validity" field.
Note this change addresses an SMP hang found by njw@osdl.org, where processes
deadlock because nfs_end_data_update and nfs_revalidate_mapping update the
"cache_validity" field without proper serialization.
Test plan:
Millions of fsx ops on SMP clients. Run Nick Wilson's breaknfs program on
large SMP clients.
Signed-off-by: Chuck Lever <cel@netapp.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce atomic bitops to manipulate the bits in the nfs_inode structure's
"flags" field.
Using bitops means we can use a generic wait_on_bit call instead of an ad hoc
locking scheme in fs/nfs/inode.c, so we can remove the "nfs_i_wait" field from
nfs_inode at the same time.
The other new flags field will continue to use bitmask and logic AND and OR.
This permits several flags to be set at the same time efficiently. The
following patch adds a spin lock to protect these flags, and this spin lock
will later cover other fields in the nfs_inode structure, amortizing the cost
of using this type of serialization.
Test plan:
Millions of fsx ops on SMP clients.
Signed-off-by: Chuck Lever <cel@netapp.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Certain bits in nfsi->flags can be manipulated with atomic bitops, and some
are better manipulated via logical bitmask operations.
This patch splits the flags field into two. The next patch introduces atomic
bitops for one of the fields.
Test plan:
Millions of fsx ops on SMP clients.
Signed-off-by: Chuck Lever <cel@netapp.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
On the 6700/6702 PXH part, a MSI may get corrupted if an ACPI hotplug
driver and SHPC driver in MSI mode are used together.
This patch will prevent MSI from being enabled for the SHPC as part of
an early pci quirk, as well as on any pci device which sets the no_msi
bit.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When the client performs an exclusive create and opens the file for writing,
a Netapp filer will first create the file using the mode 01777. It does this
since an NFSv3/v4 exclusive create cannot immediately set the mode bits.
The 01777 mode then gets put into the inode->i_mode. After the file creation
is successful, we then do a setattr to change the mode to the correct value
(as per the NFS spec).
The problem is that nfs_refresh_inode() no longer updates inode->i_mode, so
the latter retains the 01777 mode. A bit later, the VFS notices this, and calls
remove_suid(). This of course now resets the file mode to inode->i_mode & 0777.
Hey presto, the file mode on the server is now magically changed to 0777. Duh...
Fixes http://bugzilla.linux-nfs.org/show_bug.cgi?id=32
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds a MOVE_SELF event to inotify. It is sent whenever the inode
you are watching is moved. We need this event so that we can catch
something like this:
- app1:
watch /etc/mtab
- app2:
cp /etc/mtab /tmp/mtab-work
mv /etc/mtab /etc/mtab~
mv /tmp/mtab-work /etc/mtab
app1 still thinks it's watching /etc/mtab but it's actually watching
/etc/mtab~.
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I recently tried to construct a totally generic transport class and
found there were certain features missing from the current abstract
transport class. Most notable is that you have to hang the data on the
class_device but most of the API is framed in terms of the generic
device, not the class_device.
These changes are two fold
- Provide the class_device to all of the setup and configure APIs
- Provide and extra API to take the device and the attribute class and
return the corresponding class_device
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This fixes a race during initialization with the NAPI softirq
processing by using an RCU approach.
This race was discovered when refill_skbs() was added to
the setup code.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add limited retry logic to netpoll_send_skb
Each time we attempt to send, decrement our per-device retry counter.
On every successful send, we reset the counter.
We delay 50us between attempts with up to 20000 retries for a total of
1 second. After we've exhausted our retries, subsequent failed
attempts will try only once until reset by success.
Signed-off-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are many instances of
skb->protocol = htons(ETH_P_*);
skb->protocol = __constant_htons(ETH_P_*);
and
skb->protocol = *_type_trans(...);
Most of *_type_trans() are already endian-annotated, so, let's shift
attention on other warnings.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. Move hwif_to_node to ide.h
2. Use hwif_to_node in ide-disk.c
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This removes the now unused fsnotify_unlink & fsnotify_rmdir code.
Compile tested.
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Revert commit fec59a711e, which is
breaking sparc64 that doesn't have a working pci_update_resource.
We'll re-do this after 2.6.13 when we'll do it all properly.
NETLINK_ARPD is unused, allocate it to the Open-iSCSI folks.
NETLINK_ROUTE6 and NETLINK_TAPBASE are no longer used, delete
them.
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch below unhooks fsnotify from vfs_unlink & vfs_rmdir. It
introduces two new fsnotify calls, that are hooked in at the dcache
level. This not only more closely matches how the VFS layer works, it
also avoids the problem with locking and inode lifetimes.
The two functions are
- fsnotify_nameremove -- called when a directory entry is going away.
It notifies the PARENT of the deletion. This is called from
d_delete().
- inoderemove -- called when the files inode itself is going away. It
notifies the inode that is being deleted. This is called from
dentry_iput().
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
sparc can not include linux/pagemap.h because of the following circular
dependency:
asm-sparc/pgtable include linux/swap.h
linux/swap.h include now linux/pagemap.h
linux/pagemap.h include linux/mm.h
linux/mm.h include asm/pgtable.h
It needs to have the swp_entry_t type fully visible in pgtable.h,
we can't work around this using macros.
Signed-off-by: Olaf Hering <olh@suse.de>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It's not the real deflateBound() in newer zlib libraries, partly because
the upcoming usage of it won't have the "stream" available, so we can't
have the same interfaces anyway.
My patch in commit fa72b903f7 incorrectly
removed blk_queue_tag->real_max_depth.
The original resize implementation was incorrect in the following
points.
* actual allocation size of tag_index was shorter than real_max_size,
but assumed to be of the same size, possibly causing memory access
beyond the allocated area.
* bits in tag_map between max_deptn and real_max_depth were
initialized to 1's, making the tags permanently reserved.
In an attempt to fix above two bugs, I had removed allocation optimization
in init_tag_map and real_max_size. Tag map/index were allocated and freed
immediately during resize.
Unfortunately, I wasn't considering that tag map/index can be resized
dynamically with tags beyond new_depth active. This led to accessing
freed area after shrinking tags and led to the following bug reporting
thread on linux-scsi.
http://marc.theaimsgroup.com/?l=linux-scsi&m=112319898111885&w=2
To fix the problem, I've revived real_max_depth without allocation
optimization in init_tag_map, and Andrew Vasquez confirmed that the
problem was fixed. As Jens is not going to be available for a week, he
asked me to make sure that this patch reaches you.
http://marc.theaimsgroup.com/?l=linux-scsi&m=112325778530886&w=2
Also, a comment was added to make sure that real_max_size is needed for
dynamic shrinking.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This avoids the whole #ifdef mess by just getting a copy of
dentry->d_inode before d_delete is called - that makes the codepaths the
same for the INOTIFY/DNOTIFY cases as for the regular no-notify case.
I've been running this under a Gnome session for the last 10 minutes.
Inotify is being used extensively.
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some PCI devices (e.g. 3c905B, 3c556B) lose all configuration
(including BARs) when transitioning from D3hot->D0. This leaves such
a device in an inaccessible state. The patch below causes the BARs
to be restored when enabling such a device, so that its driver will
be able to access it.
The patch also adds pci_restore_bars as a new global symbol, and adds a
correpsonding EXPORT_SYMBOL_GPL for that.
Some firmware (e.g. Thinkpad T21) leaves devices in D3hot after a
(re)boot. Most drivers call pci_enable_device very early, so devices
left in D3hot that lose configuration during the D3hot->D0 transition
will be inaccessible to their drivers.
Drivers could be modified to account for this, but it would
be difficult to know which drivers need modification. This is
especially true since often many devices are covered by the same
driver. It likely would be necessary to replicate code across dozens
of drivers.
The patch below should trigger only when transitioning from D3hot->D0
(or at boot), and only for devices that have the "no soft reset" bit
cleared in the PM control register. I believe it is safe to include
this patch as part of the PCI infrastructure.
The cleanest implementation of pci_restore_bars was to call
pci_update_resource. Unfortunately, that does not currently exist
for the sparc64 architecture. The patch below includes a null
implemenation of pci_update_resource for sparc64.
Some have expressed interest in making general use of the the
pci_restore_bars function, so that has been exported to GPL licensed
modules.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Current acpi_register_gsi() function has no way to indicate errors to its
callers even though acpi_register_gsi() can fail to register gsi because of
some reasons (out of memory, lack of interrupt vectors, incorrect BIOS, and so
on). As a result, caller of acpi_register_gsi() cannot handle the case that
acpi_register_gsi() fails. I think failure of acpi_register_gsi() should be
handled properly.
This series of patches changes acpi_register_gsi() to return negative value on
error, and also changes callers of acpi_register_gsi() to handle failure of
acpi_register_gsi().
This patch changes the type of return value of acpi_register_gsi() from
"unsigned int" to "int" to indicate an error. If acpi_register_gsi() fails to
register gsi, it returns negative value.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
The recent change to never ignore the bitmap, revealed that the bitmap isn't
begin flushed properly when an array is stopped.
We call bitmap_daemon_work three times as there is a three-stage pipeline for
flushing updates to the bitmap file.
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Checking pte_dirty instead of pte_write in __follow_page is problematic
for s390, and for copy_one_pte which leaves dirty when clearing write.
So revert __follow_page to check pte_write as before, and make
do_wp_page pass back a special extra VM_FAULT_WRITE bit to say it has
done its full job: once get_user_pages receives this value, it no longer
requires pte_write in __follow_page.
But most callers of handle_mm_fault, in the various architectures, have
switch statements which do not expect this new case. To avoid changing
them all in a hurry, make an inline wrapper function (using the old
name) that masks off the new bit, and use the extended interface with
double underscores.
Yes, we do have a call to do_wp_page from do_swap_page, but no need to
change that: in rare case it's needed, another do_wp_page will follow.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
[ Cleanups by Nick Piggin ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We don't want these to be global functions.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When a file is moved over an existing file that you are watching,
inotify won't send you a DELETE_SELF event and it won't unref the inode
until the inotify instance is closed by the application.
Signed-off-by: John McCutchan <ttb@tentacle.dhs.org>
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ethernet drivers to remain as ignorant as is reasonable of the connected
PHY's design and operation details.
Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Add reference count and disable ACPI PCI Interrupt Link
when no device still uses it.
Warn when drivers have not released Link at suspend time.
http://bugzilla.kernel.org/show_bug.cgi?id=3469
Signed-off-by: David Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
In the patch from:
http://www.uwsg.iu.edu/hypermail/linux/kernel/0506.3/0985.html
Is the the following line suppose inside the if CONFIG_PCI=n
#define pci_dma_burst_advice(pdev, strat, strategy_parameter) do { } while (0)
Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It may shut up gcc, but it also incorrectly changes the semantics of the
smp_call_function() helpers.
You can fix the warning other ways if you are interested (create another
inline function that takes no arguments and returns zero), but
preferably gcc just shouldn't complain about unused return values from
statement expressions in the first place.
Apparently gcc 4.0 complains about "({ 0; });", which leads to -Werror
breakage in one of the alpha oprofile modules.
One might could argue that this is a gcc bug, in that statement-expressions
should be considered to be function-like rather than statement-like for the
purposes of this warning. But it's just as easy to use an inline function
in the first place, side-stepping the issue.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use tabs for formatting like anywhere else in this file.
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
turn many #if $undefined_string into #ifdef $undefined_string to fix some
warnings after -Wno-def was added to global CFLAGS
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The cache parameter to mb_cache_shrink isn't used. We may as well remove
it.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I believe that there is a problem with the handling of POSIX locks, which
the attached patch should address.
The problem appears to be a race between fcntl(2) and close(2). A
multithreaded application could close a file descriptor at the same time as
it is trying to acquire a lock using the same file descriptor. I would
suggest that that multithreaded application is not providing the proper
synchronization for itself, but the OS should still behave correctly.
SUS3 (Single UNIX Specification Version 3, read: POSIX) indicates that when
a file descriptor is closed, that all POSIX locks on the file, owned by the
process which closed the file descriptor, should be released.
The trick here is when those locks are released. The current code releases
all locks which exist when close is processing, but any locks in progress
are handled when the last reference to the open file is released.
There are three cases to consider.
One is the simple case, a multithreaded (mt) process has a file open and
races to close it and acquire a lock on it. In this case, the close will
release one reference to the open file and when the fcntl is done, it will
release the other reference. For this situation, no locks should exist on
the file when both the close and fcntl operations are done. The current
system will handle this case because the last reference to the open file is
being released.
The second case is when the mt process has dup(2)'d the file descriptor.
The close will release one reference to the file and the fcntl, when done,
will release another, but there will still be at least one more reference
to the open file. One could argue that the existence of a lock on the file
after the close has completed is okay, because it was acquired after the
close operation and there is still a way for the application to release the
lock on the file, using an existing file descriptor.
The third case is when the mt process has forked, after opening the file
and either before or after becoming an mt process. In this case, each
process would hold a reference to the open file. For each process, this
degenerates to first case above. However, the lock continues to exist
until both processes have released their references to the open file. This
lock could block other lock requests.
The changes to release the lock when the last reference to the open file
aren't quite right because they would allow the lock to exist as long as
there was a reference to the open file. This is too long.
The new proposed solution is to add support in the fcntl code path to
detect a race with close and then to release the lock which was just
acquired when such as race is detected. This causes locks to be released
in a timely fashion and for the system to conform to the POSIX semantic
specification.
This was tested by instrumenting a kernel to detect the handling locks and
then running a program which generates case #3 above. A dangling lock
could be reliably generated. When the changes to detect the close/fcntl
race were added, a dangling lock could no longer be generated.
Cc: Matthew Wilcox <willy@debian.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Split spin lock and r/w lock implementation into a single try which is done
inline and an out of line function that repeatedly tries to get the lock
before doing the cpu_relax(). Add a system control to set the number of
retries before a cpu is yielded.
The reason for the spin lock retry is that the diagnose 0x44 that is used to
give up the virtual cpu is quite expensive. For spin locks that are held only
for a short period of time the costs of the diagnoses outweights the savings
for spin locks that are held for a longer timer. The default retry count is
1000.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Attached patch removes #ifdef CONFIG_WATCHDOG_NOWAYOUT mess duplicated in
almost every watchdog driver and replaces it with common define in
linux/watchdog.h.
Signed-off-by: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds support for the Olitec ISDN PCI card in the hisax gazel
driver. The gazel driver supports this card, but wasn't aware of its PCI
ids. Users used to modify the PCI ids of a supported card in
include/linux/pci_ids.h and recompile their kernel to get this card
running, as said in most Howtos. This patch makes the hisax gazel driver
recognize the PCI ids of the Olitec ISDN PCI card.
Signed-off-by: Olivier Blin <oblin@mandriva.com>
Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
One chunk was lost somewhere between my and Andrew's machine.
Noticed by Victor Fusco.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert parport_serial to use the new 8250_pci interface, converting
the table to a pciserial_board table. This also unuses the SPCI_*
definitions in serialP.h, which can now be removed.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Re-jig the setup/removal/suspend/resume of 8250 pci ports so that they
know slightly less about how they're attached to a PCI device. Expose
this as the new interface for registering PCI serial ports, as well as
the pciserial_board structure and associated flag definitions.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When the kernel is working well and we want to restart cleanly
kernel_restart is the function to use. But in many instances
the kernel wants to reboot when thing are expected to be working
very badly such as from panic or a software watchdog handler.
This patch adds the function emergency_restart() so that
callers can be clear what semantics they expect when calling
restart. emergency_restart() is expected to be callable
from interrupt context and possibly reliable in even more
trying circumstances.
This is an initial generic implementation for all architectures.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Because the factors of sys_reboot don't exist people calling
into the reboot path duplicate the code badly, leading to
inconsistent expectations of code in the reboot path.
This patch should is just code motion.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add 5780 PCI IDs, chip IDs, and other basic support.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
More unusable TCF_META_* match types that need to get eliminated
before 2.6.13 goes out the door.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
It won't exist any longer when we shrink the SKB in 2.6.14,
and we should kill this off before anyone in userspace starts
using it.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
If a connection tracking helper tells us to expect a connection, and
we're already expecting that connection, we simply free the one they
gave us and return success.
The problem is that NAT helpers (eg. FTP) have to allocate the
expectation first (to see what port is available) then rewrite the
packet. If that rewrite fails, they try to remove the expectation,
but it was freed in ip_conntrack_expect_related.
This is one example of a larger problem: having registered the
expectation, the pointer is no longer ours to use. Reference counting
is needed for ctnetlink anyway, so introduce it now.
To have a single "put" path, we need to grab the reference to the
connection on creation, rather than open-coding it in the caller.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Victor Fusco <victor@cetuc.puc-rio.br>
Fix the sparse warning "implicit cast to nocast type"
Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf states:
> I used to mark such ids as obsolete in the header but since
> skb is on diet anyway and there has been no official
> iproute2 release with the ematch bits included it might be
> a better idea to remove the ids from the header completely.
> Those that have picked up my patch on netdev shouldn't care
> about a ABI breakage, actually I doubt that someone is using
> it already.
So here's the patch to remove it.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for SIIG Quartet Serial card. This card has Oxford
Semiconducor 16954 quad UART which is clocked by 10x faster
(18.432 MHz) quartz.
Signed-off-by: Andrey Panin <pazke@donpac.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
changing CONFIG_LOCALVERSION rebuilds too much, for no appearent reason.
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
I think it's about time to make the build a little more vocal about the
expiry of these functions. Due to recent discussions with problems in
the console initialisation vs power manglement, I'd like to move the
date forward to September.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
We need to be careful differentiating between a resync of a complete array,
in which we can clear the bitmap, and a resync of a degraded array, in
which we cannot.
This patch cleans all that up.
Cc: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add pci ids for new ISP types.
Move old definitions in local qla_def.h file to pci_ids.h as
well.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add special case for the POSIX_FADV_DONTNEED and POSIX_FADV_NOREUSE hint
values for s390-64. The user space values in the s390-64 glibc headers for
these two defines have always been 6 and 7 instead of 4 and 5. All 64 bit
applications therefore use the "wrong" values. To get these applications
working without recompiling the kernel needs to accept the "wrong" values.
Since the values for s390-31 are 4 and 5 the compat wrapper for fadvise64
and fadvise64_64 need to rewrite the values for 31 bit system calls.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Something has changed in the core kernel such that we now get concurrent
inode write outs, one e.g via pdflush and one via sys_sync or whatever.
This causes a nasty deadlock in ntfs. The only clean solution
unfortunately requires a minor vfs api extension.
First the deadlock analysis:
Prerequisive knowledge: NTFS has a file $MFT (inode 0) loaded at mount
time. The NTFS driver uses the page cache for storing the file contents as
usual. More interestingly this file contains the table of on-disk inodes
as a sequence of MFT_RECORDs. Thus NTFS driver accesses the on-disk inodes
by accessing the MFT_RECORDs in the page cache pages of the loaded inode
$MFT.
The situation: VFS inode X on a mounted ntfs volume is dirty. For same
inode X, the ntfs_inode is dirty and thus corresponding on-disk inode,
which is as explained above in a dirty PAGE_CACHE_PAGE belonging to the
table of inodes ($MFT, inode 0).
What happens:
Process 1: sys_sync()/umount()/whatever... calls __sync_single_inode() for
$MFT -> do_writepages() -> write_page for the dirty page containing the
on-disk inode X, the page is now locked -> ntfs_write_mst_block() which
clears PageUptodate() on the page to prevent anyone else getting hold of it
whilst it does the write out (this is necessary as the on-disk inode needs
"fixups" applied before the write to disk which are removed again after the
write and PageUptodate is then set again). It then analyses the page
looking for dirty on-disk inodes and when it finds one it calls
ntfs_may_write_mft_record() to see if it is safe to write this on-disk
inode. This then calls ilookup5() to check if the corresponding VFS inode
is in icache(). This in turn calls ifind() which waits on the inode lock
via wait_on_inode whilst holding the global inode_lock.
Process 2: pdflush results in a call to __sync_single_inode for the same
VFS inode X on the ntfs volume. This locks the inode (I_LOCK) then calls
write-inode -> ntfs_write_inode -> map_mft_record() -> read_cache_page() of
the page (in page cache of table of inodes $MFT, inode 0) containing the
on-disk inode. This page has PageUptodate() clear because of Process 1
(see above) so read_cache_page() blocks when tries to take the page lock
for the page so it can call ntfs_read_page().
Thus Process 1 is holding the page lock on the page containing the on-disk
inode X and it is waiting on the inode X to be unlocked in ifind() so it
can write the page out and then unlock the page.
And Process 2 is holding the inode lock on inode X and is waiting for the
page to be unlocked so it can call ntfs_readpage() or discover that
Process 1 set PageUptodate() again and use the page.
Thus we have a deadlock due to ifind() waiting on the inode lock.
The only sensible solution: NTFS does not care whether the VFS inode is
locked or not when it calls ilookup5() (it doesn't use the VFS inode at
all, it just uses it to find the corresponding ntfs_inode which is of
course attached to the VFS inode (both are one single struct); and it uses
the ntfs_inode which is subject to its own locking so I_LOCK is irrelevant)
hence we want a modified ilookup5_nowait() which is the same as ilookup5()
but it does not wait on the inode lock.
Without such functionality I would have to keep my own ntfs_inode cache in
the NTFS driver just so I can find ntfs_inodes independent of their VFS
inodes which would be slow, memory and cpu cycle wasting, and incredibly
stupid given the icache already exists in the VFS.
Below is a patch that does the ilookup5_nowait() implementation in
fs/inode.c and exports it.
ilookup5_nowait.diff:
Introduce ilookup5_nowait() which is basically the same as ilookup5() but
it does not wait on the inode's lock (i.e. it omits the wait_on_inode()
done in ifind()).
This is needed to avoid a nasty deadlock in NTFS.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This rearranges the event ordering for "open" to be consistent with the
ordering of the other events.
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This moves the inotify sysctl knobs to "/proc/sys/fs/inotify" from
"/proc/sys/fs". Also some related cleanup.
Signed-off-by: Robert Love <rml@novell.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
inotify is intended to correct the deficiencies of dnotify, particularly
its inability to scale and its terrible user interface:
* dnotify requires the opening of one fd per each directory
that you intend to watch. This quickly results in too many
open files and pins removable media, preventing unmount.
* dnotify is directory-based. You only learn about changes to
directories. Sure, a change to a file in a directory affects
the directory, but you are then forced to keep a cache of
stat structures.
* dnotify's interface to user-space is awful. Signals?
inotify provides a more usable, simple, powerful solution to file change
notification:
* inotify's interface is a system call that returns a fd, not SIGIO.
You get a single fd, which is select()-able.
* inotify has an event that says "the filesystem that the item
you were watching is on was unmounted."
* inotify can watch directories or files.
Inotify is currently used by Beagle (a desktop search infrastructure),
Gamin (a FAM replacement), and other projects.
See Documentation/filesystems/inotify.txt.
Signed-off-by: Robert Love <rml@novell.com>
Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This was a pure indentation change, using:
scripts/Lindent fs/reiserfs/*.c include/linux/reiserfs_*.h
to make reiserfs match the regular Linux indentation style. As Jeff
Mahoney <jeffm@suse.com> writes:
The ReiserFS code is a mix of a number of different coding styles, sometimes
different even from line-to-line. Since the code has been relatively stable
for quite some time and there are few outstanding patches to be applied, it
is time to reformat the code to conform to the Linux style standard outlined
in Documentation/CodingStyle.
This patch contains the result of running scripts/Lindent against
fs/reiserfs/*.c and include/linux/reiserfs_*.h. There are places where the
code can be made to look better, but I'd rather keep those patches separate
so that there isn't a subtle by-hand hand accident in the middle of a huge
patch. To be clear: This patch is reformatting *only*.
A number of patches may follow that continue to make the code more consistent
with the Linux coding style.
Hans wasn't particularly enthusiastic about these patches, but said he
wouldn't really oppose them either.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
free_pages_and_swap_cache() and free_page_and_swap_cache() use release_pages()
and page_cache_release() respectively, so make sure that we have the
declarations in scope.
Cc: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix a problem with ext3 mount option parsing. When remount of a filesystem
fails, old options are now restored.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
kernel/power/disk.c needs a declaration of name_to_dev_t() in scope. mount.h
seems like an appropriate choice.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
tr_type_trans(), hippi_type_trans() left as-is.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds another CDC descriptor type to <linux/usb_cdc.h>; the main claim
to fame for this is that some Motorola phones include it. It's not currently
needed by any driver code; included for completeness.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg,
This patch fixes the kmalloc() flags argument type in USB
subsystem; hopefully all of its occurences. The patch was
made against patch-2.6.12-git2 from Jun 20.
Cleanup of flags for kmalloc() in USB subsystem.
Signed-off-by: Olav Kongas <ok@artecdesign.ee>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
http://bugme.osdl.org/show_bug.cgi?id=4016
Written-by: David Shaohua Li <shaohua.li@intel.com>
Acked-by: Adam Belay <abelay@novell.com>
Signed-off-by: Len Brown <len.brown@intel.com>
ACPI 3.0 added a Correctable Platform Error Interrupt (CPEI)
Processor Overide flag to MADT.Platform_Interrupt_Source.
Record the processor that was provided as hint from ACPI.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Implement the framework for binding physical devices
with ACPI devices. A physical bus like PCI bus
should create a 'acpi_bus_type', with:
.find_device:
For device which has parent such as normal PCI devices.
.find_bridge:
It's for special devices, such as PCI root bridge
or IDE controller. Such devices generally haven't a
parent or ->bus. We use the special method
to get an ACPI handle.
Uses new field in struct device: firmware_data
http://bugzilla.kernel.org/show_bug.cgi?id=4277
Signed-off-by: David Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Register an "acpi" system device to be notified of shutdown preparation.
This depends on CONFIG_PM
http://bugzilla.kernel.org/show_bug.cgi?id=4041
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Len Brown <len.brown@intel.com>
This patch corrects a few problems with the IP_ADD_MEMBERSHIP
socket option:
1) The existing code makes an attempt at reference counting joins when
using the ip_mreqn/imr_ifindex interface. Joining the same group
on the same socket is an error, whatever the API. This leads to
unexpected results when mixing ip_mreqn by index with ip_mreqn by
address, ip_mreq, or other API's. For example, ip_mreq followed by
ip_mreqn of the same group will "work" while the same two reversed
will not.
Fixed to always return EADDRINUSE on a duplicate join and
removed the (now unused) reference count in ip_mc_socklist.
2) The group-search list in ip_mc_join_group() is comparing a full
ip_mreqn structure and all of it must match for it to find the
group. This doesn't correctly match a group that was joined with
ip_mreq or ip_mreqn with an address (with or without an index). It
also doesn't match groups that are joined by different addresses on
the same interface. All of these are the same multicast group,
which is identified by group address and interface index.
Fixed the check to correctly match groups so we don't get
duplicate group entries on the ip_mc_socklist.
3) The old code allocates a multicast address before searching for
duplicates requiring it to free in various error cases. This
patch moves the allocate until after the search and
igmp_max_memberships check, so never a need to allocate, then free
an entry.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Victor Fusco <victor@cetuc.puc-rio.br>
Fix the sparse warning "implicit cast to nocast type"
Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We shouldn't be allowing, e.g., write locks on files not open for read. To
enforce this, we add a pointer from the lock stateid back to the open stateid
it came from, so that the check will continue to be correct even after the
open is upgraded or downgraded.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
from RFC 3530:
"Share reservations are established by OPEN operations and by their
nature are mandatory in that when the OPEN denies READ or WRITE
operations, that denial results in such operations being rejected
with error NFS4ERR_LOCKED."
(Note that share_denied is really only a legal error for OPEN.)
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add some comments on the use of so_seqid, in an attempt to avoid some of the
confusion outlined in the previous patch....
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We need to fsync the recovery directory after writing to it, but we weren't
doing this correctly. (For example, we weren't taking the i_sem when calling
->fsync().)
Just reuse the existing nfsd fsync code instead.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch renames _mntput() to something a little more descriptive:
mntput_no_expire().
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch renames vfsmount->mnt_fslink to something a little more
descriptive: vfsmount->mnt_expire.
Signed-off-by: Mike Waychison <michael.waychison@sun.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes a race found by Ram in mark_mounts_for_expiry() in
fs/namespace.c.
The bug can only be triggered with simultaneous exiting of a process having
a private namespace, and expiry of a mount from within that namespace.
It's practically impossible to trigger, and I haven't even tried. But
still, a bug is a bug.
The race happens when put_namespace() is called by another task, while
mark_mounts_for_expiry() is between atomic_read() and get_namespace(). In
that case get_namespace() will be called on an already dead namespace with
unforeseeable results.
The solution was suggested by Al Viro, with his own words:
Instead of screwing with atomic_read() in there, why don't we
simply do the following:
a) atomic_dec_and_lock() in put_namespace()
b) __put_namespace() called without dropping lock
c) the first thing done by __put_namespace would be
struct vfsmount *root = namespace->root;
namespace->root = NULL;
spin_unlock(...);
....
umount_tree(root);
...
d) check in mark_... would be simply namespace && namespace->root.
And we are all set; no screwing around with atomic_read(), no magic
at all. Dying namespace gets NULL ->root.
All changes of ->root happen under spinlock.
If under a spinlock we see non-NULL ->mnt_namespace, it won't be
freed until we drop the lock (we will set ->mnt_namespace to NULL
under that lock before we get to freeing namespace).
If under a spinlock we see non-NULL ->mnt_namespace and
->mnt_namespace->root, we can grab a reference to namespace and be
sure that it won't go away.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new section called ".data.read_mostly" for data items that are read
frequently and rarely written to like cpumaps etc.
If these maps are placed in the .data section then these frequenly read
items may end up in cachelines with data is is frequently updated. In that
case all processors in an SMP system must needlessly reload the cachelines
again and again containing elements of those frequently used variables.
The ability to share these cachelines will allow each cpu in an SMP system
to keep local copies of those shared cachelines thereby optimizing
performance.
Signed-off-by: Alok N Kataria <alokk@calsoftinc.com>
Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
Signed-off-by: Christoph Lameter <christoph@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use a bit spin lock in the first buffer of the page to synchronise asynch
IO buffer completions, instead of the global page_uptodate_lock, which is
showing some scalabilty problems.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Fix u32 vs pm_message_t confusion in cpufreq.
Signed-off-by: Bernard Blackham <bernard@blackham.com.au>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Make ioprio syscalls return long, like set/getpriority syscalls.
- Move function prototypes into syscalls.h so we can pick them up in the
32/64bit compat code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
OCFS2 wants to mark an inode which has been orphaned by another node so
that during final iput it takes the correct path through the VFS and can
pass through the OCFS2 delete_inode callback. Since i_nlink can get out of
date with other nodes, the best way I see to accomplish this is by clearing
i_nlink on those inodes at drop_inode time. Other than this small amount
of work, nothing different needs to happen, so I think it would be cleanest
to be able to just call generic_drop_inode at the end of the OCFS2
drop_inode callback.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch ensures that cit_iv is aligned according to cra_alignmask
by allocating it as part of the tfm structure. As a side effect the
crypto layer will also guarantee that the tfm ctx area has enough space
to be aligned by cra_alignmask. This allows us to remove the extra
space reservation from the Padlock driver.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VIA Padlock device requires the input and output buffers to
be aligned on 16-byte boundaries. This patch adds the alignmask
attribute for low-level cipher implementations to indicate their
alignment requirements.
The mid-level crypt() function will copy the input/output buffers
if they are not aligned correctly before they are passed to the
low-level implementation.
Strictly speaking, some of the software implementations require
the buffers to be aligned on 4-byte boundaries as they do 32-bit
loads. However, it is not clear whether it is better to copy
the buffers or pay the penalty for unaligned loads/stores.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds hooks for cipher algorithms to implement multi-block
ECB/CBC operations directly. This is expected to provide significant
performance boots to the VIA Padlock.
It could also be used for improving software implementations such as
AES where operating on multiple blocks at a time may enable certain
optimisations.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This converts the usage of struct of_match to struct of_device_id,
similar to pci_device_id. This allows a device table to be generated,
which can be parsed by depmod(8) to generate a map file for module
loading.
In order for hotplug to work with macio devices, patches to
module-init-tools and hotplug must be applied. Those patches are
available at:
ftp://ftp.suse.com/pub/people/jeffm/linux/macio-hotplug/
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following renames arch_init, a kprobes function for performing any
architecture specific initialization, to arch_init_kprobes in order to
cleanup the namespace.
Also, this patch adds arch_init_kprobes to sparc64 to fix the sparc64 kprobes
build from the last return probe patch.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make TSO segment transmit size decisions at send time not earlier.
The basic scheme is that we try to build as large a TSO frame as
possible when pulling in the user data, but the size of the TSO frame
output to the card is determined at transmit time.
This is guided by tp->xmit_size_goal. It is always set to a multiple
of MSS and tells sendmsg/sendpage how large an SKB to try and build.
Later, tcp_write_xmit() and tcp_push_one() chop up the packet if
necessary and conditions warrant. These routines can also decide to
"defer" in order to wait for more ACKs to arrive and thus allow larger
TSO frames to be emitted.
A general observation is that TSO elongates the pipe, thus requiring a
larger congestion window and larger buffering especially at the sender
side. Therefore, it is important that applications 1) get a large
enough socket send buffer (this is accomplished by our dynamic send
buffer expansion code) 2) do large enough writes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave, you were right and the sleeping locks in shaper were
broken. Markus Kanet noticed this and also tested the patch below that
switches locking to spinlocks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce local_df to a bit field and ip_summed to a 2 bits
field thus saving 13 bits. Move bit fields, packet type,
and protocol into the spare area between the priority
and the destructor. Saves 4 bytes on both, 32bit and
64bit architectures.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the code to load packet data into a register:
k = fentry->k;
if (k < 0) {
...
} else {
u32 _tmp, *p;
p = skb_header_pointer(skb, k, 4, &_tmp);
if (p != NULL) {
A = ntohl(*p);
continue;
}
}
skb_header_pointer checks if the requested data is within the
linear area:
int hlen = skb_headlen(skb);
if (offset + len <= hlen)
return skb->data + offset;
When offset is within [INT_MAX-len+1..INT_MAX] the addition will
result in a negative number which is <= hlen.
I couldn't trigger a crash on my AMD64 with 2GB of memory, but a
coworker tried on his x86 machine and it crashed immediately.
This patch fixes the check in skb_header_pointer to handle large
positive offsets similar to skb_copy_bits. Invalid data can still
be accessed using negative offsets (also similar to skb_copy_bits),
anyone using negative offsets needs to verify them himself.
Thanks to Thomas Vgtle <thomas.voegtle@coreworks.de> for verifying the
problem by crashing his machine and providing me with an Oops.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following patch adds some ioctls to include/linux/compat_ioctl.h
to allow using ppdev from the 32 bit user space on sparc64.
This patch also adds the PPDEV option in the sparc64 menu, near Parallel
printer support in the 'General machine setup' submenu.
All those ioctls seem to be compatible, since (correct me if I'm wrong)
they dont use the 'long' type. See include/linux/ppdev.h.
The application I used to test the new ioctls only used the following:
PPEXCL
PPCLAIM
PPNEGOT
PPGETMODES
PPRCONTROL
PPWCONTROL
PPDATADIR
PPWDATA
PPRDATA
But I beleive that the other ioctls will work fine.
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Rob Punkunus <rpunkunus@nvidia.com>
Rob Punkunus recently submitted a patch to enable support for MCP51/MCP55 in
the amd74xx driver. This patch was whitespace-corrupted and didn't apply to
2.6.12 since MCP51 support was merged in the 2.6.12-rc series.
Gentoo would like to support this hardware for our upcoming release media, so
I fixed the patch, and here it is :)
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@elka.pw.edu.pl>
The dynamic pci id logic has been bothering me for a while, and now that
I started to look into how to move some of this to the driver core, I
thought it was time to clean it all up.
It ends up making the code smaller, and easier to follow, and fixes a
few bugs at the same time (dynamic ids were not being matched
everywhere, and so could be missed on some call paths for new devices,
semaphore not needed to be grabbed when adding a new id and calling the
driver core, etc.)
I also renamed the function pci_match_device() to pci_match_id() as
that's what it really does.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch increases the number of resource pointers in the
pci_bus structure. This is needed to store >4 resource ranges
for host bridges and transparent PCI bridges. With this change,
all PCI buses will have more resource pointers, but most PCI
buses will only use the first 3 or 4, the remaining being NULL.
The PCI core already deals with this correctly.
Signed-off-by: Rajesh Shah <rajesh.shah@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
No one was looking at the return value of bus_rescan_devices, and it
really wasn't anything that anyone in the kernel would ever care about.
So change it which enabled some counting code to be removed also.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add bus_find_device() and driver_find_device() which allow searching for a
device in the bus's resp. the driver's klist and obtain a reference on it.
Signed-off-by: Cornelia Huck <cohuck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The code was wrong in several aspects. The locking order was
inconsistent, the device aquire code did not reset a variable
after a wakeup and the wakeup handling was not working for
applications where multiple chips are sharing a single
hardware controller.
When a hardware controller is available the locking is now
reduced to the hardware controller lock and the waitqueue is
moved to the hardware controller structure in order to avoid
a wake_up_all().
The problem was pointed out by Ben Dooks, who also found the
missing variable reset as main cause for his deadlock problem.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Anyone reporting a stuck IRQ should try these options. Its effectiveness
varies we've found in the Fedora case. Quite a few systems with misdescribed
IRQ routing just work when you use irqpoll. It also fixes up the VIA systems
although thats now fixed with the VIA quirk (which we could just make default
as its what Redmond OS does but Linus didn't like it historically).
A small number of systems have jammed IRQ sources or misdescribes that cause
an IRQ that we have no handler registered anywhere for. In those cases it
doesn't help.
Signed-off-by: Alan Cox <number6@the-village.bc.nu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
get_io_context needlessly turned off interrupts and checked for racing io
context creations. Both of which aren't needed, because the io context can
only be created while in process context of the current process.
Also, split the function in 2. A light version, current_io_context does not
elevate the reference count specifically, but can be used when in process
context, because the process holds a reference itself.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch for usb_ch9.h includes linux/types.h instead of asm/types.h so that
__le16 and so on is explicitly defined. It also cleans up non standard //
comment.
Signed-off-by: GOTO Masanori <gotom@debian.or.jp>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch lets i2c-dev.h include linux/compiler.h so that __user is defined.
Signed-off-by: GOTO Masanori <gotom@debian.or.jp>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Looks like it sneaked back with the NFS ACL merge..
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In file included from drivers/media/dvb/ttpci/av7110_hw.c:38:
include/linux/byteorder/swabb.h:96: warning: type qualifiers ignored on function return type
include/linux/byteorder/swabb.h:110: warning: type qualifiers ignored on function return type
In file included from drivers/media/dvb/ttpci/av7110_v4l.c:36:
include/linux/byteorder/swabb.h:96: warning: type qualifiers ignored on function return type
include/linux/byteorder/swabb.h:110: warning: type qualifiers ignored on function return type
In file included from drivers/media/dvb/ttpci/av7110_av.c:37:
include/linux/byteorder/swabb.h:96: warning: type qualifiers ignored on function return type
include/linux/byteorder/swabb.h:110: warning: type qualifiers ignored on function return type
drivers/isdn/icn/icn.c:719:4: warning: #warning TODO test headroom or use skb->nb to flag ACK
In file included from drivers/media/dvb/ttpci/av7110_ca.c:39:
include/linux/byteorder/swabb.h:96: warning: type qualifiers ignored on function return type
include/linux/byteorder/swabb.h:110: warning: type qualifiers ignored on function return type
In file included from drivers/media/dvb/ttpci/av7110.c:41:
include/linux/byteorder/swabb.h:96: warning: type qualifiers ignored on function return type
include/linux/byteorder/swabb.h:110: warning: type qualifiers ignored on function return type
Does declaring a function to return a const value actually mean something to
gcc?
Dunno. Kill it and replace sone `__inline__'s with `inline' too.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
linux/etherdevice.h can't be included standalone at the moment, which
is required in order to sort the header files in the recommended
alphabetic order. This patch fixes that and is needed to build spider_net.
Signed-off-by: Arnd Bergmann <arndb@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove two more unused IPV6_AUTHHDR option things,
which I failed to remove them last time,
plus, mark IPV6_AUTHHDR obsolete.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Plug holes with padding fields and initialized them to zero.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is the first step in properly handling the MCFG PCI table.
It defines the structures properly, and saves off the table so that the
pci mmconfig code can access it. It moves the parsing of the table a
little later in the boot process, but still before the information is
needed.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
With CONFIG_PCI=n:
In file included from include/linux/pci.h:917,
from lib/iomap.c:6:
include/asm/pci.h:104: warning: `enum pci_dma_burst_strategy' declared inside parameter list
include/asm/pci.h:104: warning: its scope is only this definition or declaration, which is probably not what you want.
include/asm/pci.h: In function `pci_dma_burst_advice':
include/asm/pci.h:106: dereferencing pointer to incomplete type
include/asm/pci.h:106: `PCI_DMA_BURST_INFINITY' undeclared (first use in this function)
include/asm/pci.h:106: (Each undeclared identifier is reported only once
include/asm/pci.h:106: for each function it appears in.)
make[1]: *** [lib/iomap.o] Error 1
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
After seeing, at best, "guesses" as to the following kind
of information in several drivers, I decided that we really
need a way for platforms to specifically give advice in this
area for what works best with their PCI controller implementation.
Basically, this new interface gives DMA bursting advice on
PCI. There are three forms of the advice:
1) Burst as much as possible, it is not necessary to end bursts
on some particular boundary for best performance.
2) Burst on some byte count multiple. A DMA burst to some multiple of
number of bytes may be done, but it is important to end the burst
on an exact multiple for best performance.
The best example of this I am aware of are the PPC64 PCI
controllers, where if you end a burst mid-cacheline then
chip has to refetch the data and the IOMMU translations
which hurts performance a lot.
3) Burst on a single byte count multiple. Bursts shall end
exactly on the next multiple boundary for best performance.
Sparc64 and Alpha's PCI controllers operate this way. They
disconnect any device which tries to burst across a cacheline
boundary.
Actually, newer sparc64 PCI controllers do not have this behavior.
That is why the "pdev" is passed into the interface, so I can
add code later to check which PCI controller the system is using
and give advice accordingly.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is an updated version of Ben's fix-pci-mmap-on-ppc-and-ppc64.patch
which is in 2.6.12-rc4-mm1.
It fixes the patch to work on PPC iSeries, removes some debug printks
at Ben's request, and incorporates your
fix-pci-mmap-on-ppc-and-ppc64-fix.patch also.
Originally from Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch was discussed at length on linux-pci and so far, the last
iteration of it didn't raise any comment. It's effect is a nop on
architecture that don't define the new pci_resource_to_user() callback
anyway. It allows architecture like ppc who put weird things inside of
PCI resource structures to convert to some different value for user
visible ones. It also fixes mmap'ing of IO space on those archs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds PCI based I/O xAPIC hot-add support to ACPIPHP
driver. When PCI root bridge is hot-added, all PCI based I/O xAPICs
under the root bridge are hot-added by this patch. Hot-remove support
is TBD.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds the following new interfaces for I/O xAPIC
hotplug. The implementation of these interfaces depends on each
architecture.
o int acpi_register_ioapic(acpi_handle handle, u64 phys_addr,
u32 gsi_base);
This new interface is to add a new I/O xAPIC specified by
phys_addr and gsi_base pair. phys_addr is the physical address
to which the I/O xAPIC is mapped and gsi_base is global system
interrupt base of the I/O xAPIC. acpi_register_ioapic returns
0 on success, or negative value on error.
o int acpi_unregister_ioapic(acpi_handle handle, u32 gsi_base);
This new interface is to remove a I/O xAPIC specified by
gsi_base. acpi_unregister_ioapic returns 0 on success, or
negative value on error.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When you hot-plug a (root) bridge hierarchy, it may have p2p bridges and
devices attached to it that have not been configured by firmware. In this
case, we need to configure the devices before starting them. This patch
separates device start from device scan so that we can introduce the
configuration step in the middle.
I kept the existing semantics for pci_scan_bus() since there are a huge number
of callers to that function.
Also, I have no way of testing the changes I made to the parisc files, so this
needs review by those folks. Sorry for the massive cross-post, this touches
files in many different places.
Signed-off-by: Rajesh Shah <rajesh.shah@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The size of pointers may differ between (userspace) modpost and (kernelspace)
modules -- so fix mod_devicetable.h to reflect this possibility.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
If a card doesn't provide _any_ information about itself, assume it is a
so-called "anonymous" card. pcmciamtd will bind to it if it is configured to
do so.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add another match flag for devices needing a CIS override. The driver will
only probe/attach if the CIS has been replaced before.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The actual matching of pcmcia drivers and pcmcia devices. The original
version of this was written by David Woodhouse.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This lets you throw out the iteraid stuff that has ended up back in due
to stupid goings on in the IDE world. Its the same heavily tested code
shipped in Fedora/Red Hat products but without the other dependancies on
the Bartlomiej IDE layer.
Pre-requisite: the ide-disk patch I sent to handle pure LBA devices.
Obviously you lose things like hot unplug with the Bartlomiej IDE layer
at the moment but that won't matter to most users.
The patch does the following
- Add IT8211/12 to pci_ids.h
- Add Makefile/Kconfig entry
- Add it8212 driver
No core IDE code is touched by this diff
Embedded system testing and the ability to force raid mode off by David
Howells
Made possible by the ite reference code, documentation and also several
clarifications and pieces of assistance provided by ITE themselves
Signed-off-by: Alan Cox <alan@redhat.com>
Acked-by: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following is the second version of the function return probe patches
I sent out earlier this week. Changes since my last submission include:
* Fix in ppc64 code removing an unneeded call to re-enable preemption
* Fix a build problem in ia64 when kprobes was turned off
* Added another BUG_ON check to each of the architecture trampoline
handlers
My initial patch description ==>
From my experiences with adding return probes to x86_64 and ia64, and the
feedback on LKML to those patches, I think we can simplify the design
for return probes.
The following patch tweaks the original design such that:
* Instead of storing the stack address in the return probe instance, the
task pointer is stored. This gives us all we need in order to:
- find the correct return probe instance when we enter the trampoline
(even if we are recursing)
- find all left-over return probe instances when the task is going away
This has the side effect of simplifying the implementation since more
work can be done in kernel/kprobes.c since architecture specific knowledge
of the stack layout is no longer required. Specifically, we no longer have:
- arch_get_kprobe_task()
- arch_kprobe_flush_task()
- get_rp_inst_tsk()
- get_rp_inst()
- trampoline_post_handler() <see next bullet>
* Instead of splitting the return probe handling and cleanup logic across
the pre and post trampoline handlers, all the work is pushed into the
pre function (trampoline_probe_handler), and then we skip single stepping
the original function. In this case the original instruction to be single
stepped was just a NOP, and we can do without the extra interruption.
The new flow of events to having a return probe handler execute when a target
function exits is:
* At system initialization time, a kprobe is inserted at the beginning of
kretprobe_trampoline. kernel/kprobes.c use to handle this on it's own,
but ia64 needed to do this a little differently (i.e. a function pointer
is really a pointer to a structure containing the instruction pointer and
a global pointer), so I added the notion of arch_init(), so that
kernel/kprobes.c:init_kprobes() now allows architecture specific
initialization by calling arch_init() before exiting. Each architecture
now registers a kprobe on it's own trampoline function.
* register_kretprobe() will insert a kprobe at the beginning of the targeted
function with the kprobe pre_handler set to arch_prepare_kretprobe
(still no change)
* When the target function is entered, the kprobe is fired, calling
arch_prepare_kretprobe (still no change)
* In arch_prepare_kretprobe() we try to get a free instance and if one is
available then we fill out the instance with a pointer to the return probe,
the original return address, and a pointer to the task structure (instead
of the stack address.) Just like before we change the return address
to the trampoline function and mark the instance as used.
If multiple return probes are registered for a given target function,
then arch_prepare_kretprobe() will get called multiple times for the same
task (since our kprobe implementation is able to handle multiple kprobes
at the same address.) Past the first call to arch_prepare_kretprobe,
we end up with the original address stored in the return probe instance
pointing to our trampoline function. (This is a significant difference
from the original arch_prepare_kretprobe design.)
* Target function executes like normal and then returns to kretprobe_trampoline.
* kprobe inserted on the first instruction of kretprobe_trampoline is fired
and calls trampoline_probe_handler() (no change here)
* trampoline_probe_handler() consumes each of the instances associated with
the current task by calling the registered handler function and marking
the instance as unused until an instance is found that has a return address
different then the trampoline function.
(change similar to my previous ia64 RFC)
* If the task is killed with some left-over return probe instances (meaning
that a target function was entered, but never returned), then we just
free any instances associated with the task. (Not much different other
then we can handle this without calling architecture specific functions.)
There is a known problem that this patch does not yet solve where
registering a return probe flush_old_exec or flush_thread will put us
in a bad state. Most likely the best way to handle this is to not allow
registering return probes on these two functions.
(Significant change)
This patch series applies to the 2.6.12-rc6-mm1 kernel, and provides:
* kernel/kprobes.c changes
* i386 patch of existing return probes implementation
* x86_64 patch of existing return probe implementation
* ia64 implementation
* ppc64 implementation (provided by Ananth)
This patch implements the architecture independant changes for a reworking
of the kprobes based function return probes design. Changes include:
* Removing functions for querying a return probe instance off a stack address
* Removing the stack_addr field from the kretprobe_instance definition,
and adding a task pointer
* Adding architecture specific initialization via arch_init()
* Removing extern definitions for the architecture trampoline functions
(this isn't needed anymore since the architecture handles the
initialization of the kprobe in the return probe trampoline function.)
Signed-off-by: Rusty Lynch <rusty.lynch@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Now that PPC64 has no-execute support, here is a second try to fix the
single step out of line during kprobe execution. Kprobes on x86_64 already
solved this problem by allocating an executable page and using it as the
scratch area for stepping out of line. Reuse that.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is pass 2 of my patch to add pci domain info to an existing ioctl. This
time I insert the domain between dev_fn and board_id as Willy suggested and
change the var to unsigned short to ease Christoph's concerns. Although I
thought unsigned int was the correct var type for this. I also thought it
didn't matter where I inserted it in the structure.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes a PCI ID I got wrong before. It also adds support for
another new SAS controller due out this summer. I didn't have a marketing
name prior to my last submission. Also modifies the copyright date range.
Signed-off-by: Mike Miller <mike.miller@hp.com>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I believe at least for seccomp it's worth to turn off the tsc, not just for
HT but for the L2 cache too. So it's up to you, either you turn it off
completely (which isn't very nice IMHO) or I recommend to apply this below
patch.
This has been tested successfully on x86-64 against current cogito
repository (i686 compiles so I didn't bother testing ;). People selling
the cpu through cpushare may appreciate this bit for a peace of mind.
There's no way to get any timing info anymore with this applied
(gettimeofday is forbidden of course). The seccomp environment is
completely deterministic so it can't be allowed to get timing info, it has
to be deterministic so in the future I can enable a computing mode that
does a parallel computing for each task with server side transparent
checkpointing and verification that the output is the same from all the 2/3
seller computers for each task, without the buyer even noticing (for now
the verification is left to the buyer client side and there's no
checkpointing, since that would require more kernel changes to track the
dirty bits but it'll be easy to extend once the basic mode is finished).
Eliminating a cold-cache read of the cr4 global variable will save one
cacheline during the tlb flush while making the code per-cpu-safe at the
same time. Thanks to Mikael Pettersson for noticing the tlb flush wasn't
per-cpu-safe.
The global tlb flush can run from irq (IPI calling do_flush_tlb_all) but
it'll be transparent to the switch_to code since the IPI won't make any
change to the cr4 contents from the point of view of the interrupted code
and since it's now all per-cpu stuff, it will not race. So no need to
disable irqs in switch_to slow path.
Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch removes CONFIG_PMAC_PBOOK (PowerBook support). This is now
split into CONFIG_PMAC_MEDIABAY for the actual hotswap bay that some
powerbooks have, CONFIG_PM for power management related code, and just left
out of any CONFIG_* option for some generally useful stuff that can be used
on non-laptops as well.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This provides declarations for new requests, descriptors, and bitfields as
defined in the Wireless USB 1.0 spec. Device support will involve a new
"Wire Adapter" device class, connecting a USB Host to a cluster of wireless
USB devices. There will be two adapter types:
* Host Wireless Adapter (HWA): the downstream link is wireless, which
connects a wireless USB host to wireless USB devices (not unlike like
a hub) including to the second type of adapter.
* Device Wireless Adapter (DWA): the upstream link is wireless, for
connecting existing USB devices through wired links into the cluser.
All wireless USB devices will need persistent (and secure!) key storage, and
it's probable that Linux -- or device firmware -- will need to be involved
with that to bootstrap the initial secure key exchange.
Some user interface is required in that initial key exchange, and since the
most "hands-off" one is a wired USB link, I suspect wireless operation will
usually not be the only mode for wireless USB devices. (Plus, devices can
recharge batteries using wired USB...) All other key exchange protocols need
error prone user interactions, like copying and/or verifying keys.
It'll likely be a while before we have commercial Wireless USB hardware,
much less Linux implementations that know how to use it.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This updates most of the gadget framework to expect SETUP packets use
USB byteorder (matching the annotation in <linux/usb_ch9.h> and usage
in the host side stack):
- definition in <linux/usb_gadget.h>
- gadget drivers: Ethernet/RNDIS, serial/ACM, file_storage, gadgetfs.
- dummy_hcd
It also includes some other similar changes as suggested by "sparse",
which was used to detect byteorder bugs.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch provides an "isp116x-hcd" driver for Philips'
ISP1160/ISP1161 USB host controllers.
The driver:
- is relatively small, meant for use on embedded platforms.
- runs usbtests 1-14 without problems for days.
- has been in use by 6-7 different people on ARM and PPC platforms,
running a range of devices including USB hubs.
- supports suspend/resume of both the platform device and the root hub;
supports remote wakeup of the root hub (but NOT the platform device)
by USB devices.
- does NOT support ISO transfers (nobody has asked for them).
- is PIO-only.
Signed-off-by: Olav Kongas <ok@artecdesign.ee>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
- Adjust slice values
- Instead of one async queue, one is defined per priority level. This
prevents kernel threads (such as reiserfs/x and others) that run at
higher io priority from conflicting with others. Previously, it was a
coin toss what io prio the async queue got, it was defined by who
first set up the queue.
- Let a time slice only begin, when the previous slice is completely
done. Previously we could be somewhat unfair to a new sync slice, if
the previous slice was async and had several ios queued. This might
need a little tweaking if throughput suffers a little due to this,
allowing perhaps an overlap of a single request or so.
- Optimize the calling of kblockd_schedule_work() by doing it only when
it is strictly necessary (no requests in driver and work left to do).
- Correct sync vs async logic. A 'normal' process can be purely async as
well, and a flusher can be purely sync as well. Sync or async is now a
property of the class defined and requests pending. Previously writers
could be considered sync, when they were really async.
- Get rid of the bit fields in cfqq and crq, use flags instead.
- Various other cleanups and fixes
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This updates the CFQ io scheduler to the new time sliced design (cfq
v3). It provides full process fairness, while giving excellent
aggregate system throughput even for many competing processes. It
supports io priorities, either inherited from the cpu nice value or set
directly with the ioprio_get/set syscalls. The latter closely mimic
set/getpriority.
This import is based on my latest from -mm.
Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add separate files for the different 8250 ISA-based serial boards.
Looking across all the various architectures, it seems reasonable that
we can key the availability of the configuration options for these
beasts to the bus-related symbols (iow, CONFIG_ISA). We also standardise
the base baud/uart clock rate for these boards - I'm sure that isn't
architecture specific, but is solely dependent on the crystal fitted
on the board (which should be the same no matter what type of machine
its fitted into.)
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
We're using __be16 in userland visible types, so we
have to include asm/byteorder.h so that works.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for alternate slave selection algorithms to bonding
balance-xor and 802.3ad modes. Default mode (what we have now: xor of
MAC addresses) is "layer2", new choice is "layer3+4", using IP and port
information for hashing to select peer.
Originally submitted by Jason Gabler for balance-xor mode;
modified by Jay Vosburgh to additionally support 802.3ad mode. Jason's
original comment is as follows:
The attached patch to the Linux Etherchannel Bonding driver modifies the
driver's "balance-xor" mode as follows:
- alternate hashing policy support for mode 2
* Added kernel parameter "xmit_policy" to allow the specification
of different hashing policies for mode 2. The original mode 2
policy is the default, now found in xmit_hash_policy_layer2().
* Added xmit_hash_policy_layer34()
This patch was inspired by hashing policies implemented by Cisco,
Foundry and IBM, which are explained in
Foundry documentation found at:
http://www.foundrynet.com/services/documentation/sribcg/Trunking.html#112750
Signed-off-by: Jason Gabler <jygabler@lbl.gov>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
1. Establish a simple API for process freezing defined in linux/include/sched.h:
frozen(process) Check for frozen process
freezing(process) Check if a process is being frozen
freeze(process) Tell a process to freeze (go to refrigerator)
thaw_process(process) Restart process
frozen_process(process) Process is frozen now
2. Remove all references to PF_FREEZE and PF_FROZEN from all
kernel sources except sched.h
3. Fix numerous locations where try_to_freeze is manually done by a driver
4. Remove the argument that is no longer necessary from two function calls.
5. Some whitespace cleanup
6. Clear potential race in refrigerator (provides an open window of PF_FREEZE
cleared before setting PF_FROZEN, recalc_sigpending does not check
PF_FROZEN).
This patch does not address the problem of freeze_processes() violating the rule
that a task may only modify its own flags by setting PF_FREEZE. This is not clean
in an SMP environment. freeze(process) is therefore not SMP safe!
Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the following cleanups:
- make needlessly global code static
- remove the following unused global functions:
- blkdev_scsi_issue_flush_fn
- __blk_attempt_remerge
- remove the following unused EXPORT_SYMBOL's:
- blk_phys_contig_segment
- blk_hw_contig_segment
- blkdev_scsi_issue_flush_fn
- __blk_attempt_remerge
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the following possible cleanups:
- make the needlessly global function __nvram_set_checksum static
- #if 0 the unused global function nvram_set_checksum
- remove the EXPORT_SYMBOL's for both functions
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch makes use of ALIGN() to remove duplicate round-up code.
Signed-off-by: Nick Wilson <njw@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
o Following patch provides purely cosmetic changes and corrects CodingStyle
guide lines related certain issues like below in kexec related files
o braces for one line "if" statements, "for" loops,
o more than 80 column wide lines,
o No space after "while", "for" and "switch" key words
o Changes:
o take-2: Removed the extra tab before "case" key words.
o take-3: Put operator at the end of line and space before "*/"
Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Makes kexec_crashdump() take a pt_regs * as an argument. This allows to
get exact register state at the point of the crash. If we come from direct
panic assertion NULL will be passed and the current registers saved before
crashdump.
This hooks into two places:
die(): check the conditions under which we will panic when calling
do_exit and go there directly with the pt_regs that caused the fatal
fault.
die_nmi(): If we receive an NMI lockup while in the kernel use the
pt_regs and go directly to crash_kexec(). We're probably nested up badly
at this point so this might be the only chance to escape with proper
information.
Signed-off-by: Alexander Nyberg <alexn@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: "Vivek Goyal" <vgoyal@in.ibm.com>
o Support for /proc/vmcore interface. This interface exports elf core image
either in ELF32 or ELF64 format, depending on the format in which elf headers
have been stored by crashed kernel.
o Added support for CONFIG_VMCORE config option.
o Removed the dependency on /proc/kcore.
From: "Eric W. Biederman" <ebiederm@xmission.com>
This patch has been refactored to more closely match the prevailing style in
the affected files. And to clearly indicate the dependency between
/proc/kcore and proc/vmcore.c
From: Hariprasad Nellitheertha <hari@in.ibm.com>
This patch contains the code that provides an ELF format interface to the
previous kernel's memory post kexec reboot.
Signed off by Hariprasad Nellitheertha <hari@in.ibm.com>
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds support for retrieving the address of elf core header if one
is passed in command line.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch provides the interfaces necessary to read the dump contents,
treating it as a high memory device.
Signed off by Hariprasad Nellitheertha <hari@in.ibm.com>
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch retrieves the max_pfn being used by previous kernel and stores it
in a safe location (saved_max_pfn) before it is overwritten due to user
defined memory map. This pfn is used to make sure that user does not try to
read the physical memory beyond saved_max_pfn.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add kexec support for s390 architecture.
From: Milton Miller <miltonm@bga.com>
- Fix passing of first argument to relocate_kernel assembly.
- Fix Kconfig description.
- Remove wrong comment and comments that describe obvious things.
- Allow only KEXEC_TYPE_DEFAULT as image type -> dump not supported.
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch introduces the architecture independent implementation the
sys_kexec_load, the compat_sys_kexec_load system calls.
Kexec on panic support has been integrated into the core patch and is
relatively clean.
In addition the hopefully architecture independent option
crashkernel=size@location has been docuemented. It's purpose is to reserve
space for the panic kernel to live, and where no DMA transfer will ever be
setup to access.
Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Alexander Nyberg <alexn@telia.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds a new preemption model: 'Voluntary Kernel Preemption'. The
3 models can be selected from a new menu:
(X) No Forced Preemption (Server)
( ) Voluntary Kernel Preemption (Desktop)
( ) Preemptible Kernel (Low-Latency Desktop)
we still default to the stock (Server) preemption model.
Voluntary preemption works by adding a cond_resched()
(reschedule-if-needed) call to every might_sleep() check. It is lighter
than CONFIG_PREEMPT - at the cost of not having as tight latencies. It
represents a different latency/complexity/overhead tradeoff.
It has no runtime impact at all if disabled. Here are size stats that show
how the various preemption models impact the kernel's size:
text data bss dec hex filename
3618774 547184 179896 4345854 424ffe vmlinux.stock
3626406 547184 179896 4353486 426dce vmlinux.voluntary +0.2%
3748414 548640 179896 4476950 445016 vmlinux.preempt +3.5%
voluntary-preempt is +0.2% of .text, preempt is +3.5%.
This feature has been tested for many months by lots of people (and it's
also included in the RHEL4 distribution and earlier variants were in Fedora
as well), and it's intended for users and distributions who dont want to
use full-blown CONFIG_PREEMPT for one reason or another.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patches add dynamic sched domains functionality that was
extensively discussed on lkml and lse-tech. I would like to see this added to
-mm
o The main advantage with this feature is that it ensures that the scheduler
load balacing code only balances against the cpus that are in the sched
domain as defined by an exclusive cpuset and not all of the cpus in the
system. This removes any overhead due to load balancing code trying to
pull tasks outside of the cpu exclusive cpuset only to be prevented by
the tasks' cpus_allowed mask.
o cpu exclusive cpusets are useful for servers running orthogonal
workloads such as RT applications requiring low latency and HPC
applications that are throughput sensitive
o It provides a new API partition_sched_domains in sched.c
that makes dynamic sched domains possible.
o cpu_exclusive cpusets sets are now associated with a sched domain.
Which means that the users can dynamically modify the sched domains
through the cpuset file system interface
o ia64 sched domain code has been updated to support this feature as well
o Currently, this does not support hotplug. (However some of my tests
indicate hotplug+preempt is currently broken)
o I have tested it extensively on x86.
o This should have very minimal impact on performance as none of
the fast paths are affected
Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Acked-by: Paul Jackson <pj@sgi.com>
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Consolidate balance-on-exec with balance-on-fork. This is made easy by the
sched-domains RCU patches.
As well as the general goodness of code reduction, this allows the runqueues
to be unlocked during balance-on-fork.
schedstats is a problem. Maybe just have balance-on-event instead of
distinguishing fork and exec?
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Instead of requiring architecture code to interact with the scheduler's
locking implementation, provide a couple of defines that can be used by the
architecture to request runqueue unlocked context switches, and ask for
interrupts to be enabled over the context switch.
Also replaces the "switch_lock" used by these architectures with an oncpu
flag (note, not a potentially slow bitflag). This eliminates one bus
locked memory operation when context switching, and simplifies the
task_running function.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Do some basic initial tuning.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Reimplement the balance on exec balancing to be sched-domains aware. Use this
to also do balance on fork balancing. Make x86_64 do balance on fork over the
NUMA domain.
The problem that the non sched domains aware blancing became apparent on dual
core, multi socket opterons. What we want is for the new tasks to be sent to
a different socket, but more often than not, we would first load up our
sibling core, or fill two cores of a single remote socket before selecting a
new one.
This gives large improvements to STREAM on such systems.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Remove the very aggressive idle stuff that has recently gone into 2.6 - it is
going against the direction we are trying to go. Hopefully we can regain
performance through other methods.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Do CPU load averaging over a number of different intervals. Allow each
interval to be chosen by sending a parameter to source_load and target_load.
0 is instantaneous, idx > 0 returns a decaying average with the most recent
sample weighted at 2^(idx-1). To a maximum of 3 (could be easily increased).
So generally a higher number will result in more conservative balancing.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2.6.12-rc6-mm1 has a few remaining synchronize_kernel()s, some (but not
all) in comments. This patch changes these synchronize_kernel() calls (and
comments) to synchronize_rcu() or synchronize_sched() as follows:
- arch/x86_64/kernel/mce.c mce_read(): change to synchronize_sched() to
handle races with machine-check exceptions (synchronize_rcu() would not cut
it given RCU implementations intended for hardcore realtime use.
- drivers/input/serio/i8042.c i8042_stop(): change to synchronize_sched() to
handle races with i8042_interrupt() interrupt handler. Again,
synchronize_rcu() would not cut it given RCU implementations intended for
hardcore realtime use.
- include/*/kdebug.h comments: change to synchronize_sched() to handle races
with NMIs. As before, synchronize_rcu() would not cut it...
- include/linux/list.h comment: change to synchronize_rcu(), since this
comment is for list_del_rcu().
- security/keys/key.c unregister_key_type(): change to synchronize_rcu(),
since this is interacting with RCU read side.
- security/keys/process_keys.c install_session_keyring(): change to
synchronize_rcu(), since this is interacting with RCU read side.
Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Without this patch, Linux provokes emergency disk shutdowns and
similar nastiness. It was in SuSE kernels for some time, IIRC.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Using CPU hotplug to support suspend/resume SMP. Both S3 and S4 use
disable/enable_nonboot_cpus API. The S4 part is based on Pavel's original S4
SMP patch.
Signed-off-by: Li Shaohua<shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds __cpuinit and __cpuinitdata sections that need to exist past
boot to support cpu hotplug.
Caveat: This is done *only* for EM64T CPU Hotplug support, on request from
Andi Kleen. Much of the generic hotplug code in kernel, and none of the other
archs that support CPU hotplug today, i386, ia64, ppc64, s390 and parisc dont
mark sections with __cpuinit, but only mark them as __devinit, and
__devinitdata.
If someone is motivated to change generic code, we need to make sure all
existing hotplug code does not break, on other arch's that dont use __cpuinit,
and __cpudevinit.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Andi Kleen <ak@muc.de>
Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I really wish smp_prepare_cpu() would disappear eventually. In the interim
this is ideally a weak function, so we dont end up changing several places
to define this dummy in headers.
Today since the dummy declaration is done only in drivers/base/cpu.c but
the function is called in kernel/power/smp.c i get undefined reference in
my cpu hotplug code for x86_64 under development.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I8K: Change to use stock dmi infrastructure instead of homegrown
parsing code. The driver now requires box's DMI data to match
list of supported models so driver can be safely compiled-in
by default without fear of it poking into random SMM BIOS
code. DMI checks can be ignored with i8k.ignore_dmi option.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes sparse warnings in the qnx4fs (and might even make
qnx4fs work on big-endian boxes)
Signed-off-by: Alexey Dobriyan <adobriyan@mail.ru>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Anders Larsen <al@alarsen.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Another rollup of patches which give various symbols static scope
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch reworks filemap_xip.c with the goal to reduce code duplication
from mm/filemap.c. It applies agains 2.6.12-rc6-mm1. Instead of
implementing the aio functions, this one implements the synchronous
read/write functions only. For readv and writev, the generic fallback is
used. For aio, we rely on the application doing the fallback. Since our
"synchronous" function does memcpy immediately anyway, there is no
performance difference between using the fallbacks or implementing each
operation.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
These are the ext2 related parts. Ext2 now uses the xip_* file operations
along with the get_xip_page aop when mounted with -o xip.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- generic_file* file operations do no longer have a xip/non-xip split
- filemap_xip.c implements a new set of fops that require get_xip_page
aop to work proper. all new fops are exported GPL-only (don't like to
see whatever code use those except GPL modules)
- __xip_unmap now uses page_check_address, which is no longer static
in rmap.c, and defined in linux/rmap.h
- mm/filemap.h is now much more clean, plainly having just Linus'
inline funcs moved here from filemap.c
- fix includes in filemap_xip to make it build cleanly on i386
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is the block device related part. The block device operation
direct_access now has a struct block_device as first parameter.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds version and srcversion files to
/sys/module/${modulename} containing the version and srcversion fields
of the module's modinfo section (if present).
/sys/module/e1000
|-- srcversion
`-- version
This patch differs slightly from the version posted in January, as it
now uses the new kstrdup() call in -mm.
Why put this in sysfs?
a) Tools like DKMS, which deal with changing out individual kernel
modules without replacing the whole kernel, can behave smarter if they
can tell the version of a given module. The autoinstaller feature, for
example, which determines if your system has a "good" version of a
driver (i.e. if the one provided by DKMS has a newer verson than that
provided by the kernel package installed), and to automatically compile
and install a newer version if DKMS has it but your kernel doesn't yet
have that version.
b) Because sysadmins manually, or with tools like DKMS, can switch out
modules on the file system, you can't count on 'modinfo foo.ko', which
looks at /lib/modules/${kernelver}/... actually matching what is loaded
into the kernel already. Hence asking sysfs for this.
c) as the unbind-driver-from-device work takes shape, it will be
possible to rebind a driver that's built-in (no .ko to modinfo for the
version) to a newly loaded module. sysfs will have the
currently-built-in version info, for comparison.
d) tech support scripts can then easily grab the version info for what's
running presently - a question I get often.
There has been renewed interest in this patch on linux-scsi by driver
authors.
As the idea originated from GregKH, I leave his Signed-off-by: intact,
though the implementation is nearly completely new. Compiled and run on
x86 and x86_64.
From: Matthew Dobson <colpatch@us.ibm.com>
build fix
From: Thierry Vignaud <tvignaud@mandriva.com>
build fix
From: Matthew Dobson <colpatch@us.ibm.com>
warning fix
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Set the recovery directory via /proc/fs/nfsd/nfs4recoverydir.
It may be changed any time, but is used only on startup.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the code to create and remove client subdirectories from the
recovery directory, as described in the previous patch comment.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NFSv4 clients are required to know what state they have on the server so that
they can reclaim it on server reboot. However, it is possible for
pathalogical combinations of server reboots and network partitions to leave a
client in a state where it cannot know whether it has lost its state on the
server.
For this reason, rfc3530 requires that we store some information about clients
to stable storage.
So we maintain a directory /var/lib/nfs/v4recovery with a subdirectory for
each client with active state. We leave open the possibility of including
files underneath each such subdirectory with information about the client, but
for now the subdirectories are empty.
We create a client subdirectory whenever a client makes its first non-reclaim
open_confirm.
We remove a client subdirectory whenever either
a) its lease expires, or
b) the grace period ends without it reclaiming anything.
When handling reclaims, we allow the reclaim if and only if the client doing
the reclaim has a subdirectory.
This patch adds just the code to scan the recovery directory on nfsd startup.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The cb_parsed field is only used by probe_callback, to determine whether the
callback information has been filled in by setclientid. But there is no way
that probe_callback() can be called without that having already happened, so
that check is superfluous, as is cb_parsed.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Trivial renaming patch:
I can never remember, while looking at various lists relating the nfsd4 state
structures, which are the "heads" and which are items on other lists, or which
structures are actually on the various lists. The following convention helps
me: given structures foo and bar, with foo containing the head of a list of
bars, use "bars" for the name of the head of the list contained in the struct
foo, and use "per_foo" for the entries in the struct bars.
Already done for struct nfs4_file; go ahead and do it for the other nfsd4
state structures.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch contains the following possible cleanups:
- make needlessly global code static
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
For the purposes of reboot recovery we keep a directory with subdirectories
each having a name that is the ascii hex representation of the md5 sum of a
client identifier for an active client.
This adds the code to calculate that name. We also use it for the purposes of
comparing clients, so if someone ever manages to find two client names that
are md5 collisions, then we'll return clid_inuse to the second.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adopt standard kernel style by defining a no-op function instead of putting
ifdef's in the code where the function is called.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Separate out stuff that needs initialization on startup from stuff that only
needs initialization on module init from static data.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Somewhat gratuitous rename to simplify following patch.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Allow recovery of delegations after reboot.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a struct kref to each nfs4_file and take a reference to it from each
stateid and delegation that refers to it. The atomicity guarantees are
overkill given that all this stuff is done under the single nfsd4 state lock,
but a) we'd like finer-grained locking some day, and b) this simplifies the
cleanup of the structures a bit, something that has previously been a bit
complicated and bug-prone.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Trivial renaming patch:
I can never remember, while looking at various lists relating the nfsd4 state
structures, which are the "heads" and which are items on other lists, or which
structures are actually on the various lists. The following convention helps
me: given structures foo and bar, with foo containing the head of a list of
bars, use "bars" for the name of the head of the list contained in the struct
foo, and use "per_foo" for the entries in the struct bars.
Go ahead and do this for struct nfs4_file.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We're returning NFS4_FH_NOEXPIRE_WITH_OPEN | NFS4_FH_VOL_RENAME for the
fh_expire_type attribute. This is incorrect:
1. The spec actually only allows NOEXPIRE_WITH_OPEN when
VOLATILE_ANY is also set.
2. Filehandles for open files can expire, if the file is removed
and there is a reboot.
3. Filehandles are only volatile on rename in the nosubtree check
case.
Unfortunately, there's no way to indicate that we only expire on remove. So
our only choice is FH4_VOLATILE_ANY. Although it's redundant, we also set
FH4_VOL_RENAME in the subtree check case, since subtreecheck does actually
cause problems in practice and it seems possibly useful to give clients some
way to distinguish that case.
Fix a mispelled #define while we're at it.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Lindent run and replaced printk() through the corresponding osm_*() function
Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changes:
- Added header "core.h" for i2o_core.ko internal definitions
- More sparse fixes
- Changed display of TID's in sysfs attributes from XXX to 0xXXX
- Use the right functions for accessing I/O and normal memory
- Removed error handling of SCSI device errors and let the SCSI layer
take care of it
- Added new device / removed device handling to SCSI-OSM
- Make status access volatile
- Cleaned up activation of I2O controller
- Removed unnecessary wmb() and rmb() calls
- Use own struct i2o_io for I/O memory instead of struct i2o_dma
Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changes:
- Provide SG_IO access to BLOCK and EXECUTIVE class on Adaptec
controllers
- Use PRIVATE messages in SCSI-OSM because on some controllers normal
SCSI class commands like READ or READ CAPACITY cause errors
- Use new DMA and SG list creation function
- Added workaround to limit sectors per request for Adaptec 2400A
controllers
Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changes:
- Added Bus-OSM which could be used by user space programs to reset a
channel on the controller
- Make ioctl's in Config-OSM obsolete in prefer for sysfs attributes and
move those to its own file
- Added sysfs attribute for firmware read and write access for I2O
controllers
- Added special handling of firmware read and write access for Adaptec
controllers
- Added vendor id and product id as sysfs-attribute to Executive classes
- Added automatic notification of LCT change handling to Exec-OSM
- Added flushing function to Block-OSM for later barrier implementation
- Use PRIVATE messages for Block access on Adaptec controllers, which are
faster then BLOCK class access
- Cleaned up support for Promise controller
- New messages are now detected using the IRQ status register as
suggested by the I2O spec
- Added i2o_dma_high() and i2o_dma_low() functions
- Added facility for SG tablesize calculation when using 32-bit and
64-bit DMA addresses
- Added i2o_dma_map_single() and i2o_dma_map_sg() which could build the
SG list for 32-bit as well as 64-bit DMA addresses
Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changes:
- Removed unnecessary checking of NULL before calling kfree()
- Make some functions static
- Changed pr_debug() into osm_debug()
- Use i2o_msg_in_to_virt() for getting a pointer to the message frame
- Cleaned up some comments
- Changed some le32_to_cpu() into readl() where necessary
- Make error messages of OSM's look the same
- Cleaned up error handling in i2o_block_end_request()
- Removed unused error handling of failed messages in Block-OSM, which
are not allowed by the I2O spec
- Corrected the blocksize detection in i2o_block
- Added hrt and lct sysfs-attribute to controller
- Call done() function in SCSI-OSM after freeing DMA buffers
- Removed unneeded variable for message size calculation in
i2o_scsi_queuecommand()
- Make some changes to remove sparse warnings
- Reordered some functions
- Cleaned up controller initialization
- Replaced some magic numbers by defines
- Removed unnecessary dma_sync_single_for_cpu() call on coherent DMA
- Removed some unused fields in i2o_controller and removed some unused
functions
Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changes:
- Fixed sysfs bug where user and parent links where added to the I2O
device itself
- Fixed bug when calculating TID for the event handler and cleaned up the
workflow of i2o_driver_dispatch()
- Fixed oops when no I2O device could be found for an event delivered to
Exec-OSM
- Fixed initialization of spinlock in Exec-OSM
- Fixed memory leak in i2o_cfg_passthru() and i2o_cfg_passthru()
- Removed MTRR support
- Added PCI ID of Promise SX6000 with firmware >= 1.20.x.x
- Turn of caching for ioremapped memory of in_queue
- Added initialization sequence for Promise controllers
- Moved definition of u8 / u16 / u32 for raidutils before first use
Signed-off-by: Markus Lidel <Markus.Lidel@shadowconnect.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add support for TPMs on additional LPC buses.
Signed-off-by: Kylene Hall <kjhall@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch to adds "power cycle" functionality to the IPMI power off module
ipmi_poweroff. It also contains changes to support procfs control of the
feature.
The power cycle action is considered an optional chassis control in the IPMI
specification. However, it is definitely useful when the hardware supports
it. A power cycle is usually required in order to reset a firmware in a bad
state. This action is critical to allow remote management of servers.
The implementation adds power cycle as optional to the ipmi_poweroff module.
It can be modified dynamically through the proc entry mentioned above. During
a power down and enabled, the power cycle command is sent to the BMC firmware.
If it fails either due to non-support or some error, it will retry to send
the command as power off.
Signed-off-by: Christopher A. Poblete <Chris_Poblete@dell.com>
Signed-off-by: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use improved credits estimates for quota operations. Also reserve space
for a quota operation in a transaction only if filesystem was mounted with
some quota option.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use improved credits estimates for quota operations. Also reserve a space
for a quota operation in a transaction only if filesystem was mounted with
some quota options.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Improve estimates on the number of needed credits for quota transaction.
Now we distinguish blocks that might need to be allocated and blocks that
only need to be rewritten. Also we distinguish deleting of a quota
structure and creating of a new one.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
XFS will have to look at iocb->private to fix aio+dio. No other filesystem
is using the blockdev_direct_IO* end_io callback.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch makes the following changes:
(1) There's a new special key type called ".request_key_auth".
This is an authorisation key for when one process requests a key and
another process is started to construct it. This type of key cannot be
created by the user; nor can it be requested by kernel services.
Authorisation keys hold two references:
(a) Each refers to a key being constructed. When the key being
constructed is instantiated the authorisation key is revoked,
rendering it of no further use.
(b) The "authorising process". This is either:
(i) the process that called request_key(), or:
(ii) if the process that called request_key() itself had an
authorisation key in its session keyring, then the authorising
process referred to by that authorisation key will also be
referred to by the new authorisation key.
This means that the process that initiated a chain of key requests
will authorise the lot of them, and will, by default, wind up with
the keys obtained from them in its keyrings.
(2) request_key() creates an authorisation key which is then passed to
/sbin/request-key in as part of a new session keyring.
(3) When request_key() is searching for a key to hand back to the caller, if
it comes across an authorisation key in the session keyring of the
calling process, it will also search the keyrings of the process
specified therein and it will use the specified process's credentials
(fsuid, fsgid, groups) to do that rather than the calling process's
credentials.
This allows a process started by /sbin/request-key to find keys belonging
to the authorising process.
(4) A key can be read, even if the process executing KEYCTL_READ doesn't have
direct read or search permission if that key is contained within the
keyrings of a process specified by an authorisation key found within the
calling process's session keyring, and is searchable using the
credentials of the authorising process.
This allows a process started by /sbin/request-key to read keys belonging
to the authorising process.
(5) The magic KEY_SPEC_*_KEYRING key IDs when passed to KEYCTL_INSTANTIATE or
KEYCTL_NEGATE will specify a keyring of the authorising process, rather
than the process doing the instantiation.
(6) One of the process keyrings can be nominated as the default to which
request_key() should attach new keys if not otherwise specified. This is
done with KEYCTL_SET_REQKEY_KEYRING and one of the KEY_REQKEY_DEFL_*
constants. The current setting can also be read using this call.
(7) request_key() is partially interruptible. If it is waiting for another
process to finish constructing a key, it can be interrupted. This permits
a request-key cycle to be broken without recourse to rebooting.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-Off-By: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch makes it possible to pass a session keyring through to the
process spawned by call_usermodehelper(). This allows patch 3/3 to pass an
authorisation key through to /sbin/request-key, thus permitting better access
controls when doing just-in-time key creation.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The attached patch changes the key implementation in a number of ways:
(1) It removes the spinlock from the key structure.
(2) The key flags are now accessed using atomic bitops instead of
write-locking the key spinlock and using C bitwise operators.
The three instantiation flags are dealt with with the construction
semaphore held during the request_key/instantiate/negate sequence, thus
rendering the spinlock superfluous.
The key flags are also now bit numbers not bit masks.
(3) The key payload is now accessed using RCU. This permits the recursive
keyring search algorithm to be simplified greatly since no locks need be
taken other than the usual RCU preemption disablement. Searching now does
not require any locks or semaphores to be held; merely that the starting
keyring be pinned.
(4) The keyring payload now includes an RCU head so that it can be disposed
of by call_rcu(). This requires that the payload be copied on unlink to
prevent introducing races in copy-down vs search-up.
(5) The user key payload is now a structure with the data following it. It
includes an RCU head like the keyring payload and for the same reason. It
also contains a data length because the data length in the key may be
changed on another CPU whilst an RCU protected read is in progress on the
payload. This would then see the supposed RCU payload and the on-key data
length getting out of sync.
I'm tempted to drop the key's datalen entirely, except that it's used in
conjunction with quota management and so is a little tricky to get rid
of.
(6) Update the keys documentation.
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Finds a pattern in the skb data according to the specified
textsearch configuration. Use textsearch_next() to retrieve
subsequent occurrences of the pattern. Returns the offset
to the first occurrence or UINT_MAX if no match was found.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implements sequential reading for both linear and non-linear
skb data at zerocopy cost. The data is returned in chunks of
arbitary length, therefore random access is not possible.
Usage:
from := 0
to := 128
state := undef
data := undef
len := undef
consumed := 0
skb_prepare_seq_read(skb, from, to, &state)
while (len = skb_seq_read(consumed, &data, &state)) != 0 do
/* do something with 'data' of length 'len' */
if abort then
/* abort read if we don't wait for
* skb_seq_read() to return 0 */
skb_abort_seq_read(&state)
return
endif
/* not necessary to consume all of 'len' */
consumed += len
done
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
A finite state machine consists of n states (struct ts_fsm_token)
representing the pattern as a finite automation. The data is read
sequentially on a octet basis. Every state token specifies the number
of recurrences and the type of value accepted which can be either a
specific character or ctype based set of characters. The available
type of recurrences include 1, (0|1), [0 n], and [1 n].
The algorithm differs between strict/non-strict mode specyfing
whether the pattern has to start at the first octect. Strict mode
is enabled by default and can be disabled by inserting
TS_FSM_HEAD_IGNORE as the first token in the chain.
The runtime performance of the algorithm should be around O(n),
however while in strict mode the average runtime can be better.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The textsearch infrastructure provides text searching
facitilies for both linear and non-linear data.
Individual search algorithms are implemented in modules
and chosen by the user.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow using setsockopt to set TCP congestion control to use on a per
socket basis.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Separate out the two uses of netdev_max_backlog. One controls the
upper bound on packets processed per softirq, the new name for this is
netdev_budget; the other controls the limit on packets queued via
netif_rx.
Increase the max_backlog default to account for faster processors.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate the throttling behaviour when the netif receive queue fills
because it behaves badly when using high speed networks under load.
The throttling cause multiple packet drops that cause TCP to go into
slow start mode. The same effective patch has been part of BIC TCP and
H-TCP as well as part of Web100.
The existing code drops 100's of packets when the queue fills;
this changes it to individual packet drop-tail.
Signed-off-by: Stephen Hemmminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the congestion sensing mechanism from netif_rx, and always
return either full or empty. Almost no driver checks the return value
from netif_rx, and those that do only use it for debug messages.
The original design of netif_rx was to do flow control based on the
receive queue, but NAPI has supplanted this and no driver uses the
feedback.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove last vestiages of fastroute code that is no longer used.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enhancement to the tcp_diag interface used by the iproute2 ss command
to report the tcp congestion control being used by a socket.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow TCP to have multiple pluggable congestion control algorithms.
Algorithms are defined by a set of operations and can be built in
or modules. The legacy "new RENO" algorithm is used as a starting
point and fallback.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's a bit strange to see tty_register_ldisc call in modules' exit
functions.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the upcoming aio_down patch, it is useful to store a private data
pointer in the kiocb's wait_queue. Since we provide our own wake up
function and do not require the task_struct pointer, it makes sense to
convert the task pointer into a generic private pointer.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This file duplicates <linux/posix_acl_xattr.h>, using slightly different
names.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The following patch removes the f_error field and all checks of f_error.
Trond said:
f_error was introduced for NFS, and made sense when we were guaranteed
always to have a file pointer around when write errors occurred. Since
then, we have (for various reasons) had to introduce the nfs_open_context in
order to track the file read/write state, and it made sense to move our
f_error tracking there too.
Signed-off-by: Christoph Lameter <christoph@lameter.com>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch allows block device drivers to convert their ioctl functions to
unlocked_ioctl() like character devices and other subsystems. All
functions that were called with the BKL held before are still used that
way, but I would not be surprised if it could be removed from the ioctl
functions in drivers/block/ioctl.c themselves.
As a side note, I found that compat_blkdev_ioctl() acquires the BKL as
well, which looks like a bug. I have checked that every user of
disk->fops->compat_ioctl() in the current git tree gets the BKL itself, so
it could easily be removed from compat_blkdev_ioctl().
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch improves write performance for the CD/DVD packet writing driver.
The logic for switching between reading and writing has been changed so
that streaming writes are no longer interrupted by read requests.
Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In ia64 kernel, the O_LARGEFILE flag is forced when opening a file. This
is problematic for execution of 32 bit processes, which are not largefile
aware, either by SW emulation or by HW execution.
For such processes, the problem is two-fold:
1) When trying to open a file that is larger than 4G
the operation should fail, but it's not
2) Writing to offset larger than 4G should fail, but
it's not
The proposed patch takes advantage of the way 32 bit processes are
identified in ia64 systems. Such processes have PER_LINUX32 for their
personality. With the patch, the ia64 kernel will not enforce the
O_LARGEFILE flag if the current process has PER_LINUX32 set. The behavior
for all other architectures remains unchanged.
Signed-off-by: Yoav Zach <yoav.zach@intel.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a new `suid_dumpable' sysctl:
This value can be used to query and set the core dump mode for setuid
or otherwise protected/tainted binaries. The modes are
0 - (default) - traditional behaviour. Any process which has changed
privilege levels or is execute only will not be dumped
1 - (debug) - all processes dump core when possible. The core dump is
owned by the current user and no security is applied. This is intended
for system debugging situations only. Ptrace is unchecked.
2 - (suidsafe) - any binary which normally would not be dumped is dumped
readable by root only. This allows the end user to remove such a dump but
not access it directly. For security reasons core dumps in this mode will
not overwrite one another or other files. This mode is appropriate when
adminstrators are attempting to debug problems in a normal environment.
(akpm:
> > +EXPORT_SYMBOL(suid_dumpable);
>
> EXPORT_SYMBOL_GPL?
No problem to me.
> > if (current->euid == current->uid && current->egid == current->gid)
> > current->mm->dumpable = 1;
>
> Should this be SUID_DUMP_USER?
Actually the feedback I had from last time was that the SUID_ defines
should go because its clearer to follow the numbers. They can go
everywhere (and there are lots of places where dumpable is tested/used
as a bool in untouched code)
> Maybe this should be renamed to `dump_policy' or something. Doing that
> would help us catch any code which isn't using the #defines, too.
Fair comment. The patch was designed to be easy to maintain for Red Hat
rather than for merging. Changing that field would create a gigantic
diff because it is used all over the place.
)
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In situations where a kprobes handler calls a routine which has a probe on it,
then kprobes_handler() disarms the new probe forever. This patch removes the
above limitation by temporarily disarming the new probe. When the another
probe hits while handling the old probe, the kprobes_handler() saves previous
kprobes state and handles the new probe without calling the new kprobes
registered handlers. kprobe_post_handler() restores back the previous kprobes
state and the normal execution continues.
However on x86_64 architecture, re-rentrancy is provided only through
pre_handler(). If a routine having probe is referenced through
post_handler(), then the probes on that routine are disarmed forever, since
the exception stack is gets changed after the processor single steps the
instruction of the new probe.
This patch includes generic changes to support temporary disarming on
reentrancy of probes.
Signed-of-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch moves the lock/unlock of the arch specific kprobe_flush_task()
to the non-arch specific kprobe_flusk_task().
Signed-off-by: Hien Nguyen <hien@us.ibm.com>
Acked-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The architecture independent code of the current kprobes implementation is
arming and disarming kprobes at registration time. The problem is that the
code is assuming that arming and disarming is a just done by a simple write
of some magic value to an address. This is problematic for ia64 where our
instructions look more like structures, and we can not insert break points
by just doing something like:
*p->addr = BREAKPOINT_INSTRUCTION;
The following patch to 2.6.12-rc4-mm2 adds two new architecture dependent
functions:
* void arch_arm_kprobe(struct kprobe *p)
* void arch_disarm_kprobe(struct kprobe *p)
and then adds the new functions for each of the architectures that already
implement kprobes (spar64/ppc64/i386/x86_64).
I thought arch_[dis]arm_kprobe was the most descriptive of what was really
happening, but each of the architectures already had a disarm_kprobe()
function that was really a "disarm and do some other clean-up items as
needed when you stumble across a recursive kprobe." So... I took the
liberty of changing the code that was calling disarm_kprobe() to call
arch_disarm_kprobe(), and then do the cleanup in the block of code dealing
with the recursive kprobe case.
So far this patch as been tested on i386, x86_64, and ppc64, but still
needs to be tested in sparc64.
Signed-off-by: Rusty Lynch <rusty.lynch@intel.com>
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds function-return probes to kprobes for the i386
architecture. This enables you to establish a handler to be run when a
function returns.
1. API
Two new functions are added to kprobes:
int register_kretprobe(struct kretprobe *rp);
void unregister_kretprobe(struct kretprobe *rp);
2. Registration and unregistration
2.1 Register
To register a function-return probe, the user populates the following
fields in a kretprobe object and calls register_kretprobe() with the
kretprobe address as an argument:
kp.addr - the function's address
handler - this function is run after the ret instruction executes, but
before control returns to the return address in the caller.
maxactive - The maximum number of instances of the probed function that
can be active concurrently. For example, if the function is non-
recursive and is called with a spinlock or mutex held, maxactive = 1
should be enough. If the function is non-recursive and can never
relinquish the CPU (e.g., via a semaphore or preemption), NR_CPUS should
be enough. maxactive is used to determine how many kretprobe_instance
objects to allocate for this particular probed function. If maxactive <=
0, it is set to a default value (if CONFIG_PREEMPT maxactive=max(10, 2 *
NR_CPUS) else maxactive=NR_CPUS)
For example:
struct kretprobe rp;
rp.kp.addr = /* entrypoint address */
rp.handler = /*return probe handler */
rp.maxactive = /* e.g., 1 or NR_CPUS or 0, see the above explanation */
register_kretprobe(&rp);
The following field may also be of interest:
nmissed - Initialized to zero when the function-return probe is
registered, and incremented every time the probed function is entered but
there is no kretprobe_instance object available for establishing the
function-return probe (i.e., because maxactive was set too low).
2.2 Unregister
To unregiter a function-return probe, the user calls
unregister_kretprobe() with the same kretprobe object as registered
previously. If a probed function is running when the return probe is
unregistered, the function will return as expected, but the handler won't
be run.
3. Limitations
3.1 This patch supports only the i386 architecture, but patches for
x86_64 and ppc64 are anticipated soon.
3.2 Return probes operates by replacing the return address in the stack
(or in a known register, such as the lr register for ppc). This may
cause __builtin_return_address(0), when invoked from the return-probed
function, to return the address of the return-probes trampoline.
3.3 This implementation uses the "Multiprobes at an address" feature in
2.6.12-rc3-mm3.
3.4 Due to a limitation in multi-probes, you cannot currently establish
a return probe and a jprobe on the same function. A patch to remove
this limitation is being tested.
This feature is required by SystemTap (http://sourceware.org/systemtap),
and reflects ideas contributed by several SystemTap developers, including
Will Cohen and Ananth Mavinakayanahalli.
Signed-off-by: Hien Nguyen <hien@us.ibm.com>
Signed-off-by: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@laposte.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Move some code duplicated in both callers into vfs_quota_on_mount
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch to add check to get_chrdev_list and get_blkdev_list to prevent reads
of /proc/devices from spilling over the provided page if more than 4096
bytes of string data are generated from all the registered character and
block devices in a system
Signed-off-by: Neil Horman <nhorman@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Looks like locking can be optimised quite a lot. Increase lock widths
slightly so lo_lock is taken fewer times per request. Also it was quite
trivial to cover lo_pending with that lock, and remove the atomic
requirement. This also makes memory ordering explicitly correct, which is
nice (not that I particularly saw any mem ordering bugs).
Test was reading 4 250MB files in parallel on ext2-on-tmpfs filesystem (1K
block size, 4K page size). System is 2 socket Xeon with HT (4 thread).
intel:/home/npiggin# umount /dev/loop0 ; mount /dev/loop0 /mnt/loop ; /usr/bin/time ./mtloop.sh
Before:
0.24user 5.51system 0:02.84elapsed 202%CPU (0avgtext+0avgdata 0maxresident)k
0.19user 5.52system 0:02.88elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k
0.19user 5.57system 0:02.89elapsed 198%CPU (0avgtext+0avgdata 0maxresident)k
0.22user 5.51system 0:02.90elapsed 197%CPU (0avgtext+0avgdata 0maxresident)k
0.19user 5.44system 0:02.91elapsed 193%CPU (0avgtext+0avgdata 0maxresident)k
After:
0.07user 2.34system 0:01.68elapsed 143%CPU (0avgtext+0avgdata 0maxresident)k
0.06user 2.37system 0:01.68elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k
0.06user 2.39system 0:01.68elapsed 145%CPU (0avgtext+0avgdata 0maxresident)k
0.06user 2.36system 0:01.68elapsed 144%CPU (0avgtext+0avgdata 0maxresident)k
0.06user 2.42system 0:01.68elapsed 147%CPU (0avgtext+0avgdata 0maxresident)k
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch creates a new kstrdup library function and changes the "local"
implementations in several places to use this function.
Most of the changes come from the sound and net subsystems. The sound part
had already been acknowledged by Takashi Iwai and the net part by David S.
Miller.
I left UML alone for now because I would need more time to read the code
carefully before making changes there.
Signed-off-by: Paulo Marques <pmarques@grupopie.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Based on analysis and a patch from Russ Weight <rweight@us.ibm.com>
There is a race condition that can occur if an inode is allocated and then
released (using iput) during the ->fill_super functions. The race
condition is between kswapd and mount.
For most filesystems this can only happen in an error path when kswapd is
running concurrently. For isofs, however, the error can occur in a more
common code path (which is how the bug was found).
The logic here is "we want final iput() to free inode *now* instead of
letting it sit in cache if fs is going down or had not quite come up". The
problem is with kswapd seeing such inodes in the middle of being killed and
happily taking over.
The clean solution would be to tell kswapd to leave those inodes alone and
let our final iput deal with them. I.e. add a new flag
(I_FORCED_FREEING), set it before write_inode_now() there and make
prune_icache() leave those alone.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch splits del_timer_sync() into 2 functions. The new one,
try_to_del_timer_sync(), returns -1 when it hits executing timer.
It can be used in interrupt context, or when the caller hold locks which
can prevent completion of the timer's handler.
NOTE. Currently it can't be used in interrupt context in UP case, because
->running_timer is used only with CONFIG_SMP.
Should the need arise, it is possible to kill #ifdef CONFIG_SMP in
set_running_timer(), it is cheap.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch tries to solve following problems:
1. del_timer_sync() is racy. The timer can be fired again after
del_timer_sync have checked all cpus and before it will recheck
timer_pending().
2. It has scalability problems. All cpus are scanned to determine
if the timer is running on that cpu.
With this patch del_timer_sync is O(1) and no slower than plain
del_timer(pending_timer), unless it has to actually wait for
completion of the currently running timer.
The only restriction is that the recurring timer should not use
add_timer_on().
3. The timers are not serialized wrt to itself.
If CPU_0 does mod_timer(jiffies+1) while the timer is currently
running on CPU 1, it is quite possible that local interrupt on
CPU_0 will start that timer before it finished on CPU_1.
4. The timers locking is suboptimal. __mod_timer() takes 3 locks
at once and still requires wmb() in del_timer/run_timers.
The new implementation takes 2 locks sequentially and does not
need memory barriers.
Currently ->base != NULL means that the timer is pending. In that case
->base.lock is used to lock the timer. __mod_timer also takes timer->lock
because ->base can be == NULL.
This patch uses timer->entry.next != NULL as indication that the timer is
pending. So it does __list_del(), entry->next = NULL instead of list_del()
when the timer is deleted.
The ->base field is used for hashed locking only, it is initialized
in init_timer() which sets ->base = per_cpu(tvec_bases). When the
tvec_bases.lock is locked, it means that all timers which are tied
to this base via timer->base are locked, and the base itself is locked
too.
So __run_timers/migrate_timers can safely modify all timers which could
be found on ->tvX lists (pending timers).
When the timer's base is locked, and the timer removed from ->entry list
(which means that _run_timers/migrate_timers can't see this timer), it is
possible to set timer->base = NULL and drop the lock: the timer remains
locked.
This patch adds lock_timer_base() helper, which waits for ->base != NULL,
locks the ->base, and checks it is still the same.
__mod_timer() schedules the timer on the local CPU and changes it's base.
However, it does not lock both old and new bases at once. It locks the
timer via lock_timer_base(), deletes the timer, sets ->base = NULL, and
unlocks old base. Then __mod_timer() locks new_base, sets ->base = new_base,
and adds this timer. This simplifies the code, because AB-BA deadlock is not
possible. __mod_timer() also ensures that the timer's base is not changed
while the timer's handler is running on the old base.
__run_timers(), del_timer() do not change ->base anymore, they only clear
pending flag.
So del_timer_sync() can test timer->base->running_timer == timer to detect
whether it is running or not.
We don't need timer_list->lock anymore, this patch kills it.
We also don't need barriers. del_timer() and __run_timers() used smp_wmb()
before clearing timer's pending flag. It was needed because __mod_timer()
did not lock old_base if the timer is not pending, so __mod_timer()->list_add()
could race with del_timer()->list_del(). With this patch these functions are
serialized through base->lock.
One problem. TIMER_INITIALIZER can't use per_cpu(tvec_bases). So this patch
adds global
struct timer_base_s {
spinlock_t lock;
struct timer_list *running_timer;
} __init_timer_base;
which is used by TIMER_INITIALIZER. The corresponding fields in tvec_t_base_s
struct are replaced by struct timer_base_s t_base.
It is indeed ugly. But this can't have scalability problems. The global
__init_timer_base.lock is used only when __mod_timer() is called for the first
time AND the timer was compile time initialized. After that the timer migrates
to the local CPU.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Renaud Lienhart <renaud.lienhart@free.fr>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
blk_queue_tag->real_max_depth was used to optimize out unnecessary
allocations/frees on tag resize. However, the whole thing was very broken -
tag_map was never allocated to real_max_depth resulting in access beyond the
end of the map, bits in [max_depth..real_max_depth] were set when initializing
a map and copied when resizing resulting in pre-occupied tags.
As the gain of the optimization is very small, well, almost nill, remove the
whole thing.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Patch to allocate the control structures for for ide devices on the node of
the device itself (for NUMA systems). The patch depends on the Slab API
change patch by Manfred and me (in mm) and the pcidev_to_node patch that I
posted today.
Does some realignment too.
Signed-off-by: Justin M. Forbes <jmforbes@linuxtx.org>
Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Pravin Shelar <pravin@calsoftinc.com>
Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Make sparse's initalization be accessible at runtime. This allows sparse
mappings to be created after boot in a hotplug situation.
This patch is separated from the previous one just to give an indication how
much of the sparse infrastructure is *just* for hotplug memory.
The section_mem_map doesn't really store a pointer. It stores something that
is convenient to do some math against to get a pointer. It isn't valid to
just do *section_mem_map, so I don't think it should be stored as a pointer.
There are a couple of things I'd like to store about a section. First of all,
the fact that it is !NULL does not mean that it is present. There could be
such a combination where section_mem_map *is* NULL, but the math gets you
properly to a real mem_map. So, I don't think that check is safe.
Since we're storing 32-bit-aligned structures, we have a few bits in the
bottom of the pointer to play with. Use one bit to encode whether there's
really a mem_map there, and the other one to tell whether there's a valid
section there. We need to distinguish between the two because sometimes
there's a gap between when a section is discovered to be present and when we
can get the mem_map for it.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The part of the sparsemem patch which modifies memmap_init_zone() has recently
become a problem. It changes behavior so that there is a call to
pfn_to_page() for each individual page inside of a node's range:
node_start_pfn through node_end_pfn. It used to simply do this once, at the
beginning of the node, but having sparsemem's non-contiguous mem_map[]s inside
of a node made it necessary to change.
Mike Kravetz recently wrote a patch which made the NUMA code accept some new
kinds of layouts. The system's memory was laid out like this, with node 0's
memory in two pieces: one before and one after node 1's memory:
Node 0: +++++ +++++
Node 1: +++++
Previous behavior before Mike's patch was to assign nodes like this:
Node 0: 00000 XXXXX
Node 1: 11111
Where the 'X' areas were simply thrown away. The new behavior was to make the
pg_data_t span node 0 across all of its areas, including areas that are really
node 1's: Node 0: 000000000000000 Node 1: 11111
This wastes a little bit of mem_map space, but ends up being OK, and more
fully utilizes the system's memory. memmap_init_zone() initializes all of the
"struct page"s for node 0, even for the "hole", but those never get used,
because there is no pfn_to_page() that resolves to those pages. However, only
calling pfn_to_page() once, memmap_init_zone() always uses the pages that were
allocated for node0->node_mem_map because:
struct page *start = pfn_to_page(start_pfn);
// effectively start = &node->node_mem_map[0]
for (page = start; page < (start + size); page++) {
init_page_here();...
page++;
}
Slow, and wasteful, but generally harmless.
But, modify that to call pfn_to_page() for each loop iteration (like sparsemem
does):
for (pfn = start_pfn; pfn < < (start_pfn + size); pfn++++) {
page = pfn_to_page(pfn);
}
And you end up trying to initialize node 1's pages too early, along with bogus
data from node 0. This patch checks for those weird layouts and declines to
touch the pages, making the more frequent pfn_to_page() calls OK to do.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sparsemem abstracts the use of discontiguous mem_maps[]. This kind of
mem_map[] is needed by discontiguous memory machines (like in the old
CONFIG_DISCONTIGMEM case) as well as memory hotplug systems. Sparsemem
replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
become a complete replacement.
A significant advantage over DISCONTIGMEM is that it's completely separated
from CONFIG_NUMA. When producing this patch, it became apparent in that NUMA
and DISCONTIG are often confused.
Another advantage is that sparse doesn't require each NUMA node's ranges to be
contiguous. It can handle overlapping ranges between nodes with no problems,
where DISCONTIGMEM currently throws away that memory.
Sparsemem uses an array to provide different pfn_to_page() translations for
each SECTION_SIZE area of physical memory. This is what allows the mem_map[]
to be chopped up.
In order to do quick pfn_to_page() operations, the section number of the page
is encoded in page->flags. Part of the sparsemem infrastructure enables
sharing of these bits more dynamically (at compile-time) between the
page_zone() and sparsemem operations. However, on 32-bit architectures, the
number of bits is quite limited, and may require growing the size of the
page->flags type in certain conditions. Several things might force this to
occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
memory), an increase in the physical address space, or an increase in the
number of used page->flags.
One thing to note is that, once sparsemem is present, the NUMA node
information no longer needs to be stored in the page->flags. It might provide
speed increases on certain platforms and will be stored there if there is
room. But, if out of room, an alternate (theoretically slower) mechanism is
used.
This patch introduces CONFIG_FLATMEM. It is used in almost all cases where
there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
often have to compile out the same areas of code.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Provide a default implementation for early_pfn_to_nid returning node 0. Allow
architectures to override this with their own implementation out of
asm/mmzone.h.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There is some confusion that arose when working on SPARSEMEM patch between
what is needed for DISCONTIG vs. NUMA.
Multiple pg_data_t's are needed for DISCONTIGMEM or NUMA, independently.
All of the current NUMA implementations require an implementation of
DISCONTIG. Because of this, quite a lot of code which is really needed for
NUMA is actually under DISCONTIG #ifdefs. For SPARSEMEM, we changed some
of these #ifdefs to CONFIG_NUMA, but that broke the DISCONTIG=y and NUMA=n
case.
Introducing this new NEED_MULTIPLE_NODES config option allows code that is
needed for both NUMA or DISCONTIG to be separated out from code that is
specific to DISCONTIG.
One great advantage of this approach is that it doesn't require every
architecture to be converted over. All of the current implementations
should "just work", only the ones implementing SPARSEMEM will have to be
fixed up.
The change to free_area_init() makes it work inside, or out of the new
config option.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Generify the value fields in the page_flags. The aim is to allow the location
and size of these fields to be varied. Additionally we want to move away from
fixed allocations per field whilst still enforcing the overall bit utilisation
limits. We rely on the compiler to spot and optimise the accessor functions.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Introduce a simple allocator for the NUMA remap space. This space is very
scarce, used for structures which are best allocated node local.
This mechanism is also used on non-NUMA ia64 systems with a vmem_map to keep
the pgdat->node_mem_map initialized in a consistent place for all
architectures.
Issues:
o alloc_remap takes a node_id where we might expect a pgdat which was intended
to allow us to allocate the pgdat's using this mechanism; which we do not yet
do. Could have alloc_remap_node() and alloc_remap_nid() for this purpose.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch effectively eliminates direct use of pgdat->node_mem_map outside
of the DISCONTIG code. On a flat memory system, these fields aren't
currently used, neither are they on a sparsemem system.
There was also a node_mem_map(nid) macro on many architectures. Its use
along with the use of ->node_mem_map itself was not consistent. It has
been removed in favor of two new, more explicit, arch-independent macros:
pgdat_page_nr(pgdat, pagenr)
nid_page_nr(nid, pagenr)
I called them "pgdat" and "nid" because we overload the term "node" to mean
"NUMA node", "DISCONTIG node" or "pg_data_t" in very confusing ways. I
believe the newer names are much clearer.
These macros can be overridden in the sparsemem case with a theoretically
slower operation using node_start_pfn and pfn_to_page(), instead. We could
make this the only behavior if people want, but I don't want to change too
much at once. One thing at a time.
This patch removes more code than it adds.
Compile tested on alpha, alpha discontig, arm, arm-discontig, i386, i386
generic, NUMAQ, Summit, ppc64, ppc64 discontig, and x86_64. Full list
here: http://sr71.net/patches/2.6.12/2.6.12-rc1-mhp2/configs/
Boot tested on NUMAQ, x86 SMP and ppc64 power4/5 LPARs.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin J. Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
- Use extern prefix for functions required.
- Removed a lot of wrappers, including t1_read/write_reg_4.
- Removed various macros, using native kernel calls now.
- Enumerated various #defines.
- Removed a lot of shared code which is not currently used in "NIC only" mode.
- Removed dead code.
Documentation/networking/cxgb.txt:
- Updated release notes for version 2.1.1
drivers/net/chelsio/ch_ethtool.h
- removed file, no longer using ETHTOOL namespace.
drivers/net/chelsio/common.h
- moved code from osdep.h to common.h
- added comment to #endif indicating which symbol it closes.
drivers/net/chelsio/cphy.h
- removed dead code.
- added comment to #endif indicating which symbol it closes.
drivers/net/chelsio/cxgb2.c
- use DMA_{32,64}BIT_MASK in include/linux/dma-mapping.h.
- removed unused code.
- use printk message for link info resembling drivers/net/mii.c.
- no longer using the MODULE_xxx namespace.
- no longer using "pci_" namespace.
- no longer using ETHTOOL namespace.
drivers/net/chelsio/cxgb2.h
- removed file, merged into common.h
drivers/net/chelsio/elmer0.h
- removed dead code.
- added various enums.
- added comment to #endif indicating which symbol it closes.
drivers/net/chelsio/espi.c
- removed various macros, using native kernel calls now.
- removed a lot of wrappers, including t1_read/write_reg_4.
drivers/net/chelsio/espi.h
- added comment to #endif indicating which symbol it closes.
drivers/net/chelsio/gmac.h
- added comment to #endif indicating which symbol it closes.
drivers/net/chelsio/mv88x201x.c
- changes to sync with Chelsio TOT.
drivers/net/chelsio/osdep.h
- removed file, consolidation. osdep was used to translate wrapper functions
since our code supports multiple OSs. removed wrappers.
drivers/net/chelsio/pm3393.c
- removed various macros, using native kernel calls now.
- removed a lot of wrappers, including t1_read/write_reg_4.
- removed unused code.
drivers/net/chelsio/regs.h
- added a few register entries for future and current feature support.
- added comment to #endif indicating which symbol it closes.
drivers/net/chelsio/sge.c
- rewrote large portion of scatter-gather engine to stabilize
performance.
- using u8/u16/u32 kernel types instead of __u8/__u16/__u32 compiler
types.
drivers/net/chelsio/sge.h
- rewrote large portion of scatter-gather engine to stabilize
performance.
- added comment to #endif indicating which symbol it closes.
drivers/net/chelsio/subr.c
- merged tp.c into subr.c
- removed various macros, using native kernel calls now.
- removed a lot of wrappers, including t1_read/write_reg_4.
- removed unused code.
drivers/net/chelsio/suni1x10gexp_regs.h
- modified copyright and authorship of file.
- added comment to #endif indicating which symbol it closes.
drivers/net/chelsio/tp.c
- removed file, merged into subr.c.
drivers/net/chelsio/tp.h
- removed file.
include/linux/pci_ids.h
- patched to include PCI_VENDOR_ID_CHELSIO 0x1425, removed define from
our code.
This patch is a follow up to patch 1 regarding "Selective Sub Address
matching with call user data". It allows use of the Fast-Select-Acceptance
optional user facility for X.25.
This patch just implements fast select with no restriction on response
(NRR). What this means (according to ITU-T Recomendation 10/96 section
6.16) is that if in an incoming call packet, the relevant facility bits are
set for fast-select-NRR, then the called DTE can issue a direct response to
the incoming packet using a call-accepted packet that contains
call-user-data. This patch allows such a response.
The called DTE can also respond with a clear-request packet that contains
call-user-data. However, this feature is currently not implemented by the
patch.
How is Fast Select Acceptance used?
By default, the system does not allow fast select acceptance (as before).
To enable a response to fast select acceptance,
After a listen socket in created and bound as follows
socket(AF_X25, SOCK_SEQPACKET, 0);
bind(call_soc, (struct sockaddr *)&locl_addr, sizeof(locl_addr));
but before a listen system call is made, the following ioctl should be used.
ioctl(call_soc,SIOCX25CALLACCPTAPPRV);
Now the listen system call can be made
listen(call_soc, 4);
After this, an incoming-call packet will be accepted, but no call-accepted
packet will be sent back until the following system call is made on the socket
that accepts the call
ioctl(vc_soc,SIOCX25SENDCALLACCPT);
The network (or cisco xot router used for testing here) will allow the
application server's call-user-data in the call-accepted packet,
provided the call-request was made with Fast-select NRR.
Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Shaun Pereira <spereira@tusc.com.au>
This is the first (independent of the second) patch of two that I am
working on with x25 on linux (tested with xot on a cisco router). Details
are as follows.
Current state of module:
A server using the current implementation (2.6.11.7) of the x25 module will
accept a call request/ incoming call packet at the listening x.25 address,
from all callers to that address, as long as NO call user data is present
in the packet header.
If the server needs to choose to accept a particular call request/ incoming
call packet arriving at its listening x25 address, then the kernel has to
allow a match of call user data present in the call request packet with its
own. This is required when multiple servers listen at the same x25 address
and device interface. The kernel currently matches ALL call user data, if
present.
Current Changes:
This patch is a follow up to the patch submitted previously by Andrew
Hendry, and allows the user to selectively control the number of octets of
call user data in the call request packet, that the kernel will match. By
default no call user data is matched, even if call user data is present.
To allow call user data matching, a cudmatchlength > 0 has to be passed
into the kernel after which the passed number of octets will be matched.
Otherwise the kernel behavior is exactly as the original implementation.
This patch also ensures that as is normally the case, no call user data
will be present in the Call accepted / call connected packet sent back to
the caller
Future Changes on next patch:
There are cases however when call user data may be present in the call
accepted packet. According to the X.25 recommendation (ITU-T 10/96)
section 5.2.3.2 call user data may be present in the call accepted packet
provided the fast select facility is used. My next patch will include this
fast select utility and the ability to send up to 128 octets call user data
in the call accepted packet provided the fast select facility is used. I
am currently testing this, again with xot on linux and cisco.
Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
(With a fix from Alexey Dobriyan <adobriyan@gmail.com>)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides support for registering multiple netpoll clients to the
same network device. Only one of these clients may register an rx_hook,
however. In practice, this restriction has not been problematic. It is
worth mentioning, though, that the current design can be easily extended to
allow for the registration of multiple rx_hooks.
The basic idea of the patch is that the rx_np pointer in the netpoll_info
structure points to the struct netpoll that has rx_hook filled in. Aside
from this one case, there is no need for a pointer from the struct
net_device to an individual struct netpoll.
A lock is introduced to protect the setting and clearing of the np_rx
pointer. The pointer will only be cleared upon netpoll client module
removal, and the lock should be uncontested.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces a netpoll_info structure, which the struct net_device
will now point to instead of pointing to a struct netpoll. The reason for
this is two-fold: 1) fields such as the rx_flags, poll_owner, and poll_lock
should be maintained per net_device, not per netpoll; and 2) this is a first
step in providing support for multiple netpoll clients to register against the
same net_device.
The struct netpoll is now pointed to by the netpoll_info structure. As
such, the previous behaviour of the code is preserved.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This trivial patch moves the assignment of poll_owner to -1 inside of
the lock. This fixes a potential SMP race in the code.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the lock blocks, the server may send us a GRANTED message that
races with the reply to our LOCK request. Make sure that we catch
the GRANTED by queueing up our request on the nlm_blocked list
before we send off the first LOCK rpc call.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Basically copies the VFS's method for tracking writebacks and applies
it to the struct nfs_page.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Unless we're doing O_APPEND writes, we really don't care about revalidating
the file length. Just make sure that we catch any page cache invalidations.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Instead of looking at whether or not the file is open for writes before
we accept to update the length using the server value, we should rather
be looking at whether or not we are currently caching any writes.
Failure to do so means in particular that we're not updating the file
length correctly after obtaining a POSIX or BSD lock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv3 currently returns the unsigned 64-bit cookie directly to
userspace. The following patch causes the kernel to generate
loff_t offsets for the benefit of userland.
The current server-generated READDIR cookie is cached in the
nfs_open_context instead of in filp->f_pos, so we still end up work
correctly under directory insertions/deletion.
Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Attach acls to inodes in the icache to avoid unnecessary GETACL RPC
round-trips. As long as the client doesn't retrieve any acls itself, only the
default acls of exiting directories and the default and access acls of new
directories will end up in the cache, which preserves some memory compared to
always caching the access and default acl of all files.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
NFSv3 has no concept of a umask on the server side: The client applies
the umask locally, and sends the effective permissions to the server.
This behavior is wrong when files are created in a directory that has a
default ACL. In this case, the umask is supposed to be ignored, and
only the default ACL determines the file's effective permissions.
Usually its the server's task to conditionally apply the umask. But
since the server knows nothing about the umask, we have to do it on the
client side. This patch tries to fetch the parent directory's default
ACL before creating a new file, computes the appropriate create mode to
send to the server, and finally sets the new file's access and default
acl appropriately.
Many thanks to Buck Huppmann <buchk@pobox.com> for sending the initial
version of this patch, as well as for arguing why we need this change.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This adds acl support fo nfs clients via the NFSACL protocol extension, by
implementing the getxattr, listxattr, setxattr, and removexattr iops for the
system.posix_acl_access and system.posix_acl_default attributes. This patch
implements a dumb version that uses no caching (and thus adds some overhead).
(Another patch in this patchset adds caching as well.)
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This adds functions for encoding and decoding POSIX ACLs for the NFSACL
protocol extension, and the GETACL and SETACL RPCs. The implementation is
compatible with NFSACL in Solaris.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The NFS and NFSACL programs run on the same RPC transport. This patch adds
support for this by converting svc_program into a chained list of programs
(server-side).
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add nfs4_acl field to the nfs_inode, and use it to cache acls. Only cache
acls of size up to a page. Also prepare for up to a page of acl data even
when the user doesn't pass in a buffer, as when they want to get the acl
length to decide what size buffer to allocate.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Client-side support for NFSv4 acls: xdr encoding and decoding routines for
writing acls
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Client-side support for NFSv4 acls: xdr encoding and decoding routines for
reading acls
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ACL support will require supporting additional inode operations in v4
(getxattr, setxattr, listxattr). This patch allows different protocol versions
to support different inode operations by adding a file_inode_ops to the
nfs_rpc_ops (to match the existing dir_inode_ops).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Ensure that we don't create an RPC client without checking that the server
does indeed support the RPC program + version that we are trying to set up.
This enables us to immediately return an error to "mount" if it turns out
that the server is only supporting NFSv2, when we requested NFSv3 or NFSv4.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The patch just changes the order of structure members.
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>