Commit graph

1018 commits

Author SHA1 Message Date
Arnaldo Carvalho de Melo
88f964db6e [DCCP]: Introduce CCID getsockopt for the CCIDs
Allocation for the optnames is similar to the DCCP options, with a
range for rx and tx half connection CCIDs.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-18 00:19:32 -07:00
Arnaldo Carvalho de Melo
561713cf47 [DCCP]: Don't use necessarily the same CCID for tx and rx
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-18 00:18:52 -07:00
Arnaldo Carvalho de Melo
65299d6c3c [CCID3]: Introduce include/linux/tfrc.h
Moving the TFRC sender and receiver variables to separate structs, so
that we can copy these structs to userspace thru getsockopt,
dccp_diag, etc.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-18 00:18:32 -07:00
Arnaldo Carvalho de Melo
ae31c3399d [DCCP]: Move the ack vector code to net/dccp/ackvec.[ch]
Isolating it, that will be used when we introduce a CCID2 (TCP-Like)
implementation.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-18 00:17:51 -07:00
David S. Miller
21f130a237 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2005-09-18 00:17:10 -07:00
Andrew Morton
f647e08a55 [PATCH] joystick-vs-x.org fix
Fix http://bugzilla.kernel.org/show_bug.cgi?id=5241

2.6.13 broke compilation of the xorg tree, which apprarently insists on
including that file.

Cc: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-17 11:50:02 -07:00
Jean Delvare
8ac2120d90 [PATCH] i2c: kill an unused i2c_adapter struct member
Kill an unused member of the i2c_adapter structure.  This additionally
fixes a potential bug, because <linux/i2c.h> doesn't include
<linux/config.h>, so different files including <linux/i2c.h> could see a
different definition of the i2c_adapter structure, depending on them
including <linux/config.h> (or other header files themselves including
<linux/config.h>) before <linux/i2c.h>, or not.

Credits go to Jörn Engel for pointing me to the problem.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-17 11:50:02 -07:00
David S. Miller
1cbf07478b [TG3]: Add AMD K8 to list of write-reorder chipsets.
Thanks to Andy Stewart for the report and testing
debug patches from Michael Chan.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-16 16:59:20 -07:00
Arnaldo Carvalho de Melo
67e6b62921 [DCCP]: Introduce DCCP_SOCKOPT_SERVICE
As discussed in the dccp@vger mailing list:

Now applications have to use setsockopt(DCCP_SOCKOPT_SERVICE, service[s]),
prior to calling listen() and connect().

An array of unsigned ints can be passed meaning that the listening sock accepts
connection requests for several services.

With this we can ditch struct sockaddr_dccp and use only sockaddr_in (and
sockaddr_in6 in the future).

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-16 16:58:40 -07:00
Karsten Keil
06168d8a10 [PATCH] cleanup whitespace in pci_ids.h
Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-16 10:46:28 -07:00
Karsten Keil
a063cf5b7d [PATCH] Add PCI IDs for Sitecom DC-105
Sitecom DC-105 PCI work with hfc_pci HiSax driver

Signed-off-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-16 10:46:28 -07:00
David S. Miller
20ae975dfd [NETLINK]: Reserve a slot for NETLINK_GENERIC.
As requested by Jamal.

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-14 20:52:37 -07:00
Roland Dreier
8b7fc4214b [PATCH] add PCI IDs so RME32 and RME96 drivers build
While doing an allyesconfig build, I noticed that the commit

    commit 8cdfd2519c
    Author: Takashi Iwai <tiwai@suse.de>
    Date:   Wed Sep 7 14:08:11 2005 +0200

        [ALSA] Remove superfluous PCI ID definitions

broke the RME32 and RME96 drivers, since the PCI IDs they use seem to have
changed names.  Here's a patch to fix this -- compile tested only, since I
have no idea what the hardware even is.

Fix the build of the RME32 and RME96 drivers by having them use the
PCI_DEVICE_ID_RME_xxx names defined in <linux/pci_ids.h> instead of the
PCI_DEVICE_ID_xxx names that they used to define themselves.

Also fix the typo in the id PCI_DEVICE_IDRME__DIGI96_8_PAD_OR_PST so the
name is PCI_DEVICE_ID_RME_DIGI96_8_PAD_OR_PST.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-14 14:34:17 -07:00
Linus Torvalds
ddbf9ef385 Merge master.kernel.org:/pub/scm/linux/kernel/git/chrisw/lsm-2.6 2005-09-13 09:48:54 -07:00
Linus Torvalds
5d54e69c68 Merge master.kernel.org:/pub/scm/linux/kernel/git/dwmw2/audit-2.6 2005-09-13 09:47:30 -07:00
Linus Torvalds
63f3d1df1a Merge master.kernel.org:/pub/scm/linux/kernel/git/perex/alsa-current 2005-09-13 09:46:22 -07:00
Jan Beulich
2f4516dbd0 [PATCH] fbcon: constify font data
const-ify the font control structures and data, to make somewhat better
guarantees that these are not modified anywhere in the kernel.
Specifically for a kernel debugger to share this information from the
normal kernel code, such a guarantee seems rather desirable.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: "Antonino A. Daplas" <adaplas@hotpop.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:32 -07:00
Mauro Carvalho Chehab
9db455064d [PATCH] v4l: experimental Sliced VBI API support
Adds all defines, ioctls and structs needed for the sliced VBI API

VBI = Vertical Blank Interval.

It is related with the way TV signals work.  It sends a line, then, it has a
retrace time to allow the tube to move electrons to the beginning of the next
line.  This was the main reason at the beginning of analog B&W TV.

There is a lot of bandwidth lost on VBI.  So, lots of TV systems use it to
send other information such as Closed Captions and Teletext.  Also,
broadcasters uses this as a channel to exchange information from the content
producer to their subsidiaries at each city.

There's already a raw VBI interface on V4L2 api, used for Closed Captions and
Teletext.  The decoding is doing at userlevel space and it is mostly for
analog TV signals, non encoded.

Encoded signals (MPEG, for example), may need also to transmit other
information (like, for example, display aspect, i.e.  4x3, widescreen...).
Sliced VBI interface is a method to allow the video stream to transmit this
kind of information.

Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:32 -07:00
Neil Brown
f2327d9adb [PATCH] nfsd4: move replay_owner
It seems more natural to move the setting of the replay_owner into the
relevant procedure instead of doing it in nfsv4_proc_compound.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:31 -07:00
Peter Osterlund
610827dee8 [PATCH] pktcdvd: BUG_ON cleanups
Remove some redundant BUG_ON() statements in pktcdvd and move one run-time
check to compile-time.

Signed-off-by: Peter Osterlund <petero2@telia.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:31 -07:00
Mike Miller
9dc7a86e85 [PATCH] cciss: new controller pci/subsystem ids
This patch adds new PCI and subsystem ID's that finally made the spec.  It
also include a name change for one controller.  I know there's a lot of
duplicat names but the fw folks wanted this for the different implementations.

Even though the same ASIC is used it may be embedded on some platforms,
standup card in others, and a mezzanine in other servers.

Signed-off-by: Mike Miller <mike.miller@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:29 -07:00
Andrew Morton
498d0c5711 [PATCH] set_current_state() commentary
Explain the mysteries of set_current_state().

Quoth Linus:

 The scheduler itself never needs the memory barrier at all.

 The barrier is needed only if the user itself ends up testing some other
 thing afterwards, ie if you have

 	set_process_state(TASK_INTERRUPTIBLE);
 	if (still_need_to_sleep())
 		schedule();

 then the "still_need_to_sleep()" thing may test flags and wakeup events,
 and then you _may_ want to (and often do) make sure that the write of
 TASK_INTERRUPTIBLE is serialized wrt the reads of any wakeup data (since
 the wakeup may have happened on another CPU).

 So the comment is somewhat wrong. We don't really _care_ whether the state
 propagates out to other CPU's since all of our actions are purely local,
 and there is nothing we do that is conditional on any other CPU: we're
 going to sleep unconditionally, and the scheduler only cares about _our_
 state, not about somebody elses state.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:29 -07:00
Andi Kleen
921717a2a1 [PATCH] Make BUILD_BUG_ON fail at compile time.
Force a compiler error instead of a link error, because they are easier to
track down.  Idea stolen from code by Jan Beulich <jbeulich@novell.com>

If the argument to BUILD_BUG_ON evaluates to non-zero the compiler will do:

	t.c:6: error: size of array `type name' is negative

(surprised that gcc doesn't have an extension for this)

Signed-off-by: "Andi Kleen" <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-13 08:22:28 -07:00
Linus Torvalds
61b22e693e Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2005-09-12 15:55:09 -07:00
Ralf Baechle
e21ce8c7c0 [NETROM]: Implement G8PZT Circuit reset for NET/ROM
NET/ROM is lacking a connection reset like TCP's RST flag which at times
may result in a connecting having to slowly timing out instead of just being
reset.  An earlier attempt to reset the connection by sending a
NR_CONNACK | NR_CHOKE_FLAG transport was inacceptable as it did result in
crashes of BPQ systems.  An alternative approach of introducing a new
transport type 7 (NR_RESET) has be implemented several years ago in
Paula Jayne Dowie G8PZT's Xrouter.

Implement NR_RESET for Linux's NET/ROM but like any messing with the state
engine consider this experimental for now and thus control it by a sysctl
(net.netrom.reset) which for the time being defaults to off.

Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-12 14:27:37 -07:00
Harald Welte
ce441594e9 [PATCH] USB: fix usbdevice_fs header breakage
[USBDEVFS] fix inclusion of <linux/compat.h> to avoud header mess

Without moving the include of compat.h down, userspace programs that use
usbdevice_fs.h end up including half the kernel includes (and eventually
fail to compile).

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2005-09-12 12:23:52 -07:00
Andi Kleen
c47a3167d0 [PATCH] x86-64: Make dmi_find_device for !DMI case inline
Otherwise it will generate warnings and be generated many times.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:50:58 -07:00
Andi Kleen
3f74478b5f [PATCH] x86-64: Some cleanup and optimization to the processor data area.
- Remove unused irqrsp field
- Remove pda->me
- Optimize set_softirq_pending slightly

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 10:49:58 -07:00
Paul Jackson
b3426599af [PATCH] cpuset semaphore depth check optimize
Optimize the deadlock avoidance check on the global cpuset
semaphore cpuset_sem.  Instead of adding a depth counter to the
task struct of each task, rather just two words are enough, one
to store the depth and the other the current cpuset_sem holder.

Thanks to Nikita Danilov for the idea.

Signed-off-by: Paul Jackson <pj@sgi.com>

[ We may want to change this further, but at least it's now
  a totally internal decision to the cpusets code ]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 09:16:27 -07:00
Evgeniy Polyakov
f24ec7f6c6 [PATCH] crc16: remove w1 specific comments.
Remove w1 comments from crc16.h and move specific constants into
w1_ds2433.c where they are used.

Replace %d with %zd.

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-12 08:48:08 -07:00
Takashi Iwai
676e1a2c1e [ALSA] [PATCH] Add missing sound PCI IDs to pci_ids.h
Added missing PCI IDs for sound drivers to pci_ids.h.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
2005-09-12 16:02:19 +02:00
Evgeniy Polyakov
7672d0b544 [NET]: Add netlink connector.
Kernel connector - new userspace <-> kernel space easy to use
communication module which implements easy to use bidirectional
message bus using netlink as it's backend.  Connector was created to
eliminate complex skb handling both in send and receive message bus
direction.

Connector driver adds possibility to connect various agents using as
one of it's backends netlink based network.  One must register
callback and identifier. When driver receives special netlink message
with appropriate identifier, appropriate callback will be called.

From the userspace point of view it's quite straightforward:

	socket();
	bind();
	send();
	recv();

But if kernelspace want to use full power of such connections, driver
writer must create special sockets, must know about struct sk_buff
handling...  Connector allows any kernelspace agents to use netlink
based networking for inter-process communication in a significantly
easier way:

int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *));
void cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask);

struct cb_id
{
	__u32			idx;
	__u32			val;
};

idx and val are unique identifiers which must be registered in
connector.h for in-kernel usage.  void (*callback) (void *) - is a
callback function which will be called when message with above idx.val
will be received by connector core.

Using connector completely hides low-level transport layer from it's
users.

Connector uses new netlink ability to have many groups in one socket.

[ Incorporating many cleanups and fixes by myself and
  Andrew Morton -DaveM ]

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-11 19:15:07 -07:00
Keith Owens
a2a979821b [PATCH] MCA/INIT: scheduler hooks
Scheduler hooks to see/change which process is deemed to be on a cpu.

Signed-off-by: Keith Owens <kaos@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2005-09-11 14:01:30 -07:00
Linus Torvalds
2f79f458d2 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2005-09-10 17:42:47 -07:00
Linus Torvalds
7f93220b62 Merge master.kernel.org:/pub/scm/linux/kernel/git/dtor/input 2005-09-10 15:54:41 -07:00
David S. Miller
2a04451581 Merge davem@outer-richmond.davemloft.net:src/GIT/net-2.6/ 2005-09-10 11:01:33 -07:00
viro@ZenIV.linux.org.uk
fe08ac3178 [PATCH] __user annotations (scsi/ch)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:16:27 -07:00
Andrew Morton
373016e9e1 [PATCH] time.h: remove ifdefs
Remove these ifdefs - there's no need to have more than one definition of
these multipliers anywhere.

Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:36 -07:00
Nishanth Aravamudan
84f902c090 [PATCH] include: update jiffies/{m,u}secs conversion functions
Clarify the human-time units to jiffies conversion functions by using the
constants in time.h.  This makes many of the subsequent patches direct
copies of the current code.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:36 -07:00
Nishanth Aravamudan
64ed93a268 [PATCH] add schedule_timeout_{,un}interruptible() interfaces
Add schedule_timeout_{,un}interruptible() interfaces so that
schedule_timeout() callers don't have to worry about forgetting to add the
set_current_state() call beforehand.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:36 -07:00
Adrian Bunk
c2d08dade7 [PATCH] include/linux/bio.h: "extern inline" -> "static inline"
"extern inline" doesn't make much sense.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:35 -07:00
Adrian Bunk
9adeb1b409 [PATCH] "extern inline" -> "static inline"
"extern inline" doesn't make much sense.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:35 -07:00
Adrian Bunk
2befb9e36d [PATCH] include/linux/blkdev.h: "extern inline" -> "static inline"
"extern inline" doesn't make much sense.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:34 -07:00
Victor Fusco
3a11ec5e50 [PATCH] dmapool: Fix "nocast type" warnings
Fix the sparse warning "implicit cast to nocast type"

Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:29 -07:00
Victor Fusco
00b61f5192 [PATCH] lib/radix-tree: Fix "nocast type" warnings
Fix the sparse warning "implicit cast to nocast type"

Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:28 -07:00
Victor Fusco
b2d550736f [PATCH] mm/slab: fix sparse warnings
Fix the sparse warning "implicit cast to nocast type"

Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:26 -07:00
Adrian Bunk
5ce7852cdf [PATCH] mm/filemap.c: make two functions static
With Nick Piggin <npiggin@suse.de>

Give some things static scope.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:25 -07:00
Ingo Molnar
d79fc0fc66 [PATCH] sched: TASK_NONINTERACTIVE
This patch implements a task state bit (TASK_NONINTERACTIVE), which can be
used by blocking points to mark the task's wait as "non-interactive".  This
does not mean the task will be considered a CPU-hog - the wait will simply
not have an effect on the waiting task's priority - positive or negative
alike.  Right now only pipe_wait() will make use of it, because it's a
common source of not-so-interactive waits (kernel compilation jobs, etc.).

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:22 -07:00
Paul Jackson
4247bdc600 [PATCH] cpuset semaphore depth check deadlock fix
The cpusets-formalize-intermediate-gfp_kernel-containment patch
has a deadlock problem.

This patch was part of a set of four patches to make more
extensive use of the cpuset 'mem_exclusive' attribute to
manage kernel GFP_KERNEL memory allocations and to constrain
the out-of-memory (oom) killer.

A task that is changing cpusets in particular ways on a system
when it is very short of free memory could double trip over
the global cpuset_sem semaphore (get the lock and then deadlock
trying to get it again).

The second attempt to get cpuset_sem would be in the routine
cpuset_zone_allowed().  This was discovered by code inspection.
I can not reproduce the problem except with an artifically
hacked kernel and a specialized stress test.

In real life you cannot hit this unless you are manipulating
cpusets, and are very unlikely to hit it unless you are rapidly
modifying cpusets on a memory tight system.  Even then it would
be a rare occurence.

If you did hit it, the task double tripping over cpuset_sem
would deadlock in the kernel, and any other task also trying
to manipulate cpusets would deadlock there too, on cpuset_sem.
Your batch manager would be wedged solid (if it was cpuset
savvy), but classic Unix shells and utilities would work well
enough to reboot the system.

The unusual condition that led to this bug is that unlike most
semaphores, cpuset_sem _can_ be acquired while in the page
allocation code, when __alloc_pages() calls cpuset_zone_allowed.
So it easy to mistakenly perform the following sequence:
  1) task makes system call to alter a cpuset
  2) take cpuset_sem
  3) try to allocate memory
  4) memory allocator, via cpuset_zone_allowed, trys to take cpuset_sem
  5) deadlock

The reason that this is not a serious bug for most users
is that almost all calls to allocate memory don't require
taking cpuset_sem.  Only some code paths off the beaten
track require taking cpuset_sem -- which is good.  Taking
a global semaphore on the main code path for allocating
memory would not scale well.

This patch fixes this deadlock by wrapping the up() and down()
calls on cpuset_sem in kernel/cpuset.c with code that tracks
the nesting depth of the current task on that semaphore, and
only does the real down() if the task doesn't hold the lock
already, and only does the real up() if the nesting depth
(number of unmatched downs) is exactly one.

The previous required use of refresh_mems(), anytime that
the cpuset_sem semaphore was acquired and the code executed
while holding that semaphore might try to allocate memory, is
no longer required.  Two refresh_mems() calls were removed
thanks to this.  This is a good change, as failing to get
all the necessary refresh_mems() calls placed was a primary
source of bugs in this cpuset code.  The only remaining call
to refresh_mems() is made while doing a memory allocation,
if certain task memory placement data needs to be updated
from its cpuset, due to the cpuset having been changed behind
the tasks back.

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:21 -07:00
Ingo Molnar
fb1c8f93d8 [PATCH] spinlock consolidation
This patch (written by me and also containing many suggestions of Arjan van
de Ven) does a major cleanup of the spinlock code.  It does the following
things:

 - consolidates and enhances the spinlock/rwlock debugging code

 - simplifies the asm/spinlock.h files

 - encapsulates the raw spinlock type and moves generic spinlock
   features (such as ->break_lock) into the generic code.

 - cleans up the spinlock code hierarchy to get rid of the spaghetti.

Most notably there's now only a single variant of the debugging code,
located in lib/spinlock_debug.c.  (previously we had one SMP debugging
variant per architecture, plus a separate generic one for UP builds)

Also, i've enhanced the rwlock debugging facility, it will now track
write-owners.  There is new spinlock-owner/CPU-tracking on SMP builds too.
All locks have lockup detection now, which will work for both soft and hard
spin/rwlock lockups.

The arch-level include files now only contain the minimally necessary
subset of the spinlock code - all the rest that can be generalized now
lives in the generic headers:

 include/asm-i386/spinlock_types.h       |   16
 include/asm-x86_64/spinlock_types.h     |   16

I have also split up the various spinlock variants into separate files,
making it easier to see which does what. The new layout is:

   SMP                         |  UP
   ----------------------------|-----------------------------------
   asm/spinlock_types_smp.h    |  linux/spinlock_types_up.h
   linux/spinlock_types.h      |  linux/spinlock_types.h
   asm/spinlock_smp.h          |  linux/spinlock_up.h
   linux/spinlock_api_smp.h    |  linux/spinlock_api_up.h
   linux/spinlock.h            |  linux/spinlock.h

/*
 * here's the role of the various spinlock/rwlock related include files:
 *
 * on SMP builds:
 *
 *  asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the
 *                        initializers
 *
 *  linux/spinlock_types.h:
 *                        defines the generic type and initializers
 *
 *  asm/spinlock.h:       contains the __raw_spin_*()/etc. lowlevel
 *                        implementations, mostly inline assembly code
 *
 *   (also included on UP-debug builds:)
 *
 *  linux/spinlock_api_smp.h:
 *                        contains the prototypes for the _spin_*() APIs.
 *
 *  linux/spinlock.h:     builds the final spin_*() APIs.
 *
 * on UP builds:
 *
 *  linux/spinlock_type_up.h:
 *                        contains the generic, simplified UP spinlock type.
 *                        (which is an empty structure on non-debug builds)
 *
 *  linux/spinlock_types.h:
 *                        defines the generic type and initializers
 *
 *  linux/spinlock_up.h:
 *                        contains the __raw_spin_*()/etc. version of UP
 *                        builds. (which are NOPs on non-debug, non-preempt
 *                        builds)
 *
 *   (included on UP-non-debug builds:)
 *
 *  linux/spinlock_api_up.h:
 *                        builds the _spin_*() APIs.
 *
 *  linux/spinlock.h:     builds the final spin_*() APIs.
 */

All SMP and UP architectures are converted by this patch.

arm, i386, ia64, ppc, ppc64, s390/s390x, x64 was build-tested via
crosscompilers.  m32r, mips, sh, sparc, have not been tested yet, but should
be mostly fine.

From: Grant Grundler <grundler@parisc-linux.org>

  Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU).
  Builds 32-bit SMP kernel (not booted or tested).  I did not try to build
  non-SMP kernels.  That should be trivial to fix up later if necessary.

  I converted bit ops atomic_hash lock to raw_spinlock_t.  Doing so avoids
  some ugly nesting of linux/*.h and asm/*.h files.  Those particular locks
  are well tested and contained entirely inside arch specific code.  I do NOT
  expect any new issues to arise with them.

 If someone does ever need to use debug/metrics with them, then they will
  need to unravel this hairball between spinlocks, atomic ops, and bit ops
  that exist only because parisc has exactly one atomic instruction: LDCW
  (load and clear word).

From: "Luck, Tony" <tony.luck@intel.com>

   ia64 fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjanv@infradead.org>
Signed-off-by: Grant Grundler <grundler@parisc-linux.org>
Cc: Matthew Wilcox <willy@debian.org>
Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Mikael Pettersson <mikpe@csd.uu.se>
Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-10 10:06:21 -07:00