A few trivial Makefile cleanups
- dependencipes in head.o was all wrong - deleted
- CMODEL_CFLAG was not used anywhere
- NEW_GCC was then not used outside sparc64/Makefe - do not export it
- FIXME seems not appropriate - all other put oprofile in drivers-y too
- No reason to do -I. (and it still builds)
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Invoke the desc->handle_irq directly in the top-level dispatch,
just like other sophisticated ports.
This will allow us to decrease the cost of the MSI queue dispatch.
Signed-off-by: David S. Miller <davem@davemloft.net>
Quoting Randy:
"It seems sad that this patch sources Kconfig.marker, a 7-line file,
20-something times. Yes, you (we) don't want to put those 7 lines into
20-something different files, so sourcing is the right thing.
However, what you did for avr32 seems more on the right track to me: make
_one_ Instrumentation support menu that includes PROFILING, OPROFILE, KPROBES,
and MARKERS and then use (source) that in all of the arches."
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One of the easiest things to isolate is the pid printed in kernel log.
There was a patch, that made this for arch-independent code, this one makes
so for arch/xxx files.
It took some time to cross-compile it, but hopefully these are all the
printks in arch code.
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the largest patch in the set. Make all (I hope) the places where
the pid is shown to or get from user operate on the virtual pids.
The idea is:
- all in-kernel data structures must store either struct pid itself
or the pid's global nr, obtained with pid_nr() call;
- when seeking the task from kernel code with the stored id one
should use find_task_by_pid() call that works with global pids;
- when showing pid's numerical value to the user the virtual one
should be used, but however when one shows task's pid outside this
task's namespace the global one is to be used;
- when getting the pid from userspace one need to consider this as
the virtual one and use appropriate task/pid-searching functions.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: nuther build fix]
[akpm@linux-foundation.org: yet nuther build fix]
[akpm@linux-foundation.org: remove unneeded casts]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Paul Menage <menage@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The set of functions process_session, task_session, process_group and
task_pgrp is confusing, as the names can be mixed with each other when looking
at the code for a long time.
The proposals are to
* equip the functions that return the integer with _nr suffix to
represent that fact,
* and to make all functions work with task (not process) by making
the common prefix of the same name.
For monotony the routines signal_session() and set_signal_session() are
replaced with task_session_nr() and set_task_session(), especially since they
are only used with the explicit task->signal dereference.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Also of_unregister_driver. These will be shortly also used by the
PowerPC code.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This code has been slowly rotting for about eight years. It's currently
impeding a few SCSI cleanups, and nobody seems to have hardware to test
it any more. I talked to Dave Miller about it, and he agrees we can
delete it. If anyone wants a software FC stack in future, they can
retrieve this driver from git.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Do not use *alloc_bootmem_low*(), because ARCH_LOW_ADDRESS_LIMIT
is 4GB and this results in boot failures if all of the physical
memory in the machine is above 4GB.
Signed-off-by: David S. Miller <davem@davemloft.net>
When the cpu count is high and contention hits an atomic object, the
processors can synchronize such that some cpus continually get knocked
out and cannot complete the atomic update.
So implement an exponential backoff when SMP.
Signed-off-by: David S. Miller <davem@davemloft.net>
All asm/ipc.h files do only #include <asm-generic/ipc.h>.
This patch therefore removes all include/asm-*/ipc.h files and moves the
contents of include/asm-generic/ipc.h to include/linux/ipc.h.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For some time /proc/sys/kernel/core_pattern has been able to set its output
destination as a pipe, allowing a user space helper to receive and
intellegently process a core. This infrastructure however has some
shortcommings which can be enhanced. Specifically:
1) The coredump code in the kernel should ignore RLIMIT_CORE limitation
when core_pattern is a pipe, since file system resources are not being
consumed in this case, unless the user application wishes to save the core,
at which point the app is restricted by usual file system limits and
restrictions.
2) The core_pattern code should be able to parse and pass options to the
user space helper as an argv array. The real core limit of the uid of the
crashing proces should also be passable to the user space helper (since it
is overridden to zero when called).
3) Some miscellaneous bugs need to be cleaned up (specifically the
recognition of a recursive core dump, should the user mode helper itself
crash. Also, the core dump code in the kernel should not wait for the user
mode helper to exit, since the same context is responsible for writing to
the pipe, and a read of the pipe by the user mode helper will result in a
deadlock.
This patch:
Remove the check of RLIMIT_CORE if core_pattern is a pipe. In the event that
core_pattern is a pipe, the entire core will be fed to the user mode helper.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Cc: <martin.pitt@ubuntu.com>
Cc: <wwoods@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It makes more sense to make instrumentation support experimental on a
case-by-case basis.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch single-linked binfmt formats list to usual list_head's. This leads
to one-liners in register_binfmt() and unregister_binfmt(). The downside
is one pointer more in struct linux_binfmt. This is not a problem, since
the set of registered binfmts on typical box is very small -- (ELF +
something distro enabled for you).
Test-booted, played with executable .txt files, modprobe/rmmod binfmt_misc.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 2c941a2040 looks incomplete. The
helper functions like prepare_sg() need to support sg chaining too.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (40 commits)
kbuild: introduce ccflags-y, asflags-y and ldflags-y
kbuild: enable 'make CPPFLAGS=...' to add additional options to CPP
kbuild: enable use of AFLAGS and CFLAGS on commandline
kbuild: enable 'make AFLAGS=...' to add additional options to AS
kbuild: fix AFLAGS use in h8300 and m68knommu
kbuild: check for wrong use of CFLAGS
kbuild: enable 'make CFLAGS=...' to add additional options to CC
kbuild: fix up CFLAGS usage
kbuild: make modpost detect unterminated device id lists
kbuild: call export_report from the Makefile
kbuild: move Kai Germaschewski to CREDITS
kconfig/menuconfig: distinguish between selected-by-another options and comments
kconfig: tristate choices with mixed tristate and boolean values
include/linux/Kbuild: remove duplicate entries
kbuild: kill backward compatibility checks
kbuild: kill EXTRA_ARFLAGS
kbuild: fix documentation in makefiles.txt
kbuild: call make once for all targets when O=.. is used
kbuild: pass -g to assembler under CONFIG_DEBUG_INFO
kbuild: update _shipped files for kconfig syntax cleanup
...
Fix up conflicts in arch/um/sys-{x86_64,i386}/Makefile manually.
Introduce architecture dependent kretprobe blacklists to prohibit users
from inserting return probes on the function in which kprobes can be
inserted but kretprobes can not.
This patch also removes "__kprobes" mark from "__switch_to" on x86_64 and
registers "__switch_to" to the blacklist on x86-64, because that mark is to
prohibit user from inserting only kretprobe.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now, arch dependent code around CONFIG_MEMORY_HOTREMOVE is a mess.
This patch cleans up them. This is against 2.6.23-rc6-mm1.
- fix compile failure on ia64/ CONFIG_MEMORY_HOTPLUG && !CONFIG_MEMORY_HOTREMOVE case.
- For !CONFIG_MEMORY_HOTREMOVE, add generic no-op remove_memory(),
which returns -EINVAL.
- removed remove_pages() only used in powerpc.
- removed no-op remove_memory() in i386, sh, sparc64, x86_64.
- only powerpc returns -ENOSYS at memory hot remove(no-op). changes it
to return -EINVAL.
Note:
Currently, only ia64 supports CONFIG_MEMORY_HOTREMOVE. I welcome other
archs if there are requirements and testers.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have had complaints where a threaded application is left in a bad state
after one of it's threads is killed when we hit a VM: out_of_memory
condition.
Killing just one of the process threads can leave the application in a bad
state, whereas killing the entire process group would allow for the
application to restart, or be otherwise handled, and makes it very obvious
that something has gone wrong.
This change allows the entire process group to be taken down, rather
than just the one thread.
Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert cpu_sibling_map from a static array sized by NR_CPUS to a per_cpu
variable. This saves sizeof(cpumask_t) * NR unused cpus. Access is mostly
from startup and CPU HOTPLUG functions.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This updates the sparc64 iommu/pci dma mappers to sg chaining.
Acked-by: David S. Miller <davem@davemloft.net>
Later updated to newer kernel with unified sparc64 iommu sg handling.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The variable AFLAGS is a wellknown variable and the usage by
kbuild may result in unexpected behaviour.
On top of that several people over time has asked for a way to
pass in additional flags to gcc.
This patch replace use of AFLAGS with KBUILD_AFLAGS all over
the tree.
Patch was tested on following architectures:
alpha, arm, i386, x86_64, mips, sparc, sparc64, ia64, m68k, s390
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
The variable CFLAGS is a wellknown variable and the usage by
kbuild may result in unexpected behaviour.
On top of that several people over time has asked for a way to
pass in additional flags to gcc.
This patch replace use of CFLAGS with KBUILD_CFLAGS all over the
tree and enabling one to use:
make CFLAGS=...
to specify additional gcc commandline options.
One usecase is when trying to find gcc bugs but other
use cases has been requested too.
Patch was tested on following architectures:
alpha, arm, i386, x86_64, mips, sparc, sparc64, ia64, m68k
Test was simple to do a defconfig build, apply the patch and check
that nothing got rebuild.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
It no longer translates to "real irqs" (aka. INO buckets)
so reflect that by using a simpler name for it.
Signed-off-by: David S. Miller <davem@davemloft.net>
All the users go through virt_irq_to_bucket() and essentially
want to go from a virt_irq to an INO, but we have a way
to do that already via virt_to_real_irq_table[].dev_ino.
This also allows us to kill both virt_to_real_irq() and
virt_irq_to_bucket().
Signed-off-by: David S. Miller <davem@davemloft.net>
We have a place to stick INO information in the
virt_to_real_irq_table[], which is currently only used for VIRQs.
And that is readily accessible from the one __irq_ino() call site.
Signed-off-by: David S. Miller <davem@davemloft.net>
We were simply concatenating the devhandle and devino and using that
as the cookie, which defeats the entire purpose of the VIRQ hypervisor
interfaces.
Now that we use physical addresses for the INO buckets, we can
allocate them dynamically for VIRQs and encode the cookies as
~__pa(bucket). This allows us to test for and decode the cookie with
a simple:
brlz $reg1, 1f
xnor $reg1, %g0, $reg2
sequence.
This works because bit 64 is never set in traditional
INO vectors, and it is also never set in a physical
address. So xnor'ing the physical address of the bucket
always gives us a negative number, and thus a unique
condition we can test cheaply.
Inspired by ideas from Greg Onufer.
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we chain IVEC entries using 32-bit "pointers"
because we know that the ivector_table is in the main
kernel image, thus below 4GB.
This uses proper 64-bit pointers instead.
Whilst this bloats up the kernel image size, this sets
the infrastructure necessary to significantly shrink the
kernel size by using physical addresses and dynamically
allocating the ivector table.
Signed-off-by: David S. Miller <davem@davemloft.net>
Some typos led to using %i6/%i7 instead of %l6/%l7 in loads which is
really really bad because those are the frame pointer and return PC.
Based upon a raid5 crash report by Bertrand Joel.
Signed-off-by: David S. Miller <davem@davemloft.net>
This also makes us use the MSI queues correctly.
Each MSI queue is serviced by a normal sun4u/sun4v INO interrupt
handler. This handler runs the MSI queue and dispatches the
virtual interrupts indicated by arriving MSIs in that MSI queue.
All of the common logic is placed in pci_msi.c, with callbacks to
handle the PCI controller specific aspects of the operations.
This common infrastructure will make it much easier to add MSG
support.
Signed-off-by: David S. Miller <davem@davemloft.net>
We no longer initialise the name field of the of_platform_driver, but
use the name field of the embedded device_driver's name field instead.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thanks to Tom Callaway for the excellent bug report and
test case.
sys_ipc() has several problems, most to due with semaphore
call handling:
1) 'err' return should be a 'long'
2) "union semun" is passed in a register on 64-bit compared
to 32-bit which provides it on the stack and therefore
by reference
3) Second and third arguments to SEMCTL are swapped compared
to 32-bit.
Signed-off-by: David S. Miller <davem@davemloft.net>
The name field of of_platform_driver is just copied into the
included device_driver. By not overriding an already initialised
device_driver name, we can convert the drivers over time to stop using
the of_platform_driver name.
Also we were not copying the owner field from of_platform_driver, so do
the same with it.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Apply a consistent format to vmlinux.lds.
The file is now to some degree readable.
In addition move several labels inside the braces
such that they reflect the actual start address of a section.
Without this the label would not reflect if ld added alignment.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The support code is identical to the hypervisor sun4v stuff,
just replacing the hypervisor calls with register reads and
writes in the Fire controller.
Signed-off-by: David S. Miller <davem@davemloft.net>
* master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] Don't take semaphore in cpufreq_quick_get()
[CPUFREQ] Support different families in fid/did to frequency conversion
[CPUFREQ] cpufreq_stats: misc cpuinit section annotations
[CPUFREQ] implement !CONFIG_CPU_FREQ stub for cpufreq_unregister_notifier()
[CPUFREQ] mark hotplug notifier callback as __cpuinit
[CPUFREQ] Only check for transition latency on problematic governors (kconfig fix)
[CPUFREQ] allow ondemand and conservative cpufreq governors to be used as default
[CPUFREQ] move policy's governor initialisation out of low-level drivers into cpufreq core
[CPUFREQ] Longhaul - Add support for PM133 northbridge
[CPUFREQ] x86: use num_online_nodes to get physical cpus numbers for
This patch makes most of the generic device layer network
namespace safe. This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables. The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl
were modified to take a network namespace argument, and
deal with it.
vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.
So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces. The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace. This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.
For now the ifindex generator is left global.
Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.
At the same time there are assumptions in the network stack
that the ifindex of a network device won't change. Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Check the return value of fork_idle() to catch error.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PCI-E slot on T1000 connects directly to the Fire PCI chip with no
intervening bridges visible in the OBP tree.
Unfortunately the bus numbering of the device in that slot is
different (2) from the PCI host controller (0), and thus the
pci_bus_{read,write}_config_*() calls don't work out.
Complicating things further the Fire PCI controller has no config
space it responds to either.
For now treat this case specially so that devices in the slot work.
Longer term we need to perhaps cons up a dummy bridge between the Fire
and the PCI-E slot so that the bus hierarchy is complete inside of the
kernel and thus the bus numbering all works out right.
Signed-off-by: David S. Miller <davem@davemloft.net>
We should only use ports underneath "domain-services", other DS ports
in the MDESC aren't for us to use.
Signed-off-by: David S. Miller <davem@davemloft.net>
It doesn't matter for use in 64-bit objects, but when used in
32-bit environments the top 32-bits of the local and in
registers will get chopped off on the next register window
spill/restore which leads to difficult to track down and
subtle bugs.
Signed-off-by: David S. Miller <davem@davemloft.net>
For the case where the source is not aligned modulo 8
we don't use load-twins to suck the data in and this
kills performance since normal loads allocate in the
L1 cache (unlike load-twin) and thus big memcpys swipe
the entire L1 D-cache.
We need to allocate a register window to implement this
properly, but that actually simplifies a lot of things
as a nice side-effect.
Signed-off-by: David S. Miller <davem@davemloft.net>
argv and envp are pointers to u32's in userspace, so don't
try to put_user() a NULL to them.
Aparently gcc-4.2.x now warns about this, and since we use
-Werror for arch/sparc64 code, this breaks the build.
Signed-off-by: David S. Miller <davem@davemloft.net>
If of_get_property() fails, it returns NULL and the 'len'
parameter is undefined. So we need to explicitly set len
to zero in such cases.
Noticed by Al Viro.
Signed-off-by: David S. Miller <davem@davemloft.net>
When NR_CPUS is smaller than the cpu probed, let the user
know that the cpu won't be used.
Suggested by Al Viro.
Signed-off-by: David S. Miller <davem@davemloft.net>
As noted by Al Viro, when we try to call prom_set_trap_table()
in the SMP trampoline code we try to take the PROM call spinlock
which doesn't work because the current thread pointer isn't
valid yet and lockdep depends upon that being correct.
Furthermore, we cannot set the current thread pointer register
because it can't be properly dereferenced until we return from
prom_set_trap_table(). Kernel TLB misses only work after that
call.
So do the PROM call to set the trap table directly instead of
going through the OBP library C code, and thus avoid the lock
altogether.
These calls are guarenteed to be serialized fully.
Since there are now no calls to the prom_set_trap_table{_sun4v}()
library functions, they can be deleted.
Signed-off-by: David S. Miller <davem@davemloft.net>
On the root PCI bus, the OBP device tree lists device 3 twice.
Once as 'pm' and once as 'lomp'.
Everything goes downhill from there.
Ignore the second instance to workaround this.
Thanks to Kövedi_Krisztián for the bug report and
testing the fix.
Signed-off-by: David S. Miller <davem@davemloft.net>
For hugepage mappings, the file offset, like the address and size, needs to
be aligned to the size of a hugepage.
In commit 68589bc353, the check for this was
moved into prepare_hugepage_range() along with the address and size checks.
But since BenH's rework of the get_unmapped_area() paths leading up to
commit 4b1d89290b, prepare_hugepage_range()
is only called for MAP_FIXED mappings, not for other mappings. This means
we're no longer ever checking for an aligned offset - I've confirmed that
mmap() will (apparently) succeed with a misaligned offset on both powerpc
and i386 at least.
This patch restores the check, removing it from prepare_hugepage_range()
and putting it back into hugetlbfs_file_mmap(). I'm putting it there,
rather than in the get_unmapped_area() path so it only needs to go in one
place, than separately in the half-dozen or so arch-specific
implementations of hugetlb_get_unmapped_area().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1) sun4{u,v}_build_msi() have improper return value handling.
We should always return negative error codes, instead of
using the magic value "0" which could in fact be a valid
MSI number.
2) sun4{u,v}_build_msi() should return -ENOMEM instead of
calling prom_prom() halt with kzalloc() of the interrupt
data fails.
3) We 'remembered' the MSI number using a singleton in the
struct device archdata area, this doesn't work for MSI-X
which can cause multiple MSIs assosciated with one device.
Delete that archdata member, and instead store the MSI
number in the IRQ chip data area.
Signed-off-by: David S. Miller <davem@davemloft.net>
Sometimes we were using 32-bit values and the top bits were
getting inadvertantly chopped off. This will matter for the
forthcoming Fire controller MSI support.
Signed-off-by: David S. Miller <davem@davemloft.net>
This register is not a part of the sun4v architecture.
Niagara 1 and 2 happened to leave it around.
Signed-off-by: David S. Miller <davem@davemloft.net>
Like the OF device tree, it's useful to let userland get
at the machine description so it can pretty print the
graph etc.
The implementation is a simple MISC device with a read method.
Signed-off-by: David S. Miller <davem@davemloft.net>
Every time a cpu is added via hotplug, we allocate the per-cpu MONDO
queues but we never free them up. Freeing isn't easy since the first
cpu gets this memory from bootmem.
Therefore, the simplest thing to do to fix this bug is to allocate the
queues for all possible cpus at boot time.
Signed-off-by: David S. Miller <davem@davemloft.net>
Check the cpu type in the OBP device tree before committing to
using the optimized Niagara memcpy and memset implementation.
If we don't recognize the cpu type, use a completely generic
version.
Signed-off-by: David S. Miller <davem@davemloft.net>
It didn't handle that case at all, and now dump_stack()
can be implemented directly as show_stack(current, NULL)
Signed-off-by: David S. Miller <davem@davemloft.net>
Fully unify all of the DMA ops so that subordinate bus types to
the DMA operation providers (such as ebus, isa, of_device) can
work transparently.
Basically, we just make sure that for every system device we
create, the dev->archdata 'iommu' and 'stc' fields are filled
in.
Then we have two platform variants of the DMA ops, one for SUN4U which
actually programs the real hardware, and one for SUN4V which makes
hypervisor calls.
This also fixes the crashes in parport_pc on sparc64, reported by
Meelis Roos.
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't provide fake PCI config space for sun4u.
Also, put back the funny host controller space handling that
at least Sabre needs. You have to read PCI host controller
registers at their nature size otherwise you get zeros instead
of correct values.
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION to avoid
confusion (among other things, with CONFIG_SUSPEND introduced in the
next patch).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We can't mark the whole thing init because there are dependencies
in bootloaders that assume that _start, or whatever the image
entry value, is 2 instructions before the "HdrS" signature.
In fact, TILO assumes this entry is always at 0x4000, yikes!
Also, right after the bootloader info area there are OBP strings and
values that get used later in the boot process, and those are not all
provably .init yet.
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes boot failures when the build-id LD option is
actually used, because without it we end up with multiple
PT_LOAD sections which the SILO boot loader cannot handle.
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: ERROR: "sys_ioctl" [arch/sparc64/solaris/solaris.ko] undefined!
[SPARC32]: Make PAGE_SHARED a read-mostly variable.
[SPARC32]: Take enable_irq/disable_irq out of line.
[SPARC32]: clean include/asm-sparc/irq.h
[SPARC32]: Fix rounding errors in ndelay/udelay implementation.
From: Christoph Hellwig <hch@infradead.org>
On Fri, Jul 20, 2007 at 09:24:42AM -0400, Horst H. von Brand wrote:
> When building v2.6.22-3478-g275afca on sparc64 (.config attached) I get:
>
> MODPOST vmlinux
> Building modules, stage 2.
> MODPOST 463 modules
> ERROR: "sys_ioctl" [arch/sparc64/solaris/solaris.ko] undefined!
Sorry, my fault.
It looked to me like sparc64 exports sys_ioctl on it's own, but it
only exports compat_sys_ioctl on it's own.
Signed-off-by: David S. Miller <davem@davemloft.net>
i386 and sparc64 have the identical code to update the cmos clock. Move it
into kernel/time/ntp.c as there are other architectures coming along with the
same requirements.
[akpm@linux-foundation.org: build fixes]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to make sure, that the clockevent devices are resumed, before
the tick is resumed. The current resume logic does not guarantee this.
Add CLOCK_EVT_MODE_RESUME and call the set mode functions of the clock
event devices before resuming the tick / oneshot functionality.
Fixup the existing users.
Thanks to Nigel Cunningham for tracking down a long standing thinko,
which affected the jinxed VAIO.
[akpm@linux-foundation.org: xen build fix]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix following warning:
WARNING: vmlinux.o(.text+0x35264): Section mismatch: reference to .init.text:__alloc_bootmem (between 'mdesc_bootmem_alloc' and 'mdesc_bootmem_free')
Rename mdesc_mem_ops to *_ops so modpost ignores __init references
and declare mdesc_bootmem_alloc __init since it is only used
during __init context.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix following warning:
WARNING: vmlinux.o(.text+0x3cf50): Section mismatch: reference to .init.text:page_in_phys_avail (between 'pci_sun4v_pbm_init' and 'sun4v_pci_init')
pci_sun4v_pbm_init and sun4v_pci_init was only used under __init
context so declare them _init.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing sparc64 mini_rtc driver can handle CMOS based
rtcs trivially with just a few lines of code and the simplifies
things tremendously.
Tested on SB1500.
Signed-off-by: David S. Miller <davem@davemloft.net>
The dev_handle and dev_ino fields don't match up exactly to
the traditional IMAP_IGN and IMAP_INO masks.
So store them away in a table and look them up directly.
Signed-off-by: David S. Miller <davem@davemloft.net>
When booting up a control node it's quite common to
not be able to register several service types.
And likewise on guests at least one or two are going
to not be there.
Signed-off-by: David S. Miller <davem@davemloft.net>
The best scheme to get uniqueness seems to be:
FOO -- If node lacks "id" property
FOO-$(ID) -- If node has "id" but parent lacks "cfg-handle"
FOO-$(ID)-$(CFG_HANDLE) -- If node has both
Signed-off-by: David S. Miller <davem@davemloft.net>
The current scheme works on static interpretation of text names, which
is wrong.
The output-device setting, for example, must be resolved via an alias
or similar to a full path name to the console device.
Paths also contain an optional set of 'options', which starts with a
colon at the end of the path. The option area is used to specify
which of two serial ports ('a' or 'b') the path refers to when a
device node drives multiple ports. 'a' is assumed if the option
specification is missing.
This was caught by the UltraSPARC-T1 simulator. The 'output-device'
property was set to 'ttya' and we didn't pick upon the fact that this
is an OBP alias set to '/virtual-devices/console'. Instead we saw it
as the first serial console device, instead of the hypervisor console.
The infrastructure is now there to take advantage of this to resolve
the console correctly even in multi-head situations in fbcon too.
Thanks to Greg Onufer for the bug report.
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/sfr/ofcons:
Create drivers/of/platform.c
Create linux/of_platorm.h
[SPARC/64] Rename some functions like PowerPC
Begin consolidation of of_device.h
Begin to consolidate of_device.c
Consolidate of_find_node_by routines
Consolidate of_get_next_child
Consolidate of_get_parent
Consolidate of_find_property
Consolidate of_device_is_compatible
Start split out of common open firmware code
Split out common parts of prom.h
We try to fetch the CIF entry pointer from %o4, but that
can get clobbered by the early OBP calls. It is saved
in %l7 already, so actually this "mov %o4, %l7" can just
be completely removed with no other changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
The "id" property in vdc-port nodes are not unique, they
are all zero. Therefore assign ID's using the parent's
"cfg-handle" property which will be unique.
Signed-off-by: David S. Miller <davem@davemloft.net>
with the recent renames, we forgot to update the matches for
devspec. This is required to keep udev working and autoload modules.
Signed-off-by: David S. Miller <davem@davemloft.net>
and populate it with the common parts from PowerPC and Sparc[64].
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
This is to make the of merge easier. Also rename of_bus_type.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: David S. Miller <davem@davemloft.net>
This moves all the common parts for the Sparc, Sparc64 and PowerPC
of_device.c files into drivers/of/device.c.
Apart from the simple move, Sparc gains of_match_node() and a call to
of_node_put in of_release_dev(). PowerPC gains better recovery if
device_create_file() fails in of_device_register().
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
This consolidates the routines of_find_node_by_path, of_find_node_by_name,
of_find_node_by_type and of_find_compatible_device. Again, the comparison
of strings are done differently by Sparc and PowerPC and also these add
read_locks around the iterations.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
This adds a read_lock around the child/next accesses on Sparc.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
This requires creating dummy of_node_{get,put} routines for sparc and
sparc64. It also adds a read_lock around the parent accesses.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
The only change here is that a readlock is taken while the property list
is being traversed on Sparc where it was not taken previously.
Also, Sparc uses strcasecmp to compare property names while PowerPC
uses strcmp.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
The only difference here is that Sparc uses strncmp to match compatibility
names while PowerPC uses strncasecmp.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
This creates drivers/of/base.c (depending on CONFIG_OF) and puts
the first trivially common bits from the prom.c files into it.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Acked-by: David S. Miller <davem@davemloft.net>
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.
This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
per cpu data section contains two types of data. One set which is
exclusively accessed by the local cpu and the other set which is per cpu,
but also shared by remote cpus. In the current kernel, these two sets are
not clearely separated out. This can potentially cause the same data
cacheline shared between the two sets of data, which will result in
unnecessary bouncing of the cacheline between cpus.
One way to fix the problem is to cacheline align the remotely accessed per
cpu data, both at the beginning and at the end. Because of the padding at
both ends, this will likely cause some memory wastage and also the
interface to achieve this is not clean.
This patch:
Moves the remotely accessed per cpu data (which is currently marked
as ____cacheline_aligned_in_smp) into a different section, where all the data
elements are cacheline aligned. And as such, this differentiates the local
only data and remotely accessed data cleanly.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
unregister_chrdev() always returns 0. There is no need to check the return
value.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch completes Linus's wish that the fault return codes be made into
bit flags, which I agree makes everything nicer. This requires requires
all handle_mm_fault callers to be modified (possibly the modifications
should go further and do things like fault accounting in handle_mm_fault --
however that would be for another patch).
[akpm@linux-foundation.org: fix alpha build]
[akpm@linux-foundation.org: fix s390 build]
[akpm@linux-foundation.org: fix sparc build]
[akpm@linux-foundation.org: fix sparc64 build]
[akpm@linux-foundation.org: fix ia64 build]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ian Molton <spyro@f2s.com>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Matthew Wilcox <willy@debian.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
Cc: Richard Curnow <rc@rc0.org.uk>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
Cc: Chris Zankel <chris@zankel.net>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Still apparently needs some ARM and PPC loving - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reset the handshake and per-capability state so that when the
link comes back up we'll renegotiate the DS version and then
reregister all of the services.
Signed-off-by: David S. Miller <davem@davemloft.net>
Create and destroy VIO devices in response to MD update events. These
run synchronously inside of the MD update mutex so the VIO layer
doesn't need to do internal locking of any sort.
Signed-off-by: David S. Miller <davem@davemloft.net>
And add dummy handlers for the VIO device layer. These will be filled
in with real code after the vdc, vnet, and ds drivers are reworked to
have simpler dependencies on the VIO device tree.
Signed-off-by: David S. Miller <davem@davemloft.net>
If the kernel OOPSed or BUGed then it probably should be considered as
tainted. Thus, all subsequent OOPSes and SysRq dumps will report the
tainted kernel. This saves a lot of time explaining oddities in the
calltraces.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Added parisc patch from Matthew Wilson -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to make sure the MD update occurs before we try to
process dr-cpu configure requests. MD update and dr-cpu
were being processed by seperate threads so that did not
happen occaisionally.
Fix this by executing all domain services data packets from
a single thread, in order.
This will help simplify some other things as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
When cpu_up() fails, we can discern the most likely cause.
If cpu_present() is false, this means the cpu did not appear
in the MD. If -ENODEV is the error return value, then
the processor did not boot properly into the kernel.
Pass this information back in the dr-cpu response packet.
Signed-off-by: David S. Miller <davem@davemloft.net>
When we hot-plug in new cpus, the core_id and proc_id of existing
cpus can change. So in order to set the cpu groups correctly we
need to clear the maps out completely first.
Signed-off-by: David S. Miller <davem@davemloft.net>
dr-cpu unconfigure requests will walk throught he enabled
IRQs and trigger ->set_affinity so that the going-down
cpu no longer has INOs targetted to it.
Signed-off-by: David S. Miller <davem@davemloft.net>
Take a page from the powerpc folks and just calculate the
delay factor directly.
Since frequency scaling chips use a system-tick register,
the value is going to be the same system-wide.
Signed-off-by: David S. Miller <davem@davemloft.net>
With the move of ldom_startcpu_cpuid() into smp.c some other
things need to follow along:
1) smp.c is not a driver so we can't use "PFX" macro in the
printk calls.
2) smp.c now needs asm/io.h and asm/hvtramp.h, ds.c no longer
does
3) kimage_addr_to_ra() also needs to move into smp.c
While we're here, update copyright info and my email address
in smp.c
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not select HOTPLUG_CPU from SUN_LDOMS, that causes
HOTPLUG_CPU to be selected even on non-SMP which is
illegal.
Only build hvtramp.o when SMP, just like trampoline.o
Protect dr-cpu code in ds.c with HOTPLUG_CPU.
Likewise move ldom_startcpu_cpuid() to smp.c and protect
it and the call site with SUN_LDOMS && HOTPLUG_CPU.
Signed-off-by: David S. Miller <davem@davemloft.net>
The VIO drivers register themselves unconditionally just
like those of any other bus type, so to avoid crashes
on non-VIO systems we need to always register vio_bus_type.
Signed-off-by: David S. Miller <davem@davemloft.net>
Only adding cpus is supports at the moment, removal
will come next.
When new cpus are configured, the machine description is
updated. When we get the configure request we pass in a
cpu mask of to-be-added cpus to the mdesc CPU node parser
so it only fetches information for those cpus. That code
also proceeds to update the SMT/multi-core scheduling bitmaps.
cpu_up() does all the work and we return the status back
over the DS channel.
CPUs via dr-cpu need to be booted straight out of the
hypervisor, and this requires:
1) A new trampoline mechanism. CPUs are booted straight
out of the hypervisor with MMU disabled and running in
physical addresses with no mappings installed in the TLB.
The new hvtramp.S code sets up the critical cpu state,
installs the locked TLB mappings for the kernel, and
turns the MMU on. It then proceeds to follow the logic
of the existing trampoline.S SMP cpu bringup code.
2) All calls into OBP have to be disallowed when domaining
is enabled. Since cpus boot straight into the kernel from
the hypervisor, OBP has no state about that cpu and therefore
cannot handle being invoked on that cpu.
Luckily it's only a handful of interfaces which can be called
after the OBP device tree is obtained. For example, rebooting,
halting, powering-off, and setting options node variables.
CPU removal support will require some infrastructure changes
here. Namely we'll have to process the requests via a true
kernel thread instead of in a workqueue. workqueues run on
a per-cpu thread, but when unconfiguring we might need to
force the thread to execute on another cpu if the current cpu
is the one being removed. Removal of a cpu also causes the kernel
to destroy that cpu's workqueue running thread.
Another issue on removal is that we may have interrupts still
pointing to the cpu-to-be-removed. So new code will be needed
to walk the active INO list and retarget those cpus as-needed.
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a special domain services capability for setting
variables in the OBP options node. Guests don't have permanent
store for the OBP variables like a normal system, so they are
instead maintained in the LDOM control node or in the SC.
Signed-off-by: David S. Miller <davem@davemloft.net>
Property values cannot be referenced outside of
mdesc_grab()/mdesc_release() pairs. The only major
offender was the VIO bus layer, easily fixed.
Add some commentary to mdesc.h describing these rules.
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we have to be able to handle MD updates, having an in-tree
set of data structures representing the MD objects actually makes
things more painful.
The MD itself is easy to parse, and we can implement the existing
interfaces using direct parsing of the MD binary image.
The MD is now reference counted, so accesses have to now take the
form:
handle = mdesc_grab();
... operations on MD ...
mdesc_release(handle);
The only remaining issue are cases where code holds on to references
to MD property values. mdesc_get_property() returns a direct pointer
to the property value, most cases just pull in the information they
need and discard the pointer, but there are few that use the pointer
directly over a long lifetime. Those will be fixed up in a subsequent
changeset.
A preliminary handler for MD update events from domain services is
there, it is rudimentry but it works and handles all of the reference
counting. It does not check the generation number of the MDs,
and it does not generate a "add/delete" list for notification to
interesting parties about MD changes but that will be forthcoming.
Signed-off-by: David S. Miller <davem@davemloft.net>
All of the interrupts say "LDX RX" and "LDX TX" currently
which is next to useless. Put a device specific prefix
before "RX" and "TX" instead which makes it much more
useful.
Signed-off-by: David S. Miller <davem@davemloft.net>
Besides the existing usage for power-button interrupts, we'll
want to make use of this code for domain-services where the
LDOM manager can send reboot requests to the guest node.
Signed-off-by: David S. Miller <davem@davemloft.net>
1) LDC_MODE_RELIABLE is deprecated an unused by anything, plus
it and LDC_MODE_STREAM were mis-numbered.
2) read_stream() should try to read as much as possible into
the per-LDC stream buffer area, so do not trim the read_nonraw()
length by the caller's size parameter.
3) Send data ACKs when necessary in read_nonraw().
4) In read_nonraw() when we get a pure ACK, advance the RX head
unconditionally past it.
5) Provide the ACKID field in the ldcdgb() packet dump in read_nonraw().
This helps debugging stream mode LDC channel problems.
6) Decrease verbosity of rx_data_wait() so that it is more useful.
A debugging message each loop iteration is too much.
7) In process_data_ack() stop the loop checking when we hit lp->tx_tail
not lp->tx_head.
8) Set the seqid field properly in send_data_nack().
Signed-off-by: David S. Miller <davem@davemloft.net>
This is also a partial workaround for a bug in the LDOM firmware which
double-transmits RX inos during high load. Without this, such an
event causes the kernel to loop forever in the interrupt call chain
ACK'ing but never actually running the IRQ handler (and thus clearing
the interrupt condition in the device).
There is still a bad potential effect when double INOs occur,
not covered by this changeset. Namely, if the INO is already on
the per-cpu INO vector list, we still blindly re-insert it and
thus we can end up losing interrupts already linked in after
it.
We could deal with that by traversing the list before insertion,
but that's too expensive for this edge case.
Signed-off-by: David S. Miller <davem@davemloft.net>
Virtual devices on Sun Logical Domains are built on top
of a virtual channel framework. This, with help of hypervisor
interfaces, provides a link layer protocol with basic
handshaking over which virtual device clients and servers
communicate.
Built on top of this is a VIO device protocol which has it's
own handshaking and message types. At this layer attributes
are exchanged (disk size, network device addresses, etc.)
descriptor rings are registered, and data transfers are
triggers and replied to.
Signed-off-by: David S. Miller <davem@davemloft.net>
The PCI syscalls are built on every architecture except X86, but only
a few have ever hooked them up. Use a new Kconfig symbol to save a
couple of kB on the architectures that have never used the syscalls.
Tested on x86 and ia64 only.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently there are 97 occurrences where drivers need the pci
revision ID. We can do this once for all devices. Even the pci
subsystem needs the revision several times for quirks. The extra
u8 member pads out nicely in the pci_dev struct.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
the SMP load-balancer uses the boot-time migration-cost estimation
code to attempt to improve the quality of balancing. The reason for
this code is that the discrete priority queues do not preserve
the order of scheduling accurately, so the load-balancer skips
tasks that were running on a CPU 'recently'.
this code is fundamental fragile: the boot-time migration cost detector
doesnt really work on systems that had large L3 caches, it caused boot
delays on large systems and the whole cache-hot concept made the
balancing code pretty undeterministic as well.
(and hey, i wrote most of it, so i can say it out loud that it sucks ;-)
under CFS the same purpose of cache affinity can be achieved without
any special cache-hot special-case: tasks are sorted in the 'timeline'
tree and the SMP balancer picks tasks from the left side of the
tree, thus the most cache-cold task is balanced automatically.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We were doing the wrong call to turn them on, and also
when enabling we need to forcefully set the state to IDLE.
Signed-off-by: David S. Miller <davem@davemloft.net>
In pci_determine_mem_io_space(), do not hard code the region sizes.
Instead, use the values given to us in the ranges property.
Thanks goes to Mikael Petterson for the original Xorg failure
bug repoert, and strace dumps from Mikael and Dmitry Artamonow.
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes the IDE controller not showing up on Netra-T1
systems.
Just like Simba bridges, some PCI bridges can lack the
'ranges' OBP property. So we handle this similarly to
the existing Simba code:
1) In of_device register address resolving, we push the
translation to the parent.
2) In PCI device scanning, we interrogate the PCI config
space registers of the PCI bus device in order to resolve
the resources, just like the generic Linux PCI probing
code does.
With much help and testing from Fabio, who also reported
the initial problem.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Fabio Massimo Di Nitto <fabbione@ubuntu.com>
To be consistent with other architectures, include the generic version
of rwsem.h.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We used to access the 64-bit IRQ IMAP and ICLR registers of bus
controllers 4-bytes in and as a 32-bit register word, since only the
low 32-bits were relevant. This seemed like a good idea at the time.
But the PCI-E controller requires full 8-byte 64-bit access to
these registers, so we switched over to accessing them fully.
SBUS was not adjusted properly, which broke interrupts completely.
Signed-off-by: David S. Miller <davem@davemloft.net>
If we are on hummingbird, bus runs at 66MHZ.
pbm->pci_bus should be setup with the result of pci_scan_one_pbm()
or else we deref NULL pointers in the error interrupt handlers.
Signed-off-by: David S. Miller <davem@davemloft.net>
It's not just sun4v hypervisor platforms that should return true
for this, sun4u with UltraSPARC-IV should return true too.
Signed-off-by: David S. Miller <davem@davemloft.net>
The scheduling domain hierarchy is:
all cpus -->
cpus that share an instruction cache -->
cpus that share an integer execution unit
Signed-off-by: David S. Miller <davem@davemloft.net>
If the system supports hypervisor based statistics, allow them to
be fetched, enabled, and disabled via sysfs.
Enable and disable via the boolean:
/sys/devices/systems/cpu/cpuN/mmustat_enable
Statistic values are provided under:
/sys/devices/systems/cpu/cpuN/mmu_status/
Signed-off-by: David S. Miller <davem@davemloft.net>
Also, use per-cpu data for struct cpu. Calling kmalloc for
each cpu in topology_init() is just plain clumsy.
Signed-off-by: David S. Miller <davem@davemloft.net>
The RO_DATA section were hardcoded to a specific
alignment in include/asm-generic/vmlinux.h.
But for sparc64 this did not match the PAGE_SIZE.
Introduce a new section definition named:
RO_DATA that takes actual alignment as parameter.
RODATA are provided for backward compatibility.
On top of this avoid hardcoding alignment for
sparc64 in reset of the script
Fix is build-tested on sparc64 + x86_64.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Several interfaces were missing and others misnumbered or
improperly documented.
Also, make sure to check the return value when registering
the kernel TSBs with the hypervisor. This helped to find
the 4MB kernel TSB alignment bug fixed in a previous changeset.
Signed-off-by: David S. Miller <davem@davemloft.net>
1) The TSB lookup was not using the correct hash mask.
2) It was not aligned on a boundary equal to it's size,
which is required by the sun4v Hypervisor.
wasn't having it's return value checked, and that bug will be fixed up
as well in a subsequent changeset.
Signed-off-by: David S. Miller <davem@davemloft.net>
It was using an immediate _PAGE_EXEC_4U value in an 'and'
instruction to perform the test. This doesn't work because
the immediate field is signed 13-bit, this the mask being
tested against the PTE was 0x1000 sign-extended to 32-bits
instead of just plain 0x1000.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is bug 8540 on bugzilla.kernel.org
arch/sparc64/time.c contains references to assorted bq4802 stuff if
CONFIG_PCI is not set, and compile fails. I #ifdef'ed out everything
that looks PCI-ish in that file.
Signed-off-by: David S. Miller <davem@davemloft.net>
Cheetah systems can have cpuids as large as 1023, although physical
systems don't have that many cpus.
Only three limitations existed in the kernel preventing arbitrary
NR_CPUS values:
1) dcache dirty cpu state stored in page->flags on
D-cache aliasing platforms. With some build time
calculations and some build-time BUG checks on
page->flags layout, this one was easily solved.
2) The cheetah XCALL delivery code could only handle
a cpumask with up to 32 cpus set. Some simple looping
logic clears that up too.
3) thread_info->cpu was a u8, easily changed to a u16.
There are a few spots in the kernel that still put NR_CPUS
sized arrays on the kernel stack, but that's not a sparc64
specific problem.
Signed-off-by: David S. Miller <davem@davemloft.net>
These messages were very useful when bringing up the
OBP based PCI device scan code, but it's just a lot
of noise every bootup now especially on big machines.
The messages can be re-enabled via 'ofpci_debug=1' on
the kernel command line.
Signed-off-by: David S. Miller <davem@davemloft.net>
Handle arbitrary base and length values as long as they
are multiples of IO_PAGE_SIZE.
Bug found by Arun Kumar Rao.
Signed-off-by: David S. Miller <davem@davemloft.net>
Hypervisor interfaces need to be negotiated in order to use
some API calls reliably. So add a small set of interfaces
to request API versions and query current settings.
This allows us to fix some bugs in the hypervisor console:
1) If we can negotiate API group CORE of at least major 1
minor 1 we can use con_read and con_write which can improve
console performance quite a bit.
2) When we do a console write request, we should hold the
spinlock around the whole request, not a byte at a time.
What would happen is that it's easy for output from
different cpus to get mixed with each other.
3) Use consistent udelay() based polling, udelay(1) each
loop with a limit of 1000 polls to handle stuck hypervisor
console.
Signed-off-by: David S. Miller <davem@davemloft.net>