PowerPC currently doesn't implement pci_set_dma_mask(), which means drivers
calling it will get the generic version in drivers/pci/pci.c.
The powerpc dma mapping ops include a dma_set_mask() hook, which luckily is
not implemented by anyone - so there is no bug in the fact that the hook
is currently never called.
However in future we'll add implementation(s) of dma_set_mask(), and so we
need pci_set_dma_mask() to call the hook.
To save adding a hook to the dma mapping ops, pci-set_consistent_dma_mask()
simply calls the dma_set_mask() hook and then copies the new mask into
dev.coherenet_dma_mask.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We have multiple calls to has_feature being inlined, but gcc can't
be sure that the store via get_paca() doesn't alias the path to
cur_cpu_spec->feature.
Reorder to put the calls to read_purr and read_spurr adjacent to each
other. To add a sense of consistency, reorder the remaining lines to
perform parallel steps on purr and scaled purr of each line instead of
calculating and then using one value before going on to the next.
In addition, we can tell gcc that no SPURR means no PURR. The test is
completely hidden in the PURR case, and in the !PURR case the second test
is eliminated resulting in the simple register copy in the out-of-line
branch.
Further, gcc sees get_paca()->system_time referenced several times and
allocates a register to address it (shadowing r13) instead of caching its
value. Reading into a local varable saves the shadow of r13 and removes
a potentially duplicate load (between the nested if and its parent).
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If CPU_FTR_PURR is not set, we will never set cpu_purr_data->initialized.
Checking via __get_cpu_var on 64 bit avoids one dependent load compared
to cpu_has_feature in the not-present case, and is always required when
it is present. The code is under CONFIG_VIRT_CPU_ACCOUNTING so 32 bit
will not be affected.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
timer_interrupt() was calculating per_cpu_offset several times, having to
start from the toc because of potential aliasing issues.
Placing both decrementer per_cpu varables in a struct and calculating
the address once with __get_cpu_var results in better code on both 32
and 64 bit.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Use __get_cpu_var(x) instead of per_cpu(x, smp_processor_id()), as it
is optimized on ppc64 to access the current cpu's per-cpu offset directly;
it's local_paca.offset instead of TOC->paca[local_paca->processor_id].offset.
This is the trivial portion, two functions with one use each.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
as its only called from time_init, which is __init.
Also remove unneeded forward declaration.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove exports of __res and cpm_install_handler/cpm_free_handler. Remove
cpm_install_handler/cpm_free_handler from the commproc.h as well. Both
were used for ARCH=ppc and aren't defined for ARCH=powerpc.
CC arch/powerpc/kernel/ppc_ksyms.o
arch/powerpc/kernel/ppc_ksyms.c:180: error: '__res' undeclared here (not in a function)
arch/powerpc/kernel/ppc_ksyms.c:180: warning: type defaults to 'int' in declaration of '__res'
make[1]: *** [arch/powerpc/kernel/ppc_ksyms.o] Error 1
make: *** [arch/powerpc/kernel] Error 2
LD .tmp_vmlinux1
arch/powerpc/kernel/built-in.o:(__ksymtab+0x198): undefined reference to `cpm_free_handler'
arch/powerpc/kernel/built-in.o:(__ksymtab+0x1a0): undefined reference to `cpm_install_handler'
make: *** [.tmp_vmlinux1] Error 1
Signed-off-by: Jochen Friedrich <jochen@scram.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Vitaly Bordug <vitb@kernel.crashing.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
isel (Integer Select) is a new user space instruction in the
PowerISA 2.04 spec. Not all processors implement it so lets emulate
to ensure code built with isel will run everywhere.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
This makes the early debug option force the console loglevel
to the max. The early debug option is meant to catch messages very
early in the kernel boot process, in many cases, before the kernel
has a chance to parse the "debug" command line argument. Thus it
makes sense when CONFIG_PPC_EARLY_DEBUG is set, to force the console
log level to the max at boot time.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This adds a variant of of_translate_address that uses the dma-ranges
property instead of "ranges", it's to be used by PCI code in parsing
the dma-ranges property.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This removes "volatile" from the MMIO pointer udbg_comport
in udbg_16550.c driver, it's useless and makes checkpatch.pl
complain when adding things to this file.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 32 bits PCI code will display a rather scary error message
PCI: Cannot allocate resource region N of device XXX
at boot when the existing setup of a device as left by the
firmware doesn't match the kernel needs and the device needs
to be moved. This is often not an error at all, as the kernel
will generally easily reallocate the device elsewhere.
This changes the message to something less scary and lowers
its level from error to warning.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 32-bit powerpc resource fixup code uses unsigned longs to do the
offsetting of resources which overflows on platforms such as 4xx where
resources can be 64 bits.
This fixes it by using resource_size_t instead.
However, the IO stuff does rely on some 32 bits arithmetic, so we hack
by cropping the result of the fixups for IO resources with a 32 bits
mask.
This isn't the prettiest but should work for now until we change the
32 bits PCI code to do IO mappings like 64 bits does, within a reserved
are of the kernel address space.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This merges the 32-bit and 64-bit implementations of
pci_process_bridge_OF_ranges(). The new function is cleaner than both
the old ones, and supports 64 bits ranges on ppc32 which is necessary
for the 4xx port.
It also adds some better (hopefully) output to the kernel log which
should help diagnose problems and makes better use of existing OF
parsing helpers (avoiding a few bugs of both implementations along
the way).
There are still a few unfortunate ifdef's but there is no way around
these for now at least not until some other bits of the PCI code are
made common.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This defines isa_mem_base on both 32 and 64 bits (it used to be 32 bits
only). This avoids a few ifdef's in later patches and potentially can
allow support for VGA text mode on 64 bits powerpc.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently we hardwire the number of SLBs to 64, but PAPR says we
should use the ibm,slb-size property to obtain the number of SLB
entries. This uses this property instead of assuming 64. If no
property is found, we assume 64 entries as before.
This soft patches the SLB handler, so it shouldn't change performance
at all.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
and it becomes clear that we should use zalloc_maybe_bootmem.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
It only needs the iommu_table address. It also makes use of the node
name to print error messages. So just pass it the things it needs.
This reduces the places that know about the pci_dn by one.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 'data' member of proc_ppc64_lparcfg is unused, but the lparcfg
module's init routine allocates 4K for it.
Remove the code which allocates and frees this buffer.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Also use of_unregister_driver to implement of_unregister_platform_driver.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The size of swapper_pg_dir is 8k instead of 4k when using 64-bit PTEs
(CONFIG_PTE_64BIT).
This was reported by Cedric Hombourger <chombourger@gmail.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
The commit fa13a5a1f2 (sched: restore
deterministic CPU accounting on powerpc), unconditionally calls
update_process_tick() in system context. In the deterministic
accounting case this is the correct thing to do. However, in the
non-deterministic accounting case we need to not do this, since doing
this results in the time accounted as hardware irq time being
artificially elevated.
Also this collapses 2 consecutive '#ifdef CONFIG_VIRT_CPU_ACCOUNTING'
checks in time.h into one for neatness.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
It is already declared in ppc-pci.h which is included.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
since it's not used outside of arch/powerpc/kernel/pci-common.c.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This cleans up the SMT thread handling, removing some hard coded
assumptions and providing a set of helpers to convert between linux
cpu numbers, thread numbers and cores.
This implementation requires the number of threads per core to be a
power of 2 and identical on all cores in the system, but it's an
implementation detail, not an API requirement and so this limitation
can be lifted in the future if anybody ever needs it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This reverts commit a2b51812a4.
It turns out that this change caused some machines to fail to come
back up when being rebooted, and generated an error in the hypervisor
error log on some machines. The platform architecture (PAPR) is a
little unclear on exactly when the RTAS ibm,os-term function should be
called. Until that is clarified I'm reverting this commit.
Signed-off-by: Paul Mackerras <paulus@samba.org>
If we get no user time and no system time allocated since the last
account_system_vtime, the system to user time ratio estimate can end
up dividing by zero.
This was causing a problem noticed by Balbir Singh.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The rtas_os_term() routine was being called at the wrong time.
The actual rtas call "os-term" will not ever return, and so
calling it from the panic notifier is too early. Instead,
call it from the machine_reset() call.
This splits the rtas_os_term() routine into two: one part to capture
the kernel panic message, invoked during the panic notifier, and
another part that is invoked during machine_reset().
Prior to this patch, the os-term call was never being made,
because panic_timeout was always non-zero. Calling os-term
helps keep the hypervisor happy! We have to keep the hypervisor
happy to avoid service, dump and error reporting problems.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The current VDSO implementation is hardcoded to 128 byte cache blocks,
which are only used on IBM's 64-bit processors.
Convert it to get the cache block sizes out of vdso_data instead,
similar to how the ppc64 in-kernel cache flush does it.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
There are several issues with the rtas_ibm_suspend_me code, which
enables platform-assisted suspension of an LPAR as covered in PAPR
2.2.
1.) rtas_ibm_suspend_me uses on_each_cpu() to invoke
rtas_percpu_suspend_me on all cpus via IPI:
if (on_each_cpu(rtas_percpu_suspend_me, &data, 1, 0))
...
'data' is on the calling task's stack, but rtas_ibm_suspend_me takes
no measures to ensure that all instances of rtas_percpu_suspend_me are
finished accessing 'data' before returning. This can result in the
IPI'd cpus accessing random stack data and getting stuck in H_JOIN.
This is addressed by using an atomic count of workers and a completion
on the stack.
2.) rtas_percpu_suspend_me is needlessly calling H_JOIN in a loop.
The only event that can cause a cpu to return from H_JOIN is an H_PROD
from another cpu or a NMI/system reset. Each cpu need call H_JOIN
only once per suspend operation.
Remove the loop and the now unnecessary 'waiting' state variable.
3.) H_JOIN must be called with MSR[EE] off, but lazy interrupt
disabling may cause the caller of rtas_ibm_suspend_me to call H_JOIN
with it on; the local_irq_disable() in on_each_cpu() is not
sufficient.
Fix this by explicitly saving the MSR and clearing the EE bit before
calling H_JOIN.
4.) H_PROD is being called with the Linux logical cpu number as the
parameter, not the platform interrupt server value. (It's also being
called for all possible cpus, which is harmless, but unnecessary.)
This is fixed by calling H_PROD for each online cpu using
get_hard_smp_processor_id(cpu) for the argument.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The early btext debug wouldn't work on PowerMac when booted from BootX
due to the code looking for the wrong property name.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
These don't need to be seen by everyone on every boot.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The context switch code in the kernel issues a dummy stwcx. to clear the
reservation, as recommended by the architecture. However, some processors
can have issues if this stwcx to address A occurs while the reservation
is already held to a different address B. To avoid this problem, the dummy
stwcx. needs to be paired with a dummy lwarx to the same address.
This adds the dummy lwarx, and creates a cpu feature bit to indicate
which cpus are affected. Tested on mpc8641_hpcn_defconfig in
arch/powerpc; build tested in arch/ppc.
Signed-off-by: Becky Bruce <becky.bruce@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Since powerpc started using CONFIG_GENERIC_CLOCKEVENTS, the
deterministic CPU accounting (CONFIG_VIRT_CPU_ACCOUNTING) has been
broken on powerpc, because we end up counting user time twice: once in
timer_interrupt() and once in update_process_times().
This fixes the problem by pulling the code in update_process_times
that updates utime and stime into a separate function called
account_process_tick. If CONFIG_VIRT_CPU_ACCOUNTING is not defined,
there is a version of account_process_tick in kernel/timer.c that
simply accounts a whole tick to either utime or stime as before. If
CONFIG_VIRT_CPU_ACCOUNTING is defined, then arch code gets to
implement account_process_tick.
This also lets us simplify the s390 code a bit; it means that the s390
timer interrupt can now call update_process_times even when
CONFIG_VIRT_CPU_ACCOUNTING is turned on, and can just implement a
suitable account_process_tick().
account_process_tick() now takes the task_struct * as an argument.
Tested both with and without CONFIG_VIRT_CPU_ACCOUNTING.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This makes the altivec code in swsusp_32.S depend on CONFIG_ALTIVEC to
avoid build failures for systems that don't have altivec. I'm not sure
whether the code will actually work for other systems, but it was merged
for just ppc32 rather than powermac a very long time ago.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
If the low level MMU hash table insertion returns an error (which
can happen in some rare circumstances when the hypervisor refuses
the insertion of a PTE, typically if you try to access junk via
/dev/mem), the generated signal had an incorrect si_addr value due
to a bug in the assembly, which was loading it as a 32 bits quantity
instead of a 64 bits quantity.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The size passing to memset is wrong.
Signed-off-by Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
An allyesconfig build creates a .text section that is so big that the
.text.init.refok and .fixup sections are too far away for the relocations
to be fixed up correctly. This patch fixes that by linking all the
relevent text sections for each file together.
Suggested by Paul Mackerras.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The decrementer in Book E and 4xx processors interrupts on the
transition from 1 to 0, rather than on the 0 to -1 transition as on
64-bit server and 32-bit "classic" (6xx/7xx/7xxx) processors. At the
moment we subtract 1 from the count of how many decrementer ticks are
required before the next interrupt before putting it into the
decrementer, which is correct for server/classic processors, but could
possibly cause the interrupt to happen too early on Book E and 4xx if
the timebase/decrementer frequency is low.
This fixes the problem by making set_dec subtract 1 from the count for
server and classic processors, instead of having the callers subtract
1. Since set_dec already had a bunch of ifdefs to handle different
processor types, there is no net increase in ugliness. :)
Note that calling set_dec(0) may not generate an interrupt on some
processors. To make sure that decrementer_set_next_event always calls
set_dec with an interval of at least 1 tick, we set min_delta_ns of
the decrementer_clockevent to correspond to 2 ticks (2 rather than 1
to compensate for truncations in the conversions between ticks and
ns).
This also removes a redundant call to set the decrementer to
0x7fffffff - it was already set to that earlier in timer_interrupt.
Signed-off-by: Paul Mackerras <paulus@samba.org>
We had an historical confusion in the kernel between cache line
and cache block size. The former is an implementation detail of
the L1 cache which can be useful for performance optimisations,
the later is the actual size on which the cache control
instructions operate, which can be different.
For some reason, we had a weird hack reading the right property
on powermac and the wrong one on any other 64 bits (32 bits is
unaffected as it only uses the cputable for cache block size
infos at this stage).
This fixes the booting-without-of.txt documentation to mention
the right properties, and fixes the 64 bits initialization code
to look for the block size first, with a fallback to the line
size if the property is missing.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The 44x family has an interesting "feature" which is a virtually
tagged instruction cache (yuck !). So far, we haven't dealt with
it properly, which means we've been mostly lucky or people didn't
report the problems, unless people have been running custom patches
in their distro...
This is an attempt at fixing it properly. I chose to do it by
setting a global flag whenever we change a PTE that was previously
marked executable, and flush the entire instruction cache upon
return to user space when that happens.
This is a bit heavy handed, but it's hard to do more fine grained
flushes as the icbi instruction, on those processor, for some very
strange reasons (since the cache is virtually mapped) still requires
a valid TLB entry for reading in the target address space, which
isn't something I want to deal with.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
On 4xx CPUs, the current implementation of flush_tlb_page() uses
a low level _tlbie() assembly function that only works for the
current PID. Thus, invalidations caused by, for example, a COW
fault triggered by get_user_pages() from a different context will
not work properly, causing among other things, gdb breakpoints
to fail.
This patch adds a "pid" argument to _tlbie() on 4xx processors,
and uses it to flush entries in the right context. FSL BookE
also gets the argument but it seems they don't need it (their
tlbivax form ignores the PID when invalidating according to the
document I have).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
PowerPC 440EP(x) 440GR(x) processors have the same PVR values, since
they have identical cores. However, FPU is not supported on GR(x) and
enabling APU instruction broadcast in the CCR0 register (to enable FPU)
may cause unpredictable results. There's no safe way to detect FPU
support at runtime. This patch provides a workarund for the issue.
We use a POWER6 "logical PVR approach". First, we identify all EP(x)
and GR(x) processors as GR(x) ones (which is safe). Then we check
the device tree cpu path. If we have a EP(x) processor entry,
we call identify_cpu again with PVR | 0x8. This bit is always 0
in the real PVR. This way we enable FPU only for 440EP(x).
Signed-off-by: Valentine Barshak <vbarshak@ru.mvista.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
* Convert files to UTF-8.
* Also correct some people's names
(one example is Eißfeldt, which was found in a source file.
Given that the author used an ß at all in a source file
indicates that the real name has in fact a 'ß' and not an 'ss',
which is commonly used as a substitute for 'ß' when limited to
7bit.)
* Correct town names (Goettingen -> Göttingen)
* Update Eberhard Mönkeberg's address (http://lkml.org/lkml/2007/1/8/313)
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
This patch adapts the ppc64 code to use the generic parse_crashkernel()
function introduced in the generic patch of that series.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One of the easiest things to isolate is the pid printed in kernel log.
There was a patch, that made this for arch-independent code, this one makes
so for arch/xxx files.
It took some time to cross-compile it, but hopefully these are all the
printks in arch code.
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
is_init() is an ambiguous name for the pid==1 check. Split it into
is_global_init() and is_container_init().
A cgroup init has it's tsk->pid == 1.
A global init also has it's tsk->pid == 1 and it's active pid namespace
is the init_pid_ns. But rather than check the active pid namespace,
compare the task structure with 'init_pid_ns.child_reaper', which is
initialized during boot to the /sbin/init process and never changes.
Changelog:
2.6.22-rc4-mm2-pidns1:
- Use 'init_pid_ns.child_reaper' to determine if a given task is the
global init (/sbin/init) process. This would improve performance
and remove dependence on the task_pid().
2.6.21-mm2-pidns2:
- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
ppc,avr32}/traps.c for the _exception() call to is_global_init().
This way, we kill only the cgroup if the cgroup's init has a
bug rather than force a kernel panic.
[akpm@linux-foundation.org: fix comment]
[sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
[bunk@stusta.de: kernel/pid.c: remove unused exports]
[sukadev@us.ibm.com: Fix capability.c to work with threaded init]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Herbert Poetzel <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds POWERPC specific hooks for scaled time accounting.
POWER6 includes a SPURR register. The SPURR is based off the PURR register
but is scaled based on CPU frequency and issue rates. This gives a more
accurate account of the instructions used per task. The PURR and timebase
will be constant relative to the wall clock, irrespective of the CPU
frequency.
This implementation reads the SPURR register in account_system_vtime which
is only call called on context witch and hard and soft irq entry and exit.
The percentage of user and system time is then estimated using the ratio of
these accounted by the PURR. If the SPURR is not present, the PURR read.
An earlier implementation of this patch read the SPURR whenever the PURR
was read, which included the system call entry and exit path.
Unfortunately this showed a performance regression on lmbench runs, so was
re-implemented.
I've included the lmbench results here when run bare metal on POWER6. 1st
column is the unpatch results. 2nd column is the results using the below
patch and the 3rd is the % diff of these results from the base. 4th and
5th columns are the results and % differnce from the base using the older
patch (SPURR read in syscall entry/exit path).
Base Scaled-Acct SPURR-in-syscall
Result Result % diff Result % diff
Simple syscall: 0.3086 0.3086 0.0000 0.3452 11.8600
Simple read: 0.4591 0.4671 1.7425 0.5044 9.86713
Simple write: 0.4364 0.4366 0.0458 0.4731 8.40971
Simple stat: 2.0055 2.0295 1.1967 2.0669 3.06158
Simple fstat: 0.5962 0.5876 -1.442 0.6368 6.80979
Simple open/close: 3.1283 3.1009 -0.875 3.2088 2.57328
Select on 10 fd's: 0.8554 0.8457 -1.133 0.8667 1.32101
Select on 100 fd's: 3.5292 3.6329 2.9383 3.6664 3.88756
Select on 250 fd's: 7.9097 8.1881 3.5197 8.2242 3.97613
Select on 500 fd's: 15.2659 15.836 3.7357 15.873 3.97814
Select on 10 tcp fd's: 0.9576 0.9416 -1.670 0.9752 1.83792
Select on 100 tcp fd's: 7.248 7.2254 -0.311 7.2685 0.28283
Select on 250 tcp fd's: 17.7742 17.707 -0.375 17.749 -0.1406
Select on 500 tcp fd's: 35.4258 35.25 -0.496 35.286 -0.3929
Signal handler installation: 0.6131 0.6075 -0.913 0.647 5.52927
Signal handler overhead: 2.0919 2.1078 0.7600 2.1831 4.35967
Protection fault: 0.7345 0.7478 1.8107 0.8031 9.33968
Pipe latency: 33.006 16.398 -50.31 33.475 1.42368
AF_UNIX sock stream latency: 14.5093 30.910 113.03 30.715 111.692
Process fork+exit: 219.8 222.8 1.3648 229.37 4.35623
Process fork+execve: 876.14 873.28 -0.32 868.66 -0.8533
Process fork+/bin/sh -c: 2830 2876.5 1.6431 2958 4.52296
File /var/tmp/XXX write bw: 1193497 1195536 0.1708 118657 -0.5799
Pagefaults on /var/tmp/XXX: 3.1272 3.2117 2.7020 3.2521 3.99398
Also, kernel compile times show no difference with this patch applied.
[pbadari@us.ibm.com: Avoid unnecessary PURR reading]
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (24 commits)
[POWERPC] Fix vmemmap warning in init_64.c
[POWERPC] Fix 64 bits vDSO DWARF info for CR register
[POWERPC] Add 1TB workaround for PA6T
[POWERPC] Enable NO_HZ and high res timers for pseries and ppc64 configs
[POWERPC] Quieten cache information at boot
[POWERPC] Quieten clockevent printk
[POWERPC] Enable SLUB in *_defconfig
[POWERPC] Fix 1TB segment detection
[POWERPC] Fix iSeries_hpte_insert prototype
[POWERPC] Fix copyright symbol
[POWERPC] ibmebus: Move to of_device and of_platform_driver, match eHCA and eHEA drivers
[POWERPC] ibmebus: Add device creation and bus probing based on of_device
[POWERPC] ibmebus: Remove bus match/probe/remove functions
[POWERPC] Move of_device allocation into of_device.[ch]
[POWERPC] mpc52xx: device tree changes for FEC and MDIO
[POWERPC] bestcomm: GenBD task support
[POWERPC] bestcomm: FEC task support
[POWERPC] bestcomm: ATA task support
[POWERPC] bestcomm: core bestcomm support for Freescale MPC5200
[POWERPC] mpc52xx: Update mpc52xx_psc structure with B revision changes
...
All asm/ipc.h files do only #include <asm-generic/ipc.h>.
This patch therefore removes all include/asm-*/ipc.h files and moves the
contents of include/asm-generic/ipc.h to include/linux/ipc.h.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes powerpc64's compat code use the new linux/elfcore-compat.h,
reducing some hand-copied duplication.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Slab constructors currently have a flags parameter that is never used. And
the order of the arguments is opposite to other slab functions. The object
pointer is placed before the kmem_cache pointer.
Convert
ctor(void *object, struct kmem_cache *s, unsigned long flags)
to
ctor(struct kmem_cache *s, void *object)
throughout the kernel
[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update dump_task_altivec() (which has so far never been put to use) so that
it dumps the Altivec/VMX registers (VR[0] - VR[31], VSCR and VRSAVE) in the
same format as the ptrace get_vrregs(), and add the appropriate glue
typedef and #defines to make it work.
A new note type of NT_PPC_VMX was chosen to be 0x100 (arbitrarily) because
it allows the low range values to be used for more generic purposes and
0x100 seems an adequate starting point for PowerPC extensions.
Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@suse.de>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current DWARF info for CR are incorrect, causing the gcc unwinder to
go to lunch if we take a segfault in the vdso. This fixes it.
Problem identified by Andrew Haley, and fix provided by Jakub Jelinek
(thanks !).
Unfortunately, a bug in gcc cause it to not quite work either, but that
is being fixed separately with something around the lines of:
linux-unwind.h:
fs->regs.reg[R_CR2].loc.offset = (long) ®s->ccr - new_cfa;
+ /* CR? regs are just 32-bit and PPC is big-endian. */
+ fs->regs.reg[R_CR2].loc.offset += sizeof (long) - 4;
(According to Jakub)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
PA6T has a bug where the slbie instruction does not honor the large
segment bit. As a result, we have to always use slbia when switching
context.
We don't have to worry about changing the slbie's during fault processing,
since they should never be replacing one VSID with another using the
same ESID. I.e. there's no risk for inserting duplicate entries due to a
failed slbie of the old entry. So as long as we clear it out on context
switch we should be fine.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
After 6 years the ppc64 kernel still thinks its important to tell me my
cache line size is 0x80 bytes. I think most people who care know that by
now. The rest probably cant even understand the hex output.
Since we might have misconfigured firmware or cpus that have a linesize
that isnt 128 bytes, I still print it out for those cases. If people
would prefer to remove it completely, lets do it.
Also for lpar remove the htab_address printout since its not used.
Anton
ppc64 boot log usability expert
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The clockevent bootup message only needs to be KERN_INFO.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Replace struct ibmebus_dev and struct ibmebus_driver with struct of_device
and struct of_platform_driver, respectively. Match the external ibmebus
interface and drivers using it.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The devtree root is now searched for devices matching a built-in whitelist
during boot, so these devices appear on the bus from the beginning. It is
still possible to manually add/remove devices to/from the bus by using the
probe/remove sysfs interface. Also, when a device driver registers itself,
the devtree is matched against its matchlist.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove old code that will be replaced by rewritten and shorter functions in
the next patch. Keep struct ibmebus_dev and struct ibmebus_driver for now,
but replace ibmebus_{,un}register_driver() by dummy functions. This way, the
kernel will still compile and run during the transition and git bisect will
be happy.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Extract generic of_device allocation code from of_platform_device_create()
and move it into of_device.[ch], called of_device_alloc(). Also, there's now
of_device_free() which puts the device node.
This way, bus drivers that build on of_platform (like ibmebus will) can
build upon this code instead of reinventing the wheel.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This cleans up the formatting in the vDSO linker script, mostly just the
use of whitespace. It's intended to approximate the kernel standard
conventions for indenting C, treating elements of the linker script about
like initialized variable definitions.
Signed-off-by: Roland McGrath <roland@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This cleans up the formatting in the vDSO linker script, mostly just the
use of whitespace. It's intended to approximate the kernel standard
conventions for indenting C, treating elements of the linker script about
like initialized variable definitions.
Signed-off-by: Roland McGrath <roland@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce architecture dependent kretprobe blacklists to prohibit users
from inserting return probes on the function in which kprobes can be
inserted but kretprobes can not.
This patch also removes "__kprobes" mark from "__switch_to" on x86_64 and
registers "__switch_to" to the blacklist on x86-64, because that mark is to
prohibit user from inserting only kretprobe.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert cpu_sibling_map from a static array sized by NR_CPUS to a per_cpu
variable. This saves sizeof(cpumask_t) * NR unused cpus. Access is mostly
from startup and CPU HOTPLUG functions.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Identical handlers of PTRACE_DETACH go into ptrace_request().
Not touching compat code.
Not touching archs that don't call ptrace_request.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This updates the ppc iommu/pci dma mappers to sg chaining. Includes
further fixes from FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This changes the uevent buffer functions to use a struct instead of a
long list of parameters. It does no longer require the caller to do the
proper buffer termination and size accounting, which is currently wrong
in some places. It fixes a known bug where parts of the uevent
environment are overwritten because of wrong index calculations.
Many thanks to Mathieu Desnoyers for finding bugs and improving the
error handling.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (408 commits)
[POWERPC] Add memchr() to the bootwrapper
[POWERPC] Implement logging of unhandled signals
[POWERPC] Add legacy serial support for OPB with flattened device tree
[POWERPC] Use 1TB segments
[POWERPC] XilinxFB: Allow fixed framebuffer base address
[POWERPC] XilinxFB: Add support for custom screen resolution
[POWERPC] XilinxFB: Use pdata to pass around framebuffer parameters
[POWERPC] PCI: Add 64-bit physical address support to setup_indirect_pci
[POWERPC] 4xx: Kilauea defconfig file
[POWERPC] 4xx: Kilauea DTS
[POWERPC] 4xx: Add AMCC Kilauea eval board support to platforms/40x
[POWERPC] 4xx: Add AMCC 405EX support to cputable.c
[POWERPC] Adjust TASK_SIZE on ppc32 systems to 3GB that are capable
[POWERPC] Use PAGE_OFFSET to tell if an address is user/kernel in SW TLB handlers
[POWERPC] 85xx: Enable FP emulation in MPC8560 ADS defconfig
[POWERPC] 85xx: Killed <asm/mpc85xx.h>
[POWERPC] 85xx: Add cpm nodes for 8541/8555 CDS
[POWERPC] 85xx: Convert mpc8560ads to the new CPM binding.
[POWERPC] mpc8272ads: Remove muram from the CPM reg property.
[POWERPC] Make clockevents work on PPC601 processors
...
Fixed up conflict in Documentation/powerpc/booting-without-of.txt manually.
Implement show_unhandled_signals sysctl + support to print when a process
is killed due to unhandled signals just as i386 and x86_64 does.
Default to having it off, unlike x86 that defaults on.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Currently find_legacy_serial_ports() can find no serial ports on the
OPB with flattened device tree. Thus no legacy boot console can be
initialized. Just the early udbg console works, which is initialized
with udbg_init_44x_as1 on the UART's physical address specified in
kernel config. This happens because we look for ns16750 serial
devices only and expect opb node to have a device type property. This
patch makes it look for ns16550-compatible devices and use
of_device_is_compatible() for opb in case device type is not
specified.
Signed-off-by: Valentine Barshak <vbarshak@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This makes the kernel use 1TB segments for all kernel mappings and for
user addresses of 1TB and above, on machines which support them
(currently POWER5+, POWER6 and PA6T).
We detect that the machine supports 1TB segments by looking at the
ibm,processor-segment-sizes property in the device tree.
We don't currently use 1TB segments for user addresses < 1T, since
that would effectively prevent 32-bit processes from using huge pages
unless we also had a way to revert to using 256MB segments. That
would be possible but would involve extra complications (such as
keeping track of which segment size was used when HPTEs were inserted)
and is not addressed here.
Parts of this patch were originally written by Ben Herrenschmidt.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Move to using PAGE_OFFSET instead of TASK_SIZE or KERNELBASE value on
6xx/40x/44x/fsl-booke to determine if the faulting address is a kernel or
user space address. This mimics how the macro is_kernel_addr() works.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
In testing the new clocksource and clockevent code on a PPC601
processor, I discovered that the clockevent multiplier value for the
decrementer clockevent was overflowing. Because the RTCL register in
the 601 effectively counts at 1GHz (it doesn't actually, but it
increases by 128 every 128ns), and the shift value was 32, that meant
the multiplier value had to be 2^32, which won't fit in an unsigned
long on 32-bit. The same problem would arise on any platform where
the timebase frequency was 1GHz or more (not that we actually have any
such machines today).
This fixes it by reducing the shift value to 16. Doing the
calculations with a resolution of 2^-16 nanoseconds (15 femtoseconds)
should be quite adequate. :)
Signed-off-by: Paul Mackerras <paulus@samba.org>
On old powermacs, we sometimes set the decrementer to 1 in order to
trigger a decrementer interrupt, which we use to handle an interrupt
that was pending at the time when it was re-enabled. This was causing
the decrementer clock event device to call the event function for the
next event early, which was causing problems when high-res timers were
not enabled.
This fixes the problem by recording the timebase value at which the
next event should occur, and checking the current timebase against the
recorded value in timer_interrupt. If it isn't time for the next
event, it just reprograms the decrementer and returns.
This also subtracts 1 from the value stored into the decrementer,
which is appropriate because the decrementer interrupts on the
transition from 0 to -1, not when the decrementer reaches 0.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Some IBM machines supply a "logical" PVR (processor version register)
value in the device tree in the cpu nodes rather than the real PVR.
This is used for instance to indicate that the processors in a POWER6
partition have been configured by the hypervisor to run in POWER5+
mode rather than POWER6 mode. To cope with this, we call identify_cpu
a second time with the logical PVR value (the first call is with the
real PVR value in the very early setup code).
However, POWER5+ machines can also supply a logical PVR value, and use
the same value (the value that indicates a v2.04 architecture
compliant processor). This causes problems for code that uses the
performance monitor (such as oprofile), because the PMU registers are
different in POWER6 (even in POWER5+ mode) from the real POWER5+.
This change works around this problem by taking out the PMU
information from the cputable entries for the logical PVR values, and
changing identify_cpu so that the second call to it won't overwrite
the PMU information that was established by the first call (the one
with the real PVR), but does update the other fields. Specifically,
if the cputable entry for the logical PVR value has num_pmcs == 0,
none of the PMU-related fields get used.
So that we can create a mixed cputable entry, we now make cur_cpu_spec
point to a single static struct cpu_spec, and copy stuff from
cpu_specs[i] into it. This has the side-effect that we can now make
cpu_specs[] be initdata.
Ultimately it would be good to move the PMU-related fields out to a
separate structure, pointed to by the cputable entries, and change
identify_cpu so that it saves the PMU info pointer, copies the whole
structure, and restores the PMU info pointer, rather than identify_cpu
having to list all the fields that are *not* PMU-related.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now we will only have entries in the device tree for the actual existing
devices (including their OS/400 properties). This way viocd.c gets all
the information about the devices from the device tree.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
It was only being used to carry around dma_iommu_ops and vio_iommu_table
which we can use directly instead. This also means that vio_bus_device
doesn't need to refer to them either.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove vio_dma_ops declaration (since it no longer exists) and some
unused fields from struct vio_driver.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This allows platforms which don't have anything to do at setup_arch time
(like a bunch of the 4xx platforms) to eliminate an empty setup_arch hook.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>