s390 for one, cannot implement VM_MIXEDMAP with pfn_valid, due to their memory
model (which is more dynamic than most). Instead, they had proposed to
implement it with an additional path through vm_normal_page(), using a bit in
the pte to determine whether or not the page should be refcounted:
vm_normal_page()
{
...
if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
if (vma->vm_flags & VM_MIXEDMAP) {
#ifdef s390
if (!mixedmap_refcount_pte(pte))
return NULL;
#else
if (!pfn_valid(pfn))
return NULL;
#endif
goto out;
}
...
}
This is fine, however if we are allowed to use a bit in the pte to determine
refcountedness, we can use that to _completely_ replace all the vma based
schemes. So instead of adding more cases to the already complex vma-based
scheme, we can have a clearly seperate and simple pte-based scheme (and get
slightly better code generation in the process):
vm_normal_page()
{
#ifdef s390
if (!mixedmap_refcount_pte(pte))
return NULL;
return pte_page(pte);
#else
...
#endif
}
And finally, we may rather make this concept usable by any architecture rather
than making it s390 only, so implement a new type of pte state for this.
Unfortunately the old vma based code must stay, because some architectures may
not be able to spare pte bits. This makes vm_normal_page a little bit more
ugly than we would like, but the 2 cases are clearly seperate.
So introduce a pte_special pte state, and use it in mm/memory.c. It is
currently a noop for all architectures, so this doesn't actually result in any
compiled code changes to mm/memory.o.
BTW:
I haven't put vm_normal_page() into arch code as-per an earlier suggestion.
The reason is that, regardless of where vm_normal_page is actually
implemented, the *abstraction* is still exactly the same. Also, while it
depends on whether the architecture has pte_special or not, that is the
only two possible cases, and it really isn't an arch specific function --
the role of the arch code should be to provide primitive functions and
accessors with which to build the core code; pte_special does that. We do
not want architectures to know or care about vm_normal_page itself, and
we definitely don't want them being able to invent something new there
out of sight of mm/ code. If we made vm_normal_page an arch function, then
we have to make vm_insert_mixed (next patch) an arch function too. So I
don't think moving it to arch code fundamentally improves any abstractions,
while it does practically make the code more difficult to follow, for both
mm and arch developers, and easier to misuse.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Register structures are defined per SDM.
Add three small routines for kernel:
ia64_ttag, ia64_loadrs, ia64_flushrs
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Implement __fls on all 64-bit archs:
alpha has an implementation of fls64.
Added __fls(x) = fls64(x) - 1.
ia64 has fls, but not __fls.
Added __fls based on code of fls.
mips and powerpc have __ilog2, which is the same as __fls.
Added __fls = __ilog2.
parisc, s390, sh and sparc64:
Include generic __fls.
x86_64 already has __fls.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move XPC and XPNET from arch/ia64/sn/kernel to drivers/misc/sgi-xp.
Signed-off-by: Dean Nelson <dcn@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
There are many notify_die() and almost all take same style with
ia64_mca_spin(). This patch defines macros and replace them all,
to reduce lines and to improve readability.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
[rebased for sched-devel/latest]
- Add a new cpuset file, having levels:
sched_relax_domain_level
- Modify partition_sched_domains() and build_sched_domains()
to take attributes parameter passed from cpuset.
- Fill newidle_idx for node domains which currently unused but
might be required if sched_relax_domain_level become higher.
- We can change the default level by boot option 'relax_domain_level='.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Create a simple macro to always return a pointer to the node_to_cpumask(node)
value. This relies on compiler optimization to remove the extra indirection:
#define node_to_cpumask_ptr(v, node) \
cpumask_t _##v = node_to_cpumask(node), *v = &_##v
For those systems with a large cpumask size, then a true pointer
to the array element can be used:
#define node_to_cpumask_ptr(v, node) \
cpumask_t *v = &(node_to_cpumask_map[node])
A node_to_cpumask_ptr_next() macro is provided to access another
node_to_cpumask value.
The other change is to always include asm-generic/topology.h moving the
ifdef CONFIG_NUMA to this same file.
Note: there are no references to either of these new macros in this patch,
only the definition.
Based on 2.6.25-rc5-mm1
# alpha
Cc: Richard Henderson <rth@twiddle.net>
# fujitsu
Cc: David Howells <dhowells@redhat.com>
# ia64
Cc: Tony Luck <tony.luck@intel.com>
# powerpc
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
# sparc
Cc: David S. Miller <davem@davemloft.net>
Cc: William L. Irwin <wli@holomorphy.com>
# x86
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they (or some user of them) rely
on it dragging in some unrelated header file, but I can't build all
these files, so we'll have to fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
* Use ide_default_irq() instead of ide_init_default_irq() in
ide_generic host driver (so the correct IRQ is always set
regardless of CONFIG_PCI / CONFIG_BLK_DEV_IDEPCI).
* Remove no longer needed ide_init_default_irq() macro.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
It is always == '((base) + 0x206)' if CONFIG_IDE_ARCH_OBSOLETE_DEFAULTS=y
and it is not needed otherwise (arm, blackfin, parisc, ppc64, sh, sparc[64]).
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add CONFIG_IDE_ARCH_OBSOLETE_DEFAULTS to drivers/ide/Kconfig and use
it instead of defining IDE_ARCH_OBSOLETE_DEFAULTS in <arch/ide.h>.
v2:
* Define ide_default_irq() in ide-probe.c/ns87415.c if not already defined
and drop defining ide_default_irq() for CONFIG_IDE_ARCH_OBSOLETE_DEFAULTS=n.
[ Thanks to Stephen Rothwell and David Miller for noticing the problem. ]
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Semaphores are no longer performance-critical, so a generic C
implementation is better for maintainability, debuggability and
extensibility. Thanks to Peter Zijlstra for fixing the lockdep
warning. Thanks to Harvey Harrison for pointing out that the
unlikely() was unnecessary.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
There is a NUMA memory configuration issue in 2.6.24:
A 2-node machine of ours has got the following memory layout:
Node 0: 0 - 2 Gbytes
Node 0: 4 - 8 Gbytes
Node 1: 8 - 16 Gbytes
Node 0: 16 - 18 Gbytes
"efi_memmap_init()" merges the three last ranges into one.
"register_active_ranges()" is called as follows:
efi_memmap_walk(register_active_ranges, NULL);
i.e. once for the 4 - 18 Gbytes range. It picks up the node
number from the start address, and registers all the memory for
the node #0.
"register_active_ranges()" should be called as follows to
make sure there is no merged address range at its entry:
efi_memmap_walk(filter_memory, register_active_ranges);
"filter_memory()" is similar to "filter_rsvd_memory()",
but the reserved memory ranges are not filtered out.
Signed-off-by: Zoltan Menyhart <Zoltan.Menyhart@bull.net>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Updates based on the "Intel® Itanium® Architecture Software Developer's Manual
Specification Update October 2007".
http://download.intel.com/design/itanium/specupdt/24869911.pdf
Signed-off-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add kprobe-booster support on ia64.
Kprobe-booster improves the performance of kprobes by eliminating single-step,
where possible. Currently, kprobe-booster is implemented on x86 and x86-64.
This is an ia64 port.
On ia64, kprobe-booster executes a copied bundle directly, instead of single
stepping. Bundles which have B or X unit and which may cause an exception
(including break) are not executed directly. And also, to prevent hitting
break exceptions on the copied bundle, only the hindmost kprobe is executed
directly if several kprobes share a bundle and are placed in different slots.
Note: set_brl_inst() is used for preparing an instruction buffer(it does not
modify any active code), so it does not need any atomic operation.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: bibo,mao <bibo.mao@intel.com>
Cc: Rusty Lynch <rusty.lynch@intel.com>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
when compile 2.6.25-rc8-mm1, below warning happend.
because walk_page_range pass argument as "const struct mm*",
but pgd_offset() receive as "struct mm*".
CC mm/pagewalk.o
mm/pagewalk.c: In function 'walk_page_range':
mm/pagewalk.c:111: warning: passing argument 1 of 'pgd_offset' discards qualifiers from pointer target type
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This attached patch significantly shrinks boot memory allocation on ia64.
It does this by not allocating per_cpu areas for cpus that can never
exist.
In the case where acpi does not have any numa node description of the
cpus, I defaulted to assigning the first 32 round-robin on the known
nodes.. For the !CONFIG_ACPI I used for_each_possible_cpu().
Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
The patch defines kernel parameter "nptcg=". The parameter overrides max number
of concurrent global TLB purges which is reported from either PAL_VM_SUMMARY or
SAL PALO.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
According to SDM2.2, Itanium supports multiple outstanding ptc.g instructions.
But current kernel function ia64_global_tlb_purge() uses a spinlock to serialize
ptc.g instructions issued by multiple processors. This serialization might have
scalability issue on a big SMP machine where many processors could purge TLB
in parallel.
The patch fixes this problem by issuing multiple ptc.g instructions in
ia64_global_tlb_purge(). It also adds support for the "PALO" table to get
a platform view of the max number of outstanding ptc.g instructions (which
may be different from the processor view found from PAL_VM_SUMMARY).
PALO specification can be found at: http://www.dig64.org/home/DIG64_PALO_R1_0.pdf
spinaphore implementation by Matthew Wilcox.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This interface provides more flexible functionality for smp
infrastructure ... e.g. KVM frequently needs to operate on
a subset of cpus.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Dynamic TR resource should be managed in the uniform way.
Add two interfaces for kernel:
ia64_itr_entry: Allocate a (pair of) TR for caller.
ia64_ptr_entry: Purge a (pair of ) TR by caller.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Anthony Xu <anthony.xu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Currently include/linux/kvm.h is not considered by make headers_install,
because Kbuild cannot handle " unifdef-$(CONFIG_FOO) += foo.h. This problem
was introduced by
commit fb56dbb31c
Author: Avi Kivity <avi@qumranet.com>
Date: Sun Dec 2 10:50:06 2007 +0200
KVM: Export include/linux/kvm.h only if $ARCH actually supports KVM
Currently, make headers_check barfs due to <asm/kvm.h>, which <linux/kvm.h>
includes, not existing. Rather than add a zillion <asm/kvm.h>s, export kvm.
only if the arch actually supports it.
Signed-off-by: Avi Kivity <avi@qumranet.com>
which makes this an 2.6.25 regression.
One way of solving the issue is to enhance Kbuild, but Avi and David conviced
me, that changing headers_install is not the way to go. This patch changes
the definition for linux/kvm.h to unifdef-y.
If unifdef-y is used for linux/kvm.h "make headers_check" will fail on all
architectures without asm/kvm.h. Therefore, this patch also provides
asm/kvm.h on all architectures.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Avi Kivity <avi@qumranet.com>
Cc: Sam Ravnborg <sam@ravnborg.org
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After we have regset support, we can use CORE_DUMP_USE_REGSET.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is the 64-bit regset implementation under IA64. Basically register
read/write, which is derived from current ptrace register read/write.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
ia64 named their handler kprobes_fault_handler while all other
arches used kprobe_fault_handler. Change the function definition
and header declaration.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Remove all code which does exactly the same thing as ptrace_request().
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] fix ia64 kprobes compilation
[IA64] move gcc_intrin.h from header-y to unifdef-y
[IA64] workaround tiger ia64_sal_get_physical_id_info hang
[IA64] move defconfig to arch/ia64/configs/
[IA64] Fix irq migration in multiple vector domain
[IA64] signal(ia64_ia32): add a signal stack overflow check
[IA64] signal(ia64): add a signal stack overflow check
[IA64] CONFIG_SGI_SN2 - auto select NUMA and ACPI_NUMA
Add CONFIG_HAVE_KRETPROBES to the arch/<arch>/Kconfig file for relevant
architectures with kprobes support. This facilitates easy handling of
in-kernel modules (like samples/kprobes/kretprobe_example.c) that depend on
kretprobes being present in the kernel.
Thanks to Sam Ravnborg for helping make the patch more lean.
Per Mathieu's suggestion, added CONFIG_KRETPROBES and fixed up dependencies.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes the following compile error with a recent gcc:
CC kernel/kprobes.o
/home/bunk/linux/kernel-2.6/git/linux-2.6/kernel/kprobes.c:1066: error: __ksymtab_jprobe_return causes a section type conflict
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
When I submitted 0df29025fd to ad
an #ifdef __KERNEL__ to include/asm-ia64/gcc_intrin.h a few weeks
ago I neglected to move gcc_intrin.h from header-y to unifdef-y.
Thanks to David Woodhouse for pointing this out.
Signed-off-by: Doug Chapman <doug.chapman@hp.com>
Signed-off-by: Jarod Wilson <jwilson@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This fixes regression introduced in 113134fcbc
Intel Tiger platforms hang when calling SAL_GET_PHYSICAL_ID_INFO
instead of properly returning -1 for unimplemented, so add a
version check.
SGI Altix platforms have an incorrect SAL version hard-coded into
their prom -- they encode 2.9, but actually implement 3.2 -- so
fix it up and allow ia64_sal_get_physical_id_info to keep
working.
Signed-off-by: Alex Chiang <achiang@hp.com>
Acked-by: Russ Anderson <rja@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Fix the problem that the following error message is sometimes displayed
at irq migration when vector domain is enabled.
"Unexpected interrupt vector %d on CPU %d is not mapped to any IRQ!"
The cause of this problem is an interrupt is sent to the previous
target CPU after cleaning up vector to irq mapping table. To clean up
vector to irq map on the previous target CPU safty, change the irq
migration in multiple vector domain as follows. The original idea is
from x86 interrupt management code.
- Delay vector to irq table cleanup until the interrupts are sent
to new target CPUs. By this, it is ensured that target CPU is
completely changed on the interrupt controller side.
- Even after the interrupts are sent to new target CPUs, there can
be pended interrupts remaining on the previous target CPU. So we
need to delay clearning up vector to irq table until the pended
interrupt is handled. For this, send IPI to the previous target
CPU with lower priority vector and clean up vector to irq table
in its handler.
This patch affects only to irq migration code with multiple vector
domain is enabled.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This patch implements VIRT_CPU_ACCOUNTING for ia64,
which enable us to use more accurate cpu time accounting.
The VIRT_CPU_ACCOUNTING is an item of kernel config, which s390
and powerpc arch have. By turning this config on, these archs
change the mechanism of cpu time accounting from tick-sampling
based one to state-transition based one.
The state-transition based accounting is done by checking time
(cycle counter in processor) at every state-transition point,
such as entrance/exit of kernel, interrupt, softirq etc.
The difference between point to point is the actual time consumed
during in the state. There is no doubt about that this value is
more accurate than that of tick-sampling based accounting.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Commit bdc807871d broke the build
for this config because the sim_defconfig selects CONFIG_HZ=250
but include/asm-ia64/param.h has an ifdef for the simulator to
force HZ to 32. So we ended up with a kernel/timeconst.h set
for HZ=250 ... which then failed the check for the right HZ
value and died with:
Drop the #ifdef magic from param.h and make force CONFIG_HZ=32
directly for the simulator.
Signed-off-by: Tony Luck <tony.luck@intel.com>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Fix large MCA bootmem allocation
[IA64] Simplify cpu_idle_wait
[IA64] Synchronize RBS on PTRACE_ATTACH
[IA64] Synchronize kernel RSE to user-space and back
[IA64] Rename TIF_PERFMON_WORK back to TIF_NOTIFY_RESUME
[IA64] Wire up timerfd_{create,settime,gettime} syscalls
When attaching to a stopped process, the RSE must be explicitly
synced to user-space, so the debugger can read the correct values.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
CC: Roland McGrath <roland@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
This is base kernel patch for ptrace RSE bug. It's basically a backport
from the utrace RSE patch I sent out several weeks ago. please review.
when a thread is stopped (ptraced), debugger might change thread's user
stack (change memory directly), and we must avoid the RSE stored in
kernel to override user stack (user space's RSE is newer than kernel's
in the case). To workaround the issue, we copy kernel RSE to user RSE
before the task is stopped, so user RSE has updated data. we then copy
user RSE to kernel after the task is resummed from traced stop and
kernel will use the newer RSE to return to user.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
CC: Roland McGrath <roland@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Since the RSE synchronization will need a TIF_ flag, but all
work-to-be-done bits are already used, so we have to multiplex
TIF_NOTIFY_RESUME again.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Tony Luck <tony.luck@intel.com>