When new node becomes enable by hot-add, new sysfs file must be created for
new node. So, if new node is enabled by add_memory(), register_one_node() is
called to create it. In addition, I386's arch_register_node() and a part of
register_nodes() of powerpc are consolidated to register_one_node() as a
generic_code().
This is tested by Tiger4(IPF) with node hot-plug emulation.
Signed-off-by: Keiichiro Tokunaga <tokuanga.keiich@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch allows hot-add memory which is not aligned to section.
Now, hot-added memory has to be aligned to section size. Considering big
section sized archs, this is not useful.
When hot-added memory is registerd as iomem resoruce by iomem resource
patch, we can make use of that information to detect valid memory range.
Note: With this, not-aligned memory can be registerd. To allow hot-add
memory with holes, we have to do more work around add_memory().
(It doesn't allows add memory to already existing mem section.)
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Register hot-added memory to iomem_resource. With this, /proc/iomem can
show hot-added memory.
Note: kdump uses /proc/iomem to catch memory range when it is installed.
So, kdump should be re-installed after /proc/iomem change.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add node-hot-add support to add_memory().
node hotadd uses this sequence.
1. allocate pgdat.
2. refresh NODE_DATA()
3. call free_area_init_node() to initialize
4. create sysfs entry
5. add memory (old add_memory())
6. set node online
7. run kswapd for new node.
(8). update zonelist after pages are onlined. (This is already merged in -mm
due to update phase is difference.)
Note:
To make common function as much as possible,
there is 2 changes from v2.
- The old add_memory(), which is defiend by each archs,
is renamed to arch_add_memory(). New add_memory becomes
caller of arch dependent function as a common code.
- This patch changes add_memory()'s interface
From: add_memory(start, end)
TO : add_memory(nid, start, end).
It was cause of similar code that finding node id from
physical address is inside of old add_memory() on each arch.
In addition, acpi memory hotplug driver can find node id easier.
In v2, it must walk DSDT'S _CRS by matching physical address to
get the handle of its memory device, then get _PXM and node id.
Because input is just physical address.
However, in v3, the acpi driver can use handle to get _PXM and node id
for the new memory device. It can pass just node id to add_memory().
Fix interface of arch_add_memory() is in next patche.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Change the name of old add_memory() to arch_add_memory. And use node id to
get pgdat for the node at NODE_DATA().
Note: Powerpc's old add_memory() is defined as __devinit. However,
add_memory() is usually called only after bootup.
I suppose it may be redundant. But, I'm not well known about powerpc.
So, I keep it. (But, __meminit is better at least.)
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In current code, zonelist is considered to be build once, no modification.
But MemoryHotplug can add new zone/pgdat. It must be updated.
This patch modifies build_all_zonelists(). By this, build_all_zonelist() can
reconfig pgdat's zonelists.
To update them safety, this patch use stop_machine_run(). Other cpus don't
touch among updating them by using it.
In old version (V2 of node hotadd), kernel updated them after zone
initialization. But present_page of its new zone is still 0, because
online_page() is not called yet at this time. Build_zonelists() checks
present_pages to find present zone. It was too early. So, I changed it after
online_pages().
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When add_zone() is called against empty zone (not populated zone), we have to
initialize the zone which didn't initialize at boot time. But,
init_currently_empty_zone() may fail due to allocation of wait table. So,
this patch is to catch its error code.
Changes against wait_table is in the next patch.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: Yasunori Goto <y-goto@jp.fujitsu.com>
If hot-added memory's address is smaller than old area, spanned_pages will
not be updated. It must be fixed.
example) Old zone_start_pfn = 0x60000, and spanned_pages = 0x10000
Added new memory's start_pfn = 0x50000, and end_pfn = 0x60000
new spanned_pages will be still 0x10000 by old code.
(It should be updated to 0x20000.) Because old_zone_end_pfn will be
0x70000, and end_pfn smaller than it. So, spanned_pages will not be
updated.
In current code, spanned_pages is updated only when end_pfn is updated.
But, it should be updated by subtraction between bigger end_pfn and new
zone_start_pfn.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Based on an older patch from Mike Kravetz <kravetz@us.ibm.com>
We need to have a mem_map for high addresses in order to make fops->no_page
work on spufs mem and register files. So far, we have used the
memory_present() function during early bootup, but that did not work when
CONFIG_NUMA was enabled.
We now use the __add_pages() function to add the mem_map when loading the
spufs module, which is a lot nicer.
Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When pages are onlined, not only zone->present_pages but also
pgdat->node_present_pages should be refreshed.
This parameter is used to show information at
/sys/device/system/node/nodeX/meminfo via si_meminfo_node().
So, it shows strange value for MemUsed which is calculated
(node_present_pages - all zones free pages).
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
__add_section defines an unused pointer to the zones pgdat. Remove this
definition. This fixes a compile warning.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The calculation for node_spanned_pages at grow_pgdat_span() is clearly
wrong. This is patch for it.
(Please see grow_zone_span() to compare. It is correct.)
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Acked-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: IWAMOTO Toshihiro <iwamoto@valinux.co.jp>
> I found the tests does not work well with Dave's patchset.
> I've found the followings:
>
> - setup_per_zone_pages_min() calls should be added in
> capture_page_range() and online_pages()
> - lru_add_drain() should be called before try_to_migrate_pages()
The following patch deals with the first item.
Signed-off-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This basically keeps up from having to extern __kmalloc_section_memmap().
The vaddr_in_vmalloc_area() helper could go in a vmalloc header, but that
header gets hard to work with, because it needs some arch-specific macros.
Just stick it in here for now, instead of creating another header.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Lion Vollnhals <webmaster@schiggl.de>
Signed-off-by: Jiri Slaby <xslaby@fi.muni.cz>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds generic memory add/remove and supporting functions for memory
hotplug into a new file as well as a memory hotplug kernel config option.
Individual architecture patches will follow.
For now, disable memory hotplug when swsusp is enabled. There's a lot of
churn there right now. We'll fix it up properly once it calms down.
Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>