643b52b9c0
We shrink a radix tree when its root node has only one child, in the leftmost slot. The child becomes the new root node. To perform this operation in a manner compatible with concurrent lockless lookups, we atomically switch the root pointer from the parent to its child.

However, a concurrent lockless lookup may already have loaded a pointer to the parent (and is presently deciding what to do next). For this reason, we also have to keep the parent node in a valid state after shrinking the tree, until the next RCU grace period; otherwise this lookup with the parent pointer may not do the right thing. Notably, we need to keep the child in the leftmost slot there in case that is requested by the lookup.

This is all pretty standard RCU stuff. It is worth repeating because in my eagerness to obey the radix tree node constructor scheme, I had broken it by zeroing the radix tree node before the grace period.

What could happen is that a lookup can load the parent pointer, then decide it wants to follow the leftmost child slot, only to find the slot contains NULL because the concurrent shrinker zeroed the parent node before waiting for a grace period. The lookup would return a false negative as a result.

Fix it by doing that clearing in the RCU callback. I would normally want to rip out the constructor entirely, but radix tree nodes are one of those places where they make sense (only a few cachelines will be touched soon after allocation).

This was never actually found in any lockless pagecache testing or by the test harness, but by seeing the odd problem with my scalable vmap rewrite. I have not tickled the test harness into reproducing it yet, but I'll keep working at it.

Fortunately, it is not a problem anywhere lockless pagecache is used in mainline kernels (pagecache probe is not a guarantee, and brd does not have concurrent lookups and deletes).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
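
The race described in the commit message can be illustrated with a small single-threaded model. This is only a sketch under loud assumptions: `struct node`, `shrink()`, `grace_period_elapsed()` and `stale_lookup_finds_item()` are hypothetical names invented here, the interleaving of reader and shrinker is replayed by hand rather than with real threads, and a one-entry `pending` pointer stands in for a real RCU callback. It is not the code in `radix-tree.c`.

```c
#include <stddef.h>
#include <string.h>

#define SLOTS 4

/* Hypothetical, simplified node; not the kernel's struct radix_tree_node. */
struct node {
	void *slots[SLOTS];
};

static struct node *root;

/* Stand-in for a queued RCU callback: at most one node awaiting clearing. */
static struct node *pending;

static void node_clear(struct node *n)
{
	memset(n->slots, 0, sizeof(n->slots));
}

/* Stand-in for the end of a grace period: run the deferred clearing. */
static void grace_period_elapsed(void)
{
	if (pending) {
		node_clear(pending);
		pending = NULL;
	}
}

/* Shrink: the sole child in slot 0 becomes the new root. */
static void shrink(int clear_early)
{
	struct node *parent = root;

	root = parent->slots[0];	/* the real code publishes this atomically */
	if (clear_early)
		node_clear(parent);	/* buggy: a reader may still hold parent */
	else
		pending = parent;	/* fixed: clear only after the grace period */
}

/*
 * Replay the interleaving: a reader loads the old root, the shrinker
 * runs, then the reader resumes with its stale pointer.
 * Returns 1 if the reader still finds the item, 0 on a false negative.
 */
static int stale_lookup_finds_item(int clear_early)
{
	static int item = 42;
	struct node child, parent;
	struct node *seen, *next;
	int found;

	node_clear(&child);
	node_clear(&parent);
	child.slots[0] = &item;
	parent.slots[0] = &child;
	root = &parent;
	pending = NULL;

	seen = root;			/* reader: load the parent pointer */
	shrink(clear_early);		/* shrinker: switch root to the child */
	next = seen->slots[0];		/* reader: follow the leftmost slot */
	found = next != NULL && next->slots[0] == &item;

	grace_period_elapsed();		/* no readers remain; clearing is safe */
	return found;
}
```

In this model, `stale_lookup_finds_item(1)` replays the bug and returns 0 (the false negative), while `stale_lookup_finds_item(0)` replays the fix and returns 1, because the parent's leftmost slot stays intact until the simulated grace period has passed.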
||
---|---|---|
lzo | ||
reed_solomon | ||
zlib_deflate | ||
zlib_inflate | ||
.gitignore | ||
argv_split.c | ||
audit.c | ||
bitmap.c | ||
bitrev.c | ||
bug.c | ||
bust_spinlocks.c | ||
check_signature.c | ||
cmdline.c | ||
cpumask.c | ||
crc-ccitt.c | ||
crc-itu-t.c | ||
crc7.c | ||
crc16.c | ||
crc32.c | ||
crc32defs.h | ||
ctype.c | ||
debug_locks.c | ||
debugobjects.c | ||
dec_and_lock.c | ||
devres.c | ||
div64.c | ||
dump_stack.c | ||
extable.c | ||
fault-inject.c | ||
find_next_bit.c | ||
gen_crc32table.c | ||
genalloc.c | ||
halfmd4.c | ||
hexdump.c | ||
hweight.c | ||
idr.c | ||
inflate.c | ||
int_sqrt.c | ||
iomap.c | ||
iomap_copy.c | ||
iommu-helper.c | ||
ioremap.c | ||
irq_regs.c | ||
kasprintf.c | ||
Kconfig | ||
Kconfig.debug | ||
Kconfig.kgdb | ||
kernel_lock.c | ||
klist.c | ||
kobject.c | ||
kobject_uevent.c | ||
kref.c | ||
libcrc32c.c | ||
list_debug.c | ||
lmb.c | ||
locking-selftest-hardirq.h | ||
locking-selftest-mutex.h | ||
locking-selftest-rlock-hardirq.h | ||
locking-selftest-rlock-softirq.h | ||
locking-selftest-rlock.h | ||
locking-selftest-rsem.h | ||
locking-selftest-softirq.h | ||
locking-selftest-spin-hardirq.h | ||
locking-selftest-spin-softirq.h | ||
locking-selftest-spin.h | ||
locking-selftest-wlock-hardirq.h | ||
locking-selftest-wlock-softirq.h | ||
locking-selftest-wlock.h | ||
locking-selftest-wsem.h | ||
locking-selftest.c | ||
Makefile | ||
parser.c | ||
percpu_counter.c | ||
plist.c | ||
prio_heap.c | ||
prio_tree.c | ||
proportions.c | ||
radix-tree.c | ||
random32.c | ||
ratelimit.c | ||
rbtree.c | ||
reciprocal_div.c | ||
rwsem-spinlock.c | ||
rwsem.c | ||
scatterlist.c | ||
sha1.c | ||
smp_processor_id.c | ||
sort.c | ||
spinlock_debug.c | ||
string.c | ||
swiotlb.c | ||
textsearch.c | ||
ts_bm.c | ||
ts_fsm.c | ||
ts_kmp.c | ||
vsprintf.c |