android_kernel_motorola_sm6225/kernel
Paul E. McKenney b668c9cf3e rcu: Fix grace-period-stall bug on large systems with CPU hotplug
When the last CPU of a given leaf rcu_node structure goes
offline, all of the tasks queued on that leaf rcu_node structure
(due to having blocked in their current RCU read-side critical
sections) are requeued onto the root rcu_node structure.  This
requeuing is carried out by rcu_preempt_offline_tasks().
However, it is possible that these queued tasks are the only
thing preventing the leaf rcu_node structure from reporting a
quiescent state up the rcu_node hierarchy.  Unfortunately, the
old code would fail to do this reporting, resulting in a
grace-period stall given the following sequence of events:

1.	Kernel built for more than 32 CPUs on 32-bit systems or for more
	than 64 CPUs on 64-bit systems, so that there is more than one
	rcu_node structure.  (Or CONFIG_RCU_FANOUT is artificially set
	to a number smaller than CONFIG_NR_CPUS.)

2.	The kernel is built with CONFIG_TREE_PREEMPT_RCU.

3.	A task running on a CPU associated with a given leaf rcu_node
	structure blocks while in an RCU read-side critical section
	-and- that CPU has not yet passed through a quiescent state
	for the current RCU grace period.  This will cause the task
	to be queued on the leaf rcu_node's blocked_tasks[] array, in
	particular, on the element of this array corresponding to the
	current grace period.

4.	Each of the remaining CPUs corresponding to this same leaf rcu_node
	structure pass through a quiescent state.  However, the task is
	still in its RCU read-side critical section, so these quiescent
	states cannot be reported further up the rcu_node hierarchy.
	Nevertheless, all bits in the leaf rcu_node structure's ->qsmask
	field are now zero.

5.	Each of the remaining CPUs go offline.  (The events in step
	#4 and #5 can happen in any order as long as each CPU passes
	through a quiescent state before going offline.)

6.	When the last CPU goes offline, __rcu_offline_cpu() will invoke
	rcu_preempt_offline_tasks(), which will move the task to the
	root rcu_node structure, but without reporting a quiescent state
	up the rcu_node hierarchy (and this failure to report a quiescent
	state is the bug).

	But because this leaf rcu_node structure's ->qsmask field is
	already zero and its ->block_tasks[] entries are all empty,
	force_quiescent_state() will skip this rcu_node structure.

	Therefore, grace periods are now hung.

This patch abstracts some code out of rcu_read_unlock_special(),
calling the result task_quiet() by analogy with cpu_quiet(), and
invokes task_quiet() from both rcu_read_lock_special() and
__rcu_offline_cpu().  Invoking task_quiet() from
__rcu_offline_cpu() reports the quiescent state up the rcu_node
hierarchy, fixing the bug.  This ends up requiring a separate
lock_class_key per level of the rcu_node hierarchy, which this
patch also provides.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12589088301770-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-22 18:58:15 +01:00
..
gcov microblaze: Enable GCOV_PROFILE_ALL 2009-09-21 14:29:21 +02:00
irq Merge branch 'irq-threaded-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-09-11 13:21:31 -07:00
power headers: utsname.h redux 2009-09-23 18:13:10 -07:00
time NOHZ: update idle state also when NOHZ is inactive 2009-10-07 13:05:05 +02:00
trace Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-10-08 12:06:09 -07:00
.gitignore
acct.c bsdacct: switch credentials for writing to the accounting file 2009-08-24 11:33:40 +10:00
async.c
audit.c Audit: send signal info if selinux is disabled 2009-09-24 03:50:26 -04:00
audit.h Fix rule eviction order for AUDIT_DIR 2009-06-24 00:02:38 -04:00
audit_tree.c Fix rule eviction order for AUDIT_DIR 2009-06-24 00:02:38 -04:00
audit_watch.c Audit: reorganize struct audit_watch to save 8 bytes 2009-09-24 03:50:25 -04:00
auditfilter.c Audit: clean up all op= output to include string quoting 2009-06-24 00:00:52 -04:00
auditsc.c Audit: rearrange audit_context to save 16 bytes per struct 2009-09-24 03:50:26 -04:00
backtracetest.c
bounds.c
capability.c
cgroup.c cgroup: catch bad css refcnt at css_put 2009-10-01 16:11:12 -07:00
cgroup_freezer.c cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time 2009-09-24 07:20:58 -07:00
compat.c
configs.c
cpu.c Merge branch 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-09-15 09:19:38 -07:00
cpuset.c cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time 2009-09-24 07:20:58 -07:00
cred-internals.h
cred.c creds_are_invalid() needs to be exported for use by modules: 2009-09-23 11:02:26 -07:00
delayacct.c headers: taskstats_kern.h trim 2009-09-18 09:48:52 -07:00
dma.c
exec_domain.c
exit.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-10-08 12:16:35 -07:00
extable.c
fork.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-10-08 12:16:35 -07:00
freezer.c sched: fix nr_uninterruptible accounting of frozen tasks really 2009-07-18 14:19:53 +02:00
futex.c futex: Fix spurious wakeup for requeue_pi really 2009-10-28 20:34:34 +01:00
futex_compat.c futex: Fix compat_futex to be same as futex for REQUEUE_PI 2009-08-10 15:41:12 +02:00
groups.c groups: move code to kernel/groups.c 2009-06-16 19:47:48 -07:00
hrtimer.c Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-10-05 12:04:16 -07:00
hung_task.c sysctl: remove "struct file *" argument of ->proc_handler 2009-09-24 07:21:04 -07:00
itimer.c itimers: Add tracepoints for itimer 2009-08-29 14:10:07 +02:00
kallsyms.c kallsyms: use new arch_is_kernel_text() 2009-09-23 07:39:30 -07:00
Kconfig.freezer
Kconfig.hz
Kconfig.preempt
kexec.c kexec: fix omitting offset in extended crashkernel syntax 2009-07-29 19:10:34 -07:00
kfifo.c kfifo: Use "const" definitions 2009-09-19 13:13:17 -07:00
kgdb.c
kmod.c Revert "kmod: fix race in usermodehelper code" 2009-09-23 18:12:10 -07:00
kprobes.c const: constify remaining file_operations 2009-10-01 16:11:11 -07:00
ksysfs.c
kthread.c sched: Keep kthreads at default priority 2009-09-09 17:30:06 +02:00
latencytop.c
lockdep.c lockdep: Use cpu_clock() for lockstat 2009-10-09 15:56:44 +02:00
lockdep_internals.h lockdep: BFS cleanup 2009-07-24 10:53:29 +02:00
lockdep_proc.c seq_file: constify seq_operations 2009-09-23 07:39:29 -07:00
lockdep_states.h
Makefile rcu: "Tiny RCU", The Bloatwatch Edition 2009-10-26 09:40:29 +01:00
module.c module: fix up CONFIG_KALLSYMS=n build. 2009-10-01 16:11:11 -07:00
mutex-debug.c
mutex-debug.h
mutex.c Merge branch 'linus' into perfcounters/core 2009-06-11 17:55:42 +02:00
mutex.h
notifier.c
ns_cgroup.c cgroups: let ss->can_attach and ss->attach do whole threadgroups at a time 2009-09-24 07:20:58 -07:00
nsproxy.c nsproxy: extract create_nsproxy() 2009-06-18 13:03:56 -07:00
panic.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-10-08 12:16:35 -07:00
params.c param: allow whitespace as kernel parameter separator 2009-09-25 00:32:58 +09:30
perf_event.c perf_event: Provide vmalloc() based mmap() backing 2009-10-06 14:21:50 +02:00
pid.c mm: also use alloc_large_system_hash() for the PID hash table 2009-09-22 07:17:38 -07:00
pid_namespace.c pidns: deny CLONE_PARENT|CLONE_NEWPID combination 2009-09-24 07:21:04 -07:00
pm_qos_params.c
posix-cpu-timers.c itimers: Add tracepoints for itimer 2009-08-29 14:10:07 +02:00
posix-timers.c time: Introduce CLOCK_REALTIME_COARSE 2009-08-21 21:43:46 +02:00
printk.c printk: add printk_delay to make messages readable for some scenarios 2009-09-23 07:39:28 -07:00
profile.c kernel/profile.c: Switch /proc/irq/prof_cpu_mask to seq_file 2009-09-20 20:15:40 +02:00
ptrace.c ptrace: __ptrace_detach: do __wake_up_parent() if we reap the tracee 2009-09-24 07:20:59 -07:00
rcupdate.c rcu: "Tiny RCU", The Bloatwatch Edition 2009-10-26 09:40:29 +01:00
rcutiny.c rcu: Do tiny cleanups in rcutiny 2009-10-26 09:40:40 +01:00
rcutorture.c rcu: Improve rcutorture diagnostics when bad torture_type specified 2009-10-26 09:40:31 +01:00
rcutree.c rcu: Fix grace-period-stall bug on large systems with CPU hotplug 2009-11-22 18:58:15 +01:00
rcutree.h rcu: Fix grace-period-stall bug on large systems with CPU hotplug 2009-11-22 18:58:15 +01:00
rcutree_plugin.h rcu: Fix grace-period-stall bug on large systems with CPU hotplug 2009-11-22 18:58:15 +01:00
rcutree_trace.c rcu: Add rnp->blocked_tasks to tracing 2009-10-15 11:20:22 +02:00
relay.c const: mark struct vm_struct_operations 2009-09-27 11:39:25 -07:00
res_counter.c memcg: some modification to softlimit under hierarchical memory reclaim. 2009-10-01 16:11:13 -07:00
resource.c walk system ram range 2009-09-23 07:39:41 -07:00
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c rtmutex: Avoid deadlock in rt_mutex_start_proxy_lock() 2009-08-06 05:50:21 +02:00
rtmutex.h
rtmutex_common.h
rwsem.c
sched.c rcu: Enable synchronize_sched_expedited() fastpath 2009-11-10 22:48:49 +01:00
sched_clock.c sched_clock: Fix atomicity/continuity bug by using cmpxchg64() 2009-09-30 22:56:10 +02:00
sched_cpupri.c sched: Add new prio to cpupri before removing old prio 2009-08-02 14:26:09 +02:00
sched_cpupri.h
sched_debug.c sched: Add new wakeup preemption mode: WAKEUP_RUNNING 2009-09-17 10:17:25 +02:00
sched_fair.c sysctl: remove "struct file *" argument of ->proc_handler 2009-09-24 07:21:04 -07:00
sched_features.h sched: Add new wakeup preemption mode: WAKEUP_RUNNING 2009-09-17 10:17:25 +02:00
sched_idletask.c sched: Simplify sys_sched_rr_get_interval() system call 2009-09-21 09:53:55 +02:00
sched_rt.c sched: Simplify sys_sched_rr_get_interval() system call 2009-09-21 09:53:55 +02:00
sched_stats.h
seccomp.c
semaphore.c
signal.c signals: inline __fatal_signal_pending 2009-09-24 07:21:01 -07:00
slow-work.c sysctl: remove "struct file *" argument of ->proc_handler 2009-09-24 07:21:04 -07:00
smp.c cpumask: remove arch_send_call_function_ipi 2009-09-24 09:34:47 +09:30
softirq.c rcu: Cleanup: balance rcu_irq_enter()/rcu_irq_exit() calls 2009-11-02 16:06:37 +01:00
softlockup.c sysctl: remove "struct file *" argument of ->proc_handler 2009-09-24 07:21:04 -07:00
spinlock.c locking: Allow arch-inlined spinlocks 2009-08-31 18:08:50 +02:00
srcu.c rcu: Add synchronize_srcu_expedited() 2009-10-26 09:40:30 +01:00
stacktrace.c
stop_machine.c
sys.c Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 2009-09-24 07:53:22 -07:00
sys_ni.c Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2009-09-24 15:13:11 -07:00
sysctl.c Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 2009-09-24 07:53:22 -07:00
sysctl_check.c
taskstats.c genetlink: make netns aware 2009-07-12 14:03:27 -07:00
test_kprobes.c
time.c time: Prevent 32 bit overflow with set_normalized_timespec() 2009-09-15 10:17:30 +02:00
timeconst.pl
timer.c Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-09-23 09:46:15 -07:00
tracepoint.c trivial: fix typo "to to" in multiple files 2009-09-21 15:14:55 +02:00
tsacct.c
uid16.c headers: utsname.h redux 2009-09-23 18:13:10 -07:00
up.c
user.c uids: Prevent tear down race 2009-11-02 16:02:39 +01:00
user_namespace.c
utsname.c utsns: extract creeate_uts_ns() 2009-06-18 13:03:55 -07:00
utsname_sysctl.c sysctl: remove "struct file *" argument of ->proc_handler 2009-09-24 07:21:04 -07:00
wait.c locking, sched: Give waitqueue spinlocks their own lockdep classes 2009-08-10 14:43:09 +02:00
workqueue.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-09-11 13:23:18 -07:00