android_kernel_motorola_sm6225/mm
Linus Torvalds 1500c29400 gup: document and work around "COW can break either way" issue
commit 17839856fd588f4ab6b789f482ed3ffd7c403e1f upstream.

Doing a "get_user_pages()" on a copy-on-write page for reading can be
ambiguous: the page can be COW'ed at any time afterwards, and the
direction of a COW event isn't defined.

Yes, whoever writes to it will generally do the COW, but if the thread
that did the get_user_pages() unmapped the page before the write (and
that could happen due to memory pressure in addition to any outright
action), the writer could also just take over the old page instead.

End result: the get_user_pages() call might result in a page pointer
that is no longer associated with the original VM, and is associated
with - and controlled by - another VM having taken it over instead.

So when doing a get_user_pages() on a COW mapping, the only really safe
thing to do would be to break the COW when getting the page, even when
only getting it for reading.

At the same time, some users simply don't even care.

For example, the perf code wants to look up the page not because it
cares about the page, but because the code simply wants to look up the
physical address of the access for informational purposes, and doesn't
really care about races when a page might be unmapped and remapped
elsewhere.

This adds logic to force a COW event by setting FOLL_WRITE on any
copy-on-write mapping when FOLL_GET (or FOLL_PIN) is used to get a page
pointer as a result.
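
As a rough user-space sketch of that check (not the patch itself): the
flag names below mirror the kernel's, but the numeric values and the
vma_model type are placeholders assumed for illustration, and
effective_gup_flags() is a hypothetical helper that does not exist in
the kernel. Per the backport notes further down, the 4.19 version
drops the FOLL_PIN part of the test.

```c
#include <stdbool.h>

/* Placeholder values for illustration; the real definitions live in
 * include/linux/mm.h and can differ between kernel versions. */
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL
#define FOLL_WRITE  0x00001U
#define FOLL_GET    0x00004U
#define FOLL_PIN    0x40000U

/* Hypothetical stand-in for struct vm_area_struct. */
struct vma_model {
	unsigned long vm_flags;
};

/* A mapping is COW if it is private (not VM_SHARED) but may be written. */
static inline bool is_cow_mapping(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* Force the COW break only when the caller takes a page reference. */
static inline bool should_force_cow_break(const struct vma_model *vma,
					  unsigned int gup_flags)
{
	return is_cow_mapping(vma->vm_flags) &&
	       (gup_flags & (FOLL_GET | FOLL_PIN)) != 0;
}

/* Hypothetical helper: the flags the slow path would effectively use. */
static inline unsigned int effective_gup_flags(const struct vma_model *vma,
					       unsigned int gup_flags)
{
	if (should_force_cow_break(vma, gup_flags))
		gup_flags |= FOLL_WRITE;
	return gup_flags;
}
```

In this model a private writable mapping looked up with FOLL_GET comes
back with FOLL_WRITE added, while a shared mapping, or a lookup that
takes no page reference, is left alone.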

The current semantics end up being:

 - __get_user_pages_fast(): no change. If you don't ask for a write,
   you won't break COW. You'd better know what you're doing.

 - get_user_pages_fast(): the fast-case "look it up in the page tables
   without anything getting mmap_sem" now refuses to follow a read-only
   page, since it might need COW breaking.  Which happens in the slow
   path - the fast path doesn't know if the memory might be COW or not.

 - get_user_pages() (including the slow-path fallback for gup_fast()):
   for a COW mapping, turn on FOLL_WRITE for FOLL_GET/FOLL_PIN, with
   very similar semantics to FOLL_FORCE.
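
The difference between the two lockless entry points can be modelled
roughly as follows. This is a simplified user-space sketch, not kernel
code: pte_model and the three functions are hypothetical names, and
the real page-table walk checks far more than two bits. It only shows
the idea that get_user_pages_fast() now walks as if a write had been
requested, so a read-only entry is handed to the slow path.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified view of a page-table entry. */
struct pte_model {
	bool present;
	bool writable;
};

/* Lockless walk: a write request only accepts writable entries. */
static inline bool fast_walk_accepts(struct pte_model pte, bool write)
{
	return pte.present && (!write || pte.writable);
}

/* Models __get_user_pages_fast(): the caller's request is used as-is. */
static inline bool gup_fast_only_accepts(struct pte_model pte, bool write)
{
	return fast_walk_accepts(pte, write);
}

/* Models get_user_pages_fast(): always walk as if write were set, so
 * a read-only (possibly COW) entry is refused here and falls back to
 * the slow path, which can break COW under mmap_sem. */
static inline bool gup_fast_accepts(struct pte_model pte, bool write)
{
	(void)write;	/* the caller's intent no longer relaxes the walk */
	return fast_walk_accepts(pte, true);
}
```

So a read-only entry looked up for reading is still accepted by the
first model but refused by the second, which matches the "write=1 in
gup_pgd_range" change described in the backport notes.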

If it turns out that we want finer granularity (ie "only break COW when
it might actually matter" - things like the zero page are special and
don't need to be broken) we might need to push these semantics deeper
into the lookup fault path.  So if people care enough, it's possible
that we might end up adding a new internal FOLL_BREAK_COW flag to go
with the internal FOLL_COW flag we already have for tracking "I had a
COW".

Alternatively, if it turns out that different callers might want to
explicitly control the forced COW break behavior, we might even want to
make such a flag visible to the users of get_user_pages() instead of
using the above default semantics.

But for now, this is mostly commentary on the issue (this commit message
being a lot bigger than the patch, and that patch in turn is almost all
comments), with that minimal "enable COW breaking early" logic using the
existing FOLL_WRITE behavior.

[ It might be worth noting that we've always had this ambiguity, and it
  could arguably be seen as a user-space issue.

  You only get private COW mappings that could break either way in
  situations where user space is doing cooperative things (ie fork()
  before an execve() etc), but it _is_ surprising and very subtle, and
  fork() is supposed to give you independent address spaces.

  So let's treat this as a kernel issue and make the semantics of
  get_user_pages() easier to understand. Note that obviously a true
  shared mapping will still get a page that can change under us, so this
  does _not_ mean that get_user_pages() somehow returns any "stable"
  page ]

[surenb: backport notes
	Replaced (gup_flags | FOLL_WRITE) with write=1 in gup_pgd_range.
	Removed FOLL_PIN usage in should_force_cow_break since it's missing in
	the earlier kernels.]

Change-Id: I6ae007f29a538767fddccf3306264e892f9134ab
Reported-by: Jann Horn <jannh@google.com>
Tested-by: Christoph Hellwig <hch@lst.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill Shutemov <kirill@shutemov.name>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[surenb: backport to 4.19 kernel]
Cc: stable@vger.kernel.org # 4.19.x
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Git-commit: 5e24029791
Git-repo: https://android.googlesource.com/kernel/common/
Signed-off-by: Srinivasarao Pathipati <quic_c_spathi@quicinc.com>
2022-07-27 11:58:39 -07:00
kasan Merge android-4.19.78 (75337a6) into msm-4.19 2020-03-16 23:09:43 -07:00
backing-dev.c
balloon_compaction.c
bootmem.c
cleancache.c
cma.c mm: cma: Print correct request pages 2021-03-02 18:45:12 +08:00
cma.h
cma_debug.c mm: cma: make writeable CMA debugfs optional 2019-08-06 12:35:54 +05:30
compaction.c Reverting below patches from android-4.19-stable.125 2020-07-29 13:12:56 +05:30
debug.c
debug_page_ref.c
dmapool.c UPSTREAM: mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options 2019-08-30 11:58:12 +02:00
early_ioremap.c
fadvise.c
failslab.c
filemap.c Merge android-4.19-stable.149 (9ce79d9) into msm-4.19 2020-10-21 09:25:49 +05:30
frame_vector.c UPSTREAM: mm: untag user pointers in get_vaddr_frames 2019-10-07 15:27:40 -04:00
frontswap.c
gup.c gup: document and work around "COW can break either way" issue 2022-07-27 11:58:39 -07:00
gup_benchmark.c mm/gup_benchmark.c: prevent integer overflow in ioctl 2019-12-01 09:17:07 +01:00
highmem.c
hmm.c mm/memory_hotplug: shrink zones when offlining memory 2020-01-29 16:43:27 +01:00
huge_memory.c gup: document and work around "COW can break either way" issue 2022-07-27 11:58:39 -07:00
hugetlb.c Merge android-4.19-stable.146 (443485d) into msm-4.19 2020-10-16 11:06:31 +05:30
hugetlb_cgroup.c mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() 2019-11-20 18:45:20 +01:00
hwpoison-inject.c
init-mm.c
internal.h Reverting below patches from android-4.19-stable.125 2020-07-29 13:12:56 +05:30
interval_tree.c
Kconfig mm/Kconfig: forcing allocators to return ZONE_DMA32 memory 2020-09-29 09:15:00 -07:00
Kconfig.debug
khugepaged.c Merge android-4.19-stable.152 (13abe23) into msm-4.19 2020-10-28 17:52:20 +05:30
kmemleak-test.c
kmemleak.c Merge android-4.19-stable.149 (9ce79d9) into msm-4.19 2020-10-21 09:25:49 +05:30
ksm.c Merge android-4.19-stable.125 (a483478) into msm-4.19 2020-09-20 23:45:10 +05:30
list_lru.c mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node 2019-06-19 08:17:59 +02:00
maccess.c uaccess: Add non-pagefault user-space write function 2020-09-09 19:04:29 +02:00
madvise.c Merge android-4.19.95 (5da1114) into msm-4.19 2020-03-27 10:48:20 -07:00
Makefile Merge android-4.19-stable.125 (a483478) into msm-4.19 2020-09-20 23:45:10 +05:30
memblock.c Merge android-4.19-stable.125 (a483478) into msm-4.19 2020-09-20 23:45:10 +05:30
memcontrol.c mm/memcg: fix device private memcg accounting 2020-10-29 09:55:15 +01:00
memfd.c This is the 4.19.85 stable release 2019-11-20 20:43:17 +01:00
memory-failure.c Merge android-4.19-q.81 (9045ee1) into msm-4.19 2019-10-29 04:52:53 -07:00
memory.c mm: skip speculative path for non-anonymous COW faults 2021-01-04 10:25:29 +05:30
memory_hotplug.c Merge android-4.19-stable.152 (13abe23) into msm-4.19 2020-10-28 17:52:20 +05:30
mempolicy.c Merge android-4.19-stable.157 (8ee67bc) into msm-4.19 2020-12-18 18:35:06 +05:30
mempool.c
memtest.c
migrate.c Merge android-4.19.110 (1984fff) into msm-4.19 2020-05-23 05:08:22 -07:00
mincore.c UPSTREAM: mm: untag user pointers passed to memory syscalls 2019-10-07 15:27:40 -04:00
mlock.c Merge android-4.19.95 (5da1114) into msm-4.19 2020-03-27 10:48:20 -07:00
mm_event.c ANDROID: GKI: support mm_event for FS/IO/UFS path 2020-05-30 00:09:49 +00:00
mm_init.c
mmap.c ANDROID: mm: use raw seqcount variants in vm_write_* 2021-10-21 22:57:43 -07:00
mmu_context.c
mmu_notifier.c mm/mmu_notifier: use hlist_add_head_rcu() 2019-07-31 07:27:08 +02:00
mmzone.c ANDROID: GKI: mm: Export symbols __next_zones_zonelist and zone_watermark_ok_safe 2020-04-10 02:39:42 +00:00
mprotect.c Merge android-4.19.110 (1984fff) into msm-4.19 2020-05-23 05:08:22 -07:00
mremap.c ANDROID: mm: use raw seqcount variants in vm_write_* 2021-10-21 22:57:43 -07:00
msync.c UPSTREAM: mm: untag user pointers passed to memory syscalls 2019-10-07 15:27:40 -04:00
nobootmem.c
nommu.c x86/mm: split vmalloc_sync_all() 2020-03-25 08:06:13 +01:00
oom_kill.c mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary 2020-10-29 09:55:15 +01:00
page-writeback.c mm/page-writeback.c: avoid potential division by zero in wb_min_max_ratio() 2020-01-23 08:21:31 +01:00
page_alloc.c Merge android-4.19-stable.152 (13abe23) into msm-4.19 2020-10-28 17:52:20 +05:30
page_counter.c mm/page_counter.c: fix protection usage propagation 2020-08-21 11:05:33 +02:00
page_ext.c mm: fix the page_owner initializing issue for arm32 2020-09-02 16:37:28 +08:00
page_idle.c mm/page_idle.c: fix oops because end_pfn is larger than max_pfn 2019-07-03 13:14:45 +02:00
page_io.c Merge android-4.19.110 (1984fff) into msm-4.19 2020-05-23 05:08:22 -07:00
page_isolation.c
page_owner.c Merge "Merge android-4.19-q.81 (9045ee1) into msm-4.19" 2019-11-06 06:40:33 -08:00
page_poison.c Merge android-4.19.95 (5da1114) into msm-4.19 2020-03-27 10:48:20 -07:00
page_vma_mapped.c
pagewalk.c mm: pagewalk: fix termination condition in walk_pte_range() 2020-10-01 13:14:32 +02:00
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c This is the 4.19.147 stable release 2020-09-24 12:48:04 +02:00
pgtable-generic.c
process_reclaim.c mm: process_reclaim: allow nomap-only reclaim 2020-07-02 04:35:02 -07:00
process_vm_access.c
quicklist.c
readahead.c Fixing Android Net Test compilation 2020-03-21 18:06:06 -07:00
rmap.c Merge android-4.19-q.68 (f3e9c9b) into msm-4.19 2019-08-28 23:55:13 -07:00
rodata_test.c
shmem.c Merge android-4.19-stable.125 (a483478) into msm-4.19 2020-09-20 23:45:10 +05:30
showmem.c
slab.c UPSTREAM: mm, slab: combine kmalloc_caches and kmalloc_dma_caches 2019-12-13 14:04:05 -08:00
slab.h UPSTREAM: kasan, kmemleak: pass tagged pointers to kmemleak 2019-09-24 17:44:14 -07:00
slab_common.c This is the 4.19.135 stable release 2020-07-29 13:22:30 +02:00
slob.c UPSTREAM: mm/sl[uo]b: export __kmalloc_track(_node)_caller 2020-11-02 16:12:14 +00:00
slub.c Merge android-4.19-stable.157 (8ee67bc) into msm-4.19 2020-12-18 18:35:06 +05:30
sparse-vmemmap.c
sparse.c mm/memory_hotplug: remove "zone" parameter from sparse_remove_one_section 2020-01-29 16:43:26 +01:00
swap.c Merge android-4.19.63 (75ff56e) into msm-4.19 2019-08-13 01:20:38 -07:00
swap_cgroup.c
swap_ratio.c
swap_slots.c mm: swap: Add null pointer check 2019-06-18 12:45:01 -07:00
swap_state.c Merge android-4.19-stable.149 (9ce79d9) into msm-4.19 2020-10-21 09:25:49 +05:30
swapfile.c Merge android-4.19-stable.149 (9ce79d9) into msm-4.19 2020-10-21 09:25:49 +05:30
truncate.c
usercopy.c Merge android-4.19-q.79 (40321f2) into msm-4.19 2019-10-21 05:07:30 -07:00
userfaultfd.c
util.c This is the 4.19.129 stable release 2020-06-22 10:50:54 +02:00
vmacache.c
vmalloc.c Merge android-4.19-stable.125 (a483478) into msm-4.19 2020-09-20 23:45:10 +05:30
vmpressure.c Merge android-4.19-q.80 (fd673e8) into msm-4.19 2019-10-21 05:33:39 -07:00
vmscan.c Merge android-4.19-stable.149 (9ce79d9) into msm-4.19 2020-10-21 09:25:49 +05:30
vmstat.c Merge android-4.19-stable.136 (204dd19) into msm-4.19 2020-10-14 20:04:29 +05:30
workingset.c
z3fold.c
zbud.c
zpool.c
zsmalloc.c Merge android-4.19-q.94 (dabb11d) into msm-4.19 2020-02-03 21:41:48 -08:00
zswap.c