android_kernel_motorola_sm6225/kernel
Daniel Borkmann 42049e65d3 bpf: Adjust insufficient default bpf_jit_limit
[ Upstream commit 10ec8ca8ec1a2f04c4ed90897225231c58c124a7 ]

We've seen recent AWS EKS (Kubernetes) user reports like the following:

  After upgrading EKS nodes from v20230203 to v20230217 on our 1.24 EKS
  clusters after a few days a number of the nodes have containers stuck
  in ContainerCreating state or liveness/readiness probes reporting the
  following error:

    Readiness probe errored: rpc error: code = Unknown desc = failed to
    exec in container: failed to start exec "4a11039f730203ffc003b7[...]":
    OCI runtime exec failed: exec failed: unable to start container process:
    unable to init seccomp: error loading seccomp filter into kernel:
    error loading seccomp filter: errno 524: unknown

  However, we had not been seeing this issue on previous AMIs and it only
  started to occur on v20230217 (following the upgrade from kernel 5.4 to
  5.10) with no other changes to the underlying cluster or workloads.

  We tried the suggestions from that issue (sysctl net.core.bpf_jit_limit=452534528)
  which helped to immediately allow containers to be created and probes to
  execute but after approximately a day the issue returned and the value
  returned by cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}'
  was steadily increasing.

I tested bpf tree to observe bpf_jit_charge_modmem, bpf_jit_uncharge_modmem
their sizes passed in as well as bpf_jit_current under tcpdump BPF filter,
seccomp BPF and native (e)BPF programs, and the behavior all looks sane
and expected, that is nothing "leaking" from an upstream perspective.

The bpf_jit_limit knob was originally added in order to avoid a situation
where unprivileged applications loading BPF programs (e.g. seccomp BPF
policies) consuming all the module memory space via BPF JIT such that loading
of kernel modules would be prevented. The default limit was defined back in
2018 and while good enough back then, we are generally seeing far more BPF
consumers today.

Adjust the limit for the BPF JIT pool from originally 1/4 to now 1/2 of the
module memory space to better reflect today's needs and avoid more users
running into potentially hard to debug issues.

Fixes: fdadd04931c2 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
Reported-by: Stephen Haynes <sh@synk.net>
Reported-by: Lefteris Alexakis <lefteris.alexakis@kpn.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://github.com/awslabs/amazon-eks-ami/issues/1179
Link: https://github.com/awslabs/amazon-eks-ami/issues/1219
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230320143725.8394-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-04-05 11:15:34 +02:00
..
bpf bpf: Adjust insufficient default bpf_jit_limit 2023-04-05 11:15:34 +02:00
cgroup memcg: fix possible use-after-free in memcg_write_event_control() 2022-12-14 11:28:27 +01:00
configs
debug kdb: Make memory allocations more robust 2021-03-04 09:39:31 +01:00
dma swiotlb: skip swiotlb_bounce when orig_addr is zero 2022-07-02 16:27:40 +02:00
events perf: Fix possible memleak in pmu_dev_alloc() 2023-01-18 11:30:05 +01:00
gcov gcov: add support for checksum field 2023-01-18 11:30:39 +01:00
irq irqdomain: Drop bogus fwspec-mapping error handling 2023-03-11 16:31:52 +01:00
livepatch livepatch: fix race between fork and KLP transition 2022-10-26 13:19:23 +02:00
locking locking/lockdep: Avoid RCU-induced noinstr fail 2021-11-26 11:36:04 +01:00
power PM: hibernate: Allow hybrid sleep to work with s2idle 2022-11-03 23:52:31 +09:00
printk printk: fix return value of printk.devkmsg __setup handler 2022-04-15 14:14:45 +02:00
rcu rcu: Suppress smp_processor_id() complaint in synchronize_rcu_expedited_wait() 2023-03-11 16:31:45 +01:00
sched panic: Consolidate open-coded panic_on_warn checks 2023-02-06 07:49:46 +01:00
time timers: Prevent union confusion from unexpected restart_syscall() 2023-03-11 16:31:46 +01:00
trace ftrace: Fix invalid address access in lookup_rec() when index is 0 2023-03-22 13:27:12 +01:00
.gitignore
acct.c acct: fix potential integer overflow in encode_comp_t() 2023-01-18 11:30:34 +01:00
async.c Revert "module, async: async_synchronize_full() on module init iff async is used" 2022-02-23 11:58:38 +01:00
audit.c audit: improve audit queue handling when "audit=1" on cmdline 2022-02-08 18:23:13 +01:00
audit.h audit: fix a net reference leak in audit_list_rules_send() 2020-06-22 09:05:13 +02:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-09-05 10:26:28 +02:00
audit_tree.c audit: Embed key into chunk 2019-12-13 08:51:11 +01:00
audit_watch.c audit: CONFIG_CHANGE don't log internal bookkeeping as an event 2020-10-01 13:14:33 +02:00
auditfilter.c audit: fix a net reference leak in audit_list_rules_send() 2020-06-22 09:05:13 +02:00
auditsc.c audit: print empty EXECVE args 2019-12-01 09:17:17 +01:00
backtracetest.c
bounds.c
capability.c LSM: generalize flag passing to security_capable 2020-01-23 08:21:29 +01:00
compat.c make 'user_access_begin()' do 'access_ok()' 2020-06-22 09:04:58 +02:00
configs.c
context_tracking.c
cpu.c random: clear fast pool, crng, and batches in cpuhp bring up 2022-06-25 11:49:07 +02:00
cpu_pm.c kernel/cpu_pm: Fix uninitted local in cpu_pm 2020-06-22 09:05:28 +02:00
crash_core.c
crash_dump.c
cred.c memcg: account security cred as well to kmemcg 2020-01-09 10:19:00 +01:00
delayacct.c
dma.c
exec_domain.c
exit.c exit: Use READ_ONCE() for all oops/warn limit reads 2023-02-06 07:49:46 +01:00
extable.c
fail_function.c fail_function: Remove a redundant mutex unlock 2020-11-24 13:27:23 +01:00
fork.c mm/hugetlb: initialize hugetlb_usage in mm_init 2021-09-22 11:48:09 +02:00
freezer.c
futex.c mm, futex: fix shared futex pgoff on shmem huge page 2021-07-11 12:49:30 +02:00
groups.c
hung_task.c
iomem.c
irq_work.c irq_work: Do not raise an IPI when queueing work on the local CPU 2019-05-31 06:46:19 -07:00
jump_label.c locking/static_key: Fix false positive warnings on concurrent dec/inc 2021-03-04 09:39:30 +01:00
kallsyms.c kallsyms: Refactor kallsyms_show_value() to take cred 2020-07-16 08:17:26 +02:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c
kexec.c
kexec_core.c kernel: kexec: remove the lock operation of system_transition_mutex 2021-02-03 23:23:23 +01:00
kexec_file.c kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add] 2022-07-02 16:27:39 +02:00
kexec_internal.h
kmod.c kmod: make request_module() return an error when autoloading is disabled 2020-04-17 10:48:52 +02:00
kprobes.c x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range 2023-03-11 16:31:51 +01:00
ksysfs.c
kthread.c kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync() 2021-07-11 12:49:31 +02:00
latencytop.c
Makefile elfcore: fix building with clang 2021-02-10 09:21:06 +01:00
memremap.c mm/memory_hotplug: shrink zones when offlining memory 2020-01-29 16:43:27 +01:00
module-internal.h
module.c module: Don't wait for GOING modules 2023-02-06 07:49:41 +01:00
module_signing.c
notifier.c x86/mm: split vmalloc_sync_all() 2020-03-25 08:06:13 +01:00
nsproxy.c
padata.c padata: add separate cpuhp node for CPUHP_PADATA_DEAD 2021-08-08 08:54:30 +02:00
panic.c exit: Use READ_ONCE() for all oops/warn limit reads 2023-02-06 07:49:46 +01:00
params.c
pid.c
pid_namespace.c memcg: enable accounting for pids in nested pid namespaces 2021-09-22 11:48:09 +02:00
profile.c profiling: fix shift too large makes kernel panic 2022-08-25 11:15:20 +02:00
ptrace.c ptrace: Reimplement PTRACE_KILL by always sending SIGKILL 2022-06-14 16:59:14 +02:00
range.c
reboot.c reboot: fix overflow parsing reboot cpu number 2020-11-18 19:18:52 +01:00
relay.c relay: fix type mismatch when allocating memory in relay_create_buf() 2023-01-18 11:30:08 +01:00
resource.c resource: fix locking in find_next_iomem_res() 2019-09-16 08:22:20 +02:00
rseq.c
seccomp.c seccomp: Invalidate seccomp mode to catch death failures 2022-02-16 12:51:47 +01:00
signal.c signal handling: don't use BUG_ON() for debugging 2022-07-21 21:09:32 +02:00
smp.c smp: Fix offline cpu check in flush_smp_call_function_queue() 2022-04-20 09:12:50 +02:00
smpboot.c kthread: Extract KTHREAD_IS_PER_CPU 2021-02-07 14:48:38 +01:00
smpboot.h
softirq.c
stacktrace.c
stop_machine.c
sys.c prlimit: do_prlimit needs to have a speculation check 2023-01-24 07:11:49 +01:00
sys_ni.c kernel/sys_ni: add compat entry for fadvise64_64 2022-09-05 10:26:28 +02:00
sysctl.c proc: proc_skip_spaces() shouldn't think it is working on C strings 2022-12-08 11:18:32 +01:00
sysctl_binary.c
task_work.c
taskstats.c taskstats: fix data-race 2020-01-09 10:18:59 +01:00
test_kprobes.c
torture.c
tracepoint.c tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing 2021-07-20 16:15:42 +02:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-02-23 11:58:39 +01:00
ucount.c
uid16.c
uid16.h
umh.c usermodehelper: reset umask to default before executing user process 2020-10-14 10:31:21 +02:00
up.c smp: Fix smp_call_function_single_async prototype 2021-05-22 10:59:39 +02:00
user-return-notifier.c
user.c
user_namespace.c
utsname.c
utsname_sysctl.c
watchdog.c watchdog: export lockup_detector_reconfigure 2022-08-25 11:15:46 +02:00
watchdog_hld.c
workqueue.c workqueue: don't skip lockdep work dependency in cancel_work_sync() 2022-09-28 11:02:58 +02:00
workqueue_internal.h