Kernel sources for Moto G9 (Play) (Guamp)
Find a file
Tejun Heo 515d077ee3 blk-iolatency: Fix inflight count imbalances and IO hangs on offline
commit 8a177a36da6c54c98b8685d4f914cb3637d53c0d upstream.

iolatency needs to track the number of inflight IOs per cgroup. As this
tracking can be expensive, it is disabled when no cgroup has iolatency
configured for the device. To ensure that the inflight counters stay
balanced, iolatency_set_limit() freezes the request_queue while manipulating
the enabled counter, which ensures that no IO is in flight and thus all
counters are zero.

Unfortunately, iolatency_set_limit() isn't the only place where the enabled
counter is manipulated. iolatency_pd_offline() can also dec the counter and
trigger disabling. As this disabling happens without freezing the q, this
can easily happen while some IOs are in flight and thus leak the counts.

This can be easily demonstrated by turning on iolatency on an one empty
cgroup while IOs are in flight in other cgroups and then removing the
cgroup. Note that iolatency shouldn't have been enabled elsewhere in the
system to ensure that removing the cgroup disables iolatency for the whole
device.

The following keeps flipping on and off iolatency on sda:

  echo +io > /sys/fs/cgroup/cgroup.subtree_control
  while true; do
      mkdir -p /sys/fs/cgroup/test
      echo '8:0 target=100000' > /sys/fs/cgroup/test/io.latency
      sleep 1
      rmdir /sys/fs/cgroup/test
      sleep 1
  done

and there's concurrent fio generating direct rand reads:

  fio --name test --filename=/dev/sda --direct=1 --rw=randread \
      --runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k

while monitoring with the following drgn script:

  while True:
    for css in css_for_each_descendant_pre(prog['blkcg_root'].css.address_of_()):
        for pos in hlist_for_each(container_of(css, 'struct blkcg', 'css').blkg_list):
            blkg = container_of(pos, 'struct blkcg_gq', 'blkcg_node')
            pd = blkg.pd[prog['blkcg_policy_iolatency'].plid]
            if pd.value_() == 0:
                continue
            iolat = container_of(pd, 'struct iolatency_grp', 'pd')
            inflight = iolat.rq_wait.inflight.counter.value_()
            if inflight:
                print(f'inflight={inflight} {disk_name(blkg.q.disk).decode("utf-8")} '
                      f'{cgroup_path(css.cgroup).decode("utf-8")}')
    time.sleep(1)

The monitoring output looks like the following:

  inflight=1 sda /user.slice
  inflight=1 sda /user.slice
  ...
  inflight=14 sda /user.slice
  inflight=13 sda /user.slice
  inflight=17 sda /user.slice
  inflight=15 sda /user.slice
  inflight=18 sda /user.slice
  inflight=17 sda /user.slice
  inflight=20 sda /user.slice
  inflight=19 sda /user.slice <- fio stopped, inflight stuck at 19
  inflight=19 sda /user.slice
  inflight=19 sda /user.slice

If a cgroup with stuck inflight ends up getting throttled, the throttled IOs
will never get issued as there's no completion event to wake it up leading
to an indefinite hang.

This patch fixes the bug by unifying enable handling into a work item which
is automatically kicked off from iolatency_set_min_lat_nsec() which is
called from both iolatency_set_limit() and iolatency_pd_offline() paths.
Punting to a work item is necessary as iolatency_pd_offline() is called
under spinlocks while freezing a request_queue requires a sleepable context.

This also simplifies the code reducing LOC sans the comments and avoids the
unnecessary freezes which were happening whenever a cgroup's latency target
is newly set or cleared.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Liu Bo <bo.liu@linux.alibaba.com>
Fixes: 8c772a9bfc7c ("blk-iolatency: fix IO hang due to negative inflight counter")
Cc: stable@vger.kernel.org # v5.0+
Link: https://lore.kernel.org/r/Yn9ScX6Nx2qIiQQi@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-06-14 16:59:30 +02:00
arch arm64: dts: qcom: ipq8074: fix the sleep clock frequency 2022-06-14 16:59:30 +02:00
block blk-iolatency: Fix inflight count imbalances and IO hangs on offline 2022-06-14 16:59:30 +02:00
certs certs: Trigger creation of RSA module signing key if it's not an RSA key 2021-09-22 11:47:51 +02:00
crypto crypto: authenc - Fix sleep in atomic context in decrypt_tail 2022-04-15 14:14:42 +02:00
Documentation dt-bindings: gpio: altera: correct interrupt-cells 2022-06-14 16:59:30 +02:00
drivers phy: qcom-qmp: fix struct clk leak on probe errors 2022-06-14 16:59:30 +02:00
firmware
fs ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock 2022-06-14 16:59:28 +02:00
include nodemask.h: fix compilation error with GCC12 2022-06-14 16:59:29 +02:00
init init/main.c: return 1 from handled __setup() functions 2022-04-15 14:15:03 +02:00
ipc shm: extend forced shm destroy to support objects from several IPC nses 2021-12-08 08:50:11 +01:00
kernel tracing: Fix potential double free in create_var_ref() 2022-06-14 16:59:27 +02:00
lib assoc_array: Fix BUG_ON during garbage collect 2022-06-06 08:24:20 +02:00
LICENSES
mm hugetlb: fix huge_pmd_unshare address update 2022-06-14 16:59:29 +02:00
net mac80211: upgrade passive scan to active scan on DFS channels after beacon rx 2022-06-14 16:59:29 +02:00
samples samples/kretprobes: Fix return value if register_kretprobe() failed 2021-11-26 11:36:11 +01:00
scripts scripts/faddr2line: Fix overlapping text section failures 2022-06-14 16:59:22 +02:00
security Fix incorrect type in assignment of ipv6 port for audit 2022-04-15 14:14:54 +02:00
sound ASoC: rt5514: Fix event generation for "DSP Voice Wake Up" control 2022-06-14 16:59:29 +02:00
tools perf jevents: Fix event syntax error caused by ExtSel 2022-06-14 16:59:26 +02:00
usr
virt KVM: Prevent module exit until all VMs are freed 2022-04-15 14:14:57 +02:00
.clang-format
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore
.mailmap
COPYING
CREDITS
Kbuild
Kconfig
MAINTAINERS
Makefile Linux 4.19.246 2022-06-06 08:24:22 +02:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.