platform/kernel/linux-rpi.git
2 years agoio-wq: remove worker to owner tw dependency
Pavel Begunkov [Fri, 29 Oct 2021 12:11:33 +0000 (13:11 +0100)]
io-wq: remove worker to owner tw dependency

INFO: task iou-wrk-6609:6612 blocked for more than 143 seconds.
      Not tainted 5.15.0-rc5-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:iou-wrk-6609    state:D stack:27944 pid: 6612 ppid:  6526 flags:0x00004006
Call Trace:
 context_switch kernel/sched/core.c:4940 [inline]
 __schedule+0xb44/0x5960 kernel/sched/core.c:6287
 schedule+0xd3/0x270 kernel/sched/core.c:6366
 schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
 do_wait_for_common kernel/sched/completion.c:85 [inline]
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
 io_worker_exit fs/io-wq.c:183 [inline]
 io_wqe_worker+0x66d/0xc40 fs/io-wq.c:597
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

io-wq worker may submit a task_work to the master task and upon
io_worker_exit() wait for the tw to get executed. The problem appears
when the master task is waiting in coredump.c:

468                     freezer_do_not_count();
469                     wait_for_completion(&core_state->startup);
470                     freezer_count();

Apparently having some dependency on children threads getting everything
stuck. Workaround it by cancelling the taks_work callback that causes it
before going into io_worker_exit() waiting.

p.s. probably a better option is to not submit tw elevating the refcount
in the first place, but let's leave this excercise for the future.

Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+27d62ee6f256b186883e@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/142a716f4ed936feae868959059154362bfa8c19.1635509451.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: harder fdinfo sq/cq ring iterating
Jens Axboe [Fri, 29 Oct 2021 12:36:45 +0000 (06:36 -0600)]
io_uring: harder fdinfo sq/cq ring iterating

The ring iteration is racy, which isn't necessarily a problem except it
can cause us to iterate the whole thing. That isn't desired or ideal,
and it can lead to excessive runtimes of reading fdinfo.

Cap the iteration at tail - head OR the ring size. While in there, clean
up the ring masking and just dump the raw values along with the masks.
That provides more useful debug info.

Fixes: 83f84356bc8f ("io_uring: add more uring info to fdinfo for debug")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: don't assign write hint in the read path
Jens Axboe [Mon, 25 Oct 2021 19:45:12 +0000 (13:45 -0600)]
io_uring: don't assign write hint in the read path

Move this out of the generic read/write prep path, and place it in the
write specific kiocb setup instead.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: clusterise ki_flags access in rw_prep
Pavel Begunkov [Sat, 23 Oct 2021 11:14:02 +0000 (12:14 +0100)]
io_uring: clusterise ki_flags access in rw_prep

ioprio setup doesn't depend on other fields that are modified in
io_prep_rw() and we can move it down in the function without worrying
about performance. It's useful as it makes iocb->ki_flags
accesses/modifications closer together, so it's more likely the compiler
will cache it in a register and avoid extra reloads.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8ee98779c06f1b59f6039b1e292db4332efd664b.1634987320.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: kill unused param from io_file_supports_nowait
Pavel Begunkov [Sat, 23 Oct 2021 11:14:01 +0000 (12:14 +0100)]
io_uring: kill unused param from io_file_supports_nowait

io_file_supports_nowait() doesn't use rw argument anymore, remove it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4bd6709fc573d70c866ea656cb7a7dbe94be8026.1634987320.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: clean up timeout async_data allocation
Pavel Begunkov [Sat, 23 Oct 2021 11:14:00 +0000 (12:14 +0100)]
io_uring: clean up timeout async_data allocation

opcode prep functions are one of the first things that are called, we
can't have ->async_data allocated at this point and it's certainly a
bug. Reflect this assumption in io_timeout_prep() and add a WARN_ONCE
just in case.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/75a28ca7dbcc5af8b6cd9092819e8384c24dedd4.1634987320.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: don't try io-wq polling if not supported
Pavel Begunkov [Sat, 23 Oct 2021 11:13:59 +0000 (12:13 +0100)]
io_uring: don't try io-wq polling if not supported

If an opcode doesn't support polling, just let it be executed
synchronously in iowq, otherwise it will do a nonblock attempt just to
fail in io_arm_poll_handler() and return back to blocking execution.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6401256db01b88f448f15fcd241439cb76f5b940.1634987320.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: check if opcode needs poll first on arming
Pavel Begunkov [Sat, 23 Oct 2021 11:13:58 +0000 (12:13 +0100)]
io_uring: check if opcode needs poll first on arming

->pollout or ->pollin are set only for opcodes that need a file, so if
io_arm_poll_handler() tests them first we can be sure that the request
has file set and the ->file check can be removed.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9adfe4f543d984875e516fce6da35348aab48668.1634987320.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: clean iowq submit work cancellation
Pavel Begunkov [Sat, 23 Oct 2021 11:13:57 +0000 (12:13 +0100)]
io_uring: clean iowq submit work cancellation

If we've got IO_WQ_WORK_CANCEL in io_wq_submit_work(), handle the error
on the same lines as the check instead of having a weird code flow. The
main loop doesn't change but goes one indention left.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ff4a09cf41f7a22bbb294b6f1faea721e21fe615.1634987320.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: clean io_wq_submit_work()'s main loop
Pavel Begunkov [Sat, 23 Oct 2021 11:13:56 +0000 (12:13 +0100)]
io_uring: clean io_wq_submit_work()'s main loop

Do a bit of cleaning for the main loop of io_wq_submit_work(). Get rid
of switch, just replace it with a single if as we're retrying in both
other cases. Kill issue_sqe label, Get rid of needs_poll nesting and
disambiguate a bit the comment.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ed12ce0c64e051f9a6b8a37a24f8ea554d299c29.1634987320.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio-wq: use helper for worker refcounting
Pavel Begunkov [Sat, 23 Oct 2021 11:13:55 +0000 (12:13 +0100)]
io-wq: use helper for worker refcounting

Use io_worker_release() instead of hand coding it in io_worker_exit().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6f95f09d2cdbafcbb2e22ad0d1a2bc4d3962bf65.1634987320.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: implement async hybrid mode for pollable requests
Hao Xu [Mon, 18 Oct 2021 13:34:45 +0000 (21:34 +0800)]
io_uring: implement async hybrid mode for pollable requests

The current logic of requests with IOSQE_ASYNC is first queueing it to
io-worker, then execute it in a synchronous way. For unbound works like
pollable requests(e.g. read/write a socketfd), the io-worker may stuck
there waiting for events for a long time. And thus other works wait in
the list for a long time too.
Let's introduce a new way for unbound works (currently pollable
requests), with this a request will first be queued to io-worker, then
executed in a nonblock try rather than a synchronous way. Failure of
that leads it to arm poll stuff and then the worker can begin to handle
other works.
The detail process of this kind of requests is:

step1: original context:
           queue it to io-worker
step2: io-worker context:
           nonblock try(the old logic is a synchronous try here)
               |
               |--fail--> arm poll
                            |
                            |--(fail/ready)-->synchronous issue
                            |
                            |--(succeed)-->worker finish it's job, tw
                                           take over the req

This works much better than the old IOSQE_ASYNC logic in cases where
unbound max_worker is relatively small. In this case, number of
io-worker eazily increments to max_worker, new worker cannot be created
and running workers stuck there handling old works in IOSQE_ASYNC mode.

In my 64-core machine, set unbound max_worker to 20, run echo-server,
turns out:
(arguments: register_file, connetion number is 1000, message size is 12
Byte)
original IOSQE_ASYNC: 76664.151 tps
after this patch: 166934.985 tps

Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20211018133445.103438-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: Use ERR_CAST() instead of ERR_PTR(PTR_ERR())
Changcheng Deng [Wed, 20 Oct 2021 08:49:48 +0000 (08:49 +0000)]
io_uring: Use ERR_CAST() instead of ERR_PTR(PTR_ERR())

Use ERR_CAST() instead of ERR_PTR(PTR_ERR()).
This makes it more readable and also fix this warning detected by
err_cast.cocci:
./fs/io_uring.c: WARNING: 3208: 11-18: ERR_CAST can be used with buf

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn>
Link: https://lore.kernel.org/r/20211020084948.1038420-1-deng.changcheng@zte.com.cn
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: split logic of force_nonblock
Hao Xu [Mon, 18 Oct 2021 13:34:31 +0000 (21:34 +0800)]
io_uring: split logic of force_nonblock

Currently force_nonblock stands for three meanings:
 - nowait or not
 - in an io-worker or not(hold uring_lock or not)

Let's split the logic to two flags, IO_URING_F_NONBLOCK and
IO_URING_F_UNLOCKED for convenience of the next patch.

Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20211018133431.103298-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: warning about unused-but-set parameter
Arnd Bergmann [Tue, 19 Oct 2021 15:34:53 +0000 (17:34 +0200)]
io_uring: warning about unused-but-set parameter

When enabling -Wunused warnings by building with W=1, I get an
instance of the -Wunused-but-set-parameter warning in the io_uring code:

fs/io_uring.c: In function 'io_queue_async_work':
fs/io_uring.c:1445:61: error: parameter 'locked' set but not used [-Werror=unused-but-set-parameter]
 1445 | static void io_queue_async_work(struct io_kiocb *req, bool *locked)
      |                                                       ~~~~~~^~~~~~

There are very few warnings of this type, so it would be nice to enable
this by default and fix all the existing instances. As the assignment
serves no purpose by itself other than to prevent developers from using
the variable, an easy workaround is to remove the assignment and just
rename the argument to "dont_use".

Fixes: f237c30a5610 ("io_uring: batch task work locking")
Link: https://lore.kernel.org/lkml/20210920121352.93063-1-arnd@kernel.org/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211019153507.348480-1-arnd@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: inform block layer of how many requests we are submitting
Jens Axboe [Wed, 6 Oct 2021 17:01:42 +0000 (11:01 -0600)]
io_uring: inform block layer of how many requests we are submitting

The block layer can use this knowledge to make smarter decisions on
how to handle the request, if it knows that N more may be coming. Switch
to using blk_start_plug_nr_ios() to pass in that information.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: simplify io_file_supports_nowait()
Pavel Begunkov [Sat, 16 Oct 2021 23:07:10 +0000 (00:07 +0100)]
io_uring: simplify io_file_supports_nowait()

Make sure that REQ_F_SUPPORT_NOWAIT is always set io_prep_rw(), and so
we can stop caring about setting it down the line simplifying
io_file_supports_nowait().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/60c8f1f5e2cb45e00f4897b2cec10c5b3669da91.1634425438.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: combine REQ_F_NOWAIT_{READ,WRITE} flags
Pavel Begunkov [Sat, 16 Oct 2021 23:07:09 +0000 (00:07 +0100)]
io_uring: combine REQ_F_NOWAIT_{READ,WRITE} flags

Merge REQ_F_NOWAIT_READ and REQ_F_NOWAIT_WRITE into one flag, i.e.
REQ_F_SUPPORT_NOWAIT. First it gets rid of dependence on CONFIG_64BIT
but also simplifies the code.

One thing to consider is when we don't have ->{read,write}_iter and go
through loop_rw_iter(). Just fail it with -EAGAIN if we expect nowait
behaviour but not sure whether it supports it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f832a20e5186c2e79c6519280c238f559a1d2bbc.1634425438.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: arm poll for non-nowait files
Pavel Begunkov [Sat, 16 Oct 2021 23:07:08 +0000 (00:07 +0100)]
io_uring: arm poll for non-nowait files

Don't check if we can do nowait before arming apoll, there are several
reasons for that. First, we don't care much about files that don't
support nowait. Second, it may be useful -- we don't want to be taking
away extra workers from io-wq when it can go in some async. Even if it
will go through io-wq eventually, it make difference in the numbers of
workers actually used. And the last one, it's needed to clean nowait in
future commits.

[kernel test robot: fix unused-var]

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9d06f3cb2c8b686d970269a87986f154edb83043.1634425438.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agofs/io_uring: Prioritise checking faster conditions first in io_write
Noah Goldstein [Sun, 17 Oct 2021 01:32:29 +0000 (20:32 -0500)]
fs/io_uring: Prioritise checking faster conditions first in io_write

This commit reorders the conditions in a branch in io_write. The
reorder to check 'ret2 == -EAGAIN' first as checking
'(req->ctx->flags & IORING_SETUP_IOPOLL)' will likely be more
expensive due to 2x memory derefences.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Link: https://lore.kernel.org/r/20211017013229.4124279-1-goldstein.w.n@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: clean io_prep_rw()
Pavel Begunkov [Fri, 15 Oct 2021 16:09:16 +0000 (17:09 +0100)]
io_uring: clean io_prep_rw()

We already store req->file in a variable in io_prep_rw(), just use it
instead of a couple of left references to kicob->ki_filp.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2f5889fc7ab670daefd5ccaedd99416d8355f0ad.1634314022.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: optimise fixed rw rsrc node setting
Pavel Begunkov [Fri, 15 Oct 2021 16:09:15 +0000 (17:09 +0100)]
io_uring: optimise fixed rw rsrc node setting

Move fixed rw io_req_set_rsrc_node() from rw prep into
io_import_fixed(), if we're using fixed buffers it will always be called
during submission as we save the state in advance,

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/68c06f66d5aa9661f1e4b88d08c52d23528297ec.1634314022.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: return iovec from __io_import_iovec
Pavel Begunkov [Fri, 15 Oct 2021 16:09:14 +0000 (17:09 +0100)]
io_uring: return iovec from __io_import_iovec

We pass iovec** into __io_import_iovec(), which should keep it,
initialise and modify accordingly. It's expensive, return it directly
from __io_import_iovec encoding errors with ERR_PTR if needed.

io_import_iovec keeps the old interface, but it's inline and so
everything is optimised nicely.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6230e9769982f03a8f86fa58df24666088c44d3e.1634314022.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: optimise io_import_iovec fixed path
Pavel Begunkov [Fri, 15 Oct 2021 16:09:13 +0000 (17:09 +0100)]
io_uring: optimise io_import_iovec fixed path

Delay loading req->rw.{addr,len} in io_import_iovec until it's really
needed, so removing extra loads for the fixed path, which doesn't use
them.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3cc48dd0c4f1a37c4ce9aab5784281a2d83ad8be.1634314022.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: kill io_wq_current_is_worker() in iopoll
Pavel Begunkov [Fri, 15 Oct 2021 16:09:12 +0000 (17:09 +0100)]
io_uring: kill io_wq_current_is_worker() in iopoll

Don't decide about locking based on io_wq_current_is_worker(), it's not
consistent with all other code and is expensive, use issue_flags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7546d5a58efa4360173541c6fe02ee6b8c7b4ea7.1634314022.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: optimise req->ctx reloads
Pavel Begunkov [Fri, 15 Oct 2021 16:09:11 +0000 (17:09 +0100)]
io_uring: optimise req->ctx reloads

Don't load req->ctx in advance, it takes an extra register and the field
stays valid even after opcode handlers. It also optimises out req->ctx
load in io_iopoll_req_issued() once it's inlined.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1e45ff671c44be0eb904f2e448a211734893fa0b.1634314022.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: rearrange io_read()/write()
Pavel Begunkov [Thu, 14 Oct 2021 15:10:19 +0000 (16:10 +0100)]
io_uring: rearrange io_read()/write()

Combine force_nonblock branches (which is already optimised by
compiler), flip branches so the most hot/common path is the first, e.g.
as with non on-stack iov setup, and add extra likely/unlikely
attributions for errror paths.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2c2536c5896d70994de76e387ea09a0402173a3f.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: clean up io_import_iovec
Pavel Begunkov [Thu, 14 Oct 2021 15:10:18 +0000 (16:10 +0100)]
io_uring: clean up io_import_iovec

Make io_import_iovec taking struct io_rw_state instead of an iter
pointer. First it takes care of initialising iovec pointer, which can be
forgotten. Even more, we can not init it if not needed, e.g. in case of
IORING_OP_READ_FIXED or IORING_OP_READ. Also hide saving iter_state
inside of it by splitting out an inline function of it to avoid extra
ifs.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b1bbc213a95e5272d4da5867bb977d9acb6f2109.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: optimise io_import_iovec nonblock passing
Pavel Begunkov [Thu, 14 Oct 2021 15:10:17 +0000 (16:10 +0100)]
io_uring: optimise io_import_iovec nonblock passing

First, change IO_URING_F_NONBLOCK to take sign bit of the int, so
checking for it can be turned into test + sign-based-jump, makes the
binary smaller and may be faster.

Then, instead of passing need_lock boolean into io_import_iovec() just
give it issue_flags, which is already stored somewhere. Saves some space
on stack, a couple of test + cmov operations and other conversions.

note: we still leave
force_nonblock = issue_flags & IO_URING_F_NONBLOCK
variable, but it's optimised out by the compiler into testing
issue_flags directly.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ee96547e692f6c975c229cd82fc721679571a734.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: optimise read/write iov state storing
Pavel Begunkov [Thu, 14 Oct 2021 15:10:16 +0000 (16:10 +0100)]
io_uring: optimise read/write iov state storing

Currently io_read() and io_write() keep separate pointers to an iter and
to struct iov_iter_state, which is not great for register spilling and
requires more on-stack copies. They are both either on-stack or in
req->async_data at the same time, so use struct io_rw_state and keep a
pointer only to it, so having all the state with just one pointer.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5c5e7ffd7dc25fc35075c70411ba99df72f237fa.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: encapsulate rw state
Pavel Begunkov [Thu, 14 Oct 2021 15:10:15 +0000 (16:10 +0100)]
io_uring: encapsulate rw state

Add a new struct io_rw_state storing all iov related bits: fast iov,
iterator and iterator state. Not much changes here, simply convert
struct io_async_rw to use it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e8245ffcb568b228a009ec1eb79c993c813679f1.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: optimise rw comletion handlers
Pavel Begunkov [Thu, 14 Oct 2021 15:10:14 +0000 (16:10 +0100)]
io_uring: optimise rw comletion handlers

Don't override req->result in io_complete_rw_iopoll() when it's already
of the same value, we have an if just above it, so move the assignment
there. Also, add one simle unlikely() in __io_complete_rw_common().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8dfeb4f84026a20172bcf82c05010abe955874ae.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: prioritise read success path over fails
Pavel Begunkov [Thu, 14 Oct 2021 15:10:13 +0000 (16:10 +0100)]
io_uring: prioritise read success path over fails

Rearrange io_read return handling so first we expect it completing
successfully and only then checking for errors, which is a colder path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c91c7c2da11815ec8b04b5d872f60dc4cde662c5.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: consistent typing for issue_flags
Pavel Begunkov [Thu, 14 Oct 2021 15:10:12 +0000 (16:10 +0100)]
io_uring: consistent typing for issue_flags

Some of the functions keep issue_flags as int, change those to unsigned.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/04ad43797783bc9cc7567f287ab545518f8e8cf2.1634144845.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: optimise rsrc referencing
Pavel Begunkov [Sat, 9 Oct 2021 22:14:41 +0000 (23:14 +0100)]
io_uring: optimise rsrc referencing

Apparently, percpu_ref_put/get() are expensive enough if done per
request, get them in a batch and cache on the submission side to avoid
getting it over and over again. Also, if we're completing under
uring_lock, return refs back into the cache instead of
perfcpu_ref_put(). Pretty similar to how we do tctx->cached_refs
accounting, but fall back to normal putting when we already changed a
rsrc node by the time of free.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b40d8c5bc77d3c9550df8a319117a374ac85f8f4.1633817310.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: optimise io_req_set_rsrc_node()
Pavel Begunkov [Sat, 9 Oct 2021 22:14:40 +0000 (23:14 +0100)]
io_uring: optimise io_req_set_rsrc_node()

io_req_set_rsrc_node() reloads loads req->ctx, however it's already in
registers in all use cases, so better to pass it as a parameter.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/67a25557b8a51e90bfd578447a6f1671911b05ae.1633817310.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: fix io_free_batch_list races
Pavel Begunkov [Tue, 12 Oct 2021 14:02:14 +0000 (15:02 +0100)]
io_uring: fix io_free_batch_list races

[  158.514382] WARNING: CPU: 5 PID: 15251 at fs/io_uring.c:1141 io_free_batch_list+0x269/0x360
[  158.514426] RIP: 0010:io_free_batch_list+0x269/0x360
[  158.514437] Call Trace:
[  158.514440]  __io_submit_flush_completions+0xde/0x180
[  158.514444]  tctx_task_work+0x14a/0x220
[  158.514447]  task_work_run+0x64/0xa0
[  158.514448]  __do_sys_io_uring_enter+0x7c/0x970
[  158.514450]  __x64_sys_io_uring_enter+0x22/0x30
[  158.514451]  do_syscall_64+0x43/0x90
[  158.514453]  entry_SYSCALL_64_after_hwframe+0x44/0xae

We should not touch request internals including req->comp_list.next
after putting our ref if it's not final, e.g. we can start freeing
requests from the free cache.

Fixed: 62ca9cb93e7f8 ("io_uring: optimise io_free_batch_list()")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b1f4df38fbb8f111f52911a02fd418d0283a4e6f.1634047298.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: remove extra io_ring_exit_work wake up
Pavel Begunkov [Wed, 6 Oct 2021 15:06:50 +0000 (16:06 +0100)]
io_uring: remove extra io_ring_exit_work wake up

task_work_add() takes care of waking up the thread, remove useless
wake_up_process().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/de9a71ee255112dcaed3b5d426be24934e74722c.1633532552.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: optimise out req->opcode reloading
Pavel Begunkov [Wed, 6 Oct 2021 15:06:49 +0000 (16:06 +0100)]
io_uring: optimise out req->opcode reloading

Looking at the assembly, the compiler decided to reload req->opcode in
io_op_defs[opcode].needs_file instead of one it had in a register, so
store it in a temp variable so it can be optimised out. Also move the
personality block later, it's better for spilling/etc. as it only
depends on @sqe, which we're keeping anyway.

By the way, zero req->opcode if it over IORING_OP_LAST, not a problem,
at the moment but is safer.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6ba869f5f8b7b0f991c87fdf089f0abf87cbe06b.1633532552.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: reshuffle io_submit_state bits
Pavel Begunkov [Wed, 6 Oct 2021 15:06:48 +0000 (16:06 +0100)]
io_uring: reshuffle io_submit_state bits

struct io_submit_state's ->free_list and ->link are hotter and smaller
than ->plug, place them first.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6ad3c15849f50b27ad012c042c73e6e069d22df7.1633532552.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: safer fallback_work free
Pavel Begunkov [Wed, 6 Oct 2021 15:06:47 +0000 (16:06 +0100)]
io_uring: safer fallback_work free

Add extra wq flushing for fallback_work, that's not necessary but safer
if invariants of io_fallback_req_func() change.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/24179419d6748516299600bc914f50b9e0b02275.1633532552.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: optimise plugging
Pavel Begunkov [Wed, 6 Oct 2021 15:06:46 +0000 (16:06 +0100)]
io_uring: optimise plugging

Plugging is only needed with requests that also need a file, so hide
plugging under a ->needs_file check. Also, place ->needs_file and ->plug
bits into the same byte of io_op_defs, it may matter for compilers, e.g.
only with the change a tested one decided to optimise two memory testb
into a more with two register testb.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1600d1287bb7d16451d4ef3343252787a5314927.1633532552.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: correct fill events helpers types
Pavel Begunkov [Mon, 4 Oct 2021 19:03:00 +0000 (20:03 +0100)]
io_uring: correct fill events helpers types

CQE result is a 32-bit integer, so the functions generating CQEs are
better to accept not long but ints. Convert io_cqring_fill_event() and
other helpers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7ca6f15255e9117eae28adcac272744cae29b113.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: inline io_poll_complete
Pavel Begunkov [Mon, 4 Oct 2021 19:02:59 +0000 (20:02 +0100)]
io_uring: inline io_poll_complete

Inline io_poll_complete(), it's simple and doesn't have any particular
purpose.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/933d7ee3e4450749a2d892235462c8f18d030293.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: inline io_req_needs_clean()
Pavel Begunkov [Mon, 4 Oct 2021 19:02:58 +0000 (20:02 +0100)]
io_uring: inline io_req_needs_clean()

There is only a single user of io_req_needs_clean() inline it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6111d0221ef4b439cad401e135dd6a5f990a0501.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: remove struct io_completion
Pavel Begunkov [Mon, 4 Oct 2021 19:02:57 +0000 (20:02 +0100)]
io_uring: remove struct io_completion

We keep struct io_completion only as a temporal storage of cflags, Place
it in io_kiocb, it's cleaner, removes extra bits and even might be used
for future optimisations.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5299bd5c223204065464bd87a515d0e405316086.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: control ->async_data with a REQ_F flag
Pavel Begunkov [Mon, 4 Oct 2021 19:02:56 +0000 (20:02 +0100)]
io_uring: control ->async_data with a REQ_F flag

->async_data is a slow path, so it won't matter much if we do the clean
up inside io_clean_op(). Moreover, in many cases it's allocated together
with setting one or more of IO_REQ_CLEAN_FLAGS flags, so it'd go through
io_clean_op() anyway.

Control ->async_data allocation with a new flag REQ_F_ASYNC_DATA, so we
can do all the maintainence under io_req_needs_clean() fast check.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6892cf5883c459f36bda26f30ceb16742b20b84b.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: optimise io_free_batch_list()
Pavel Begunkov [Mon, 4 Oct 2021 19:02:55 +0000 (20:02 +0100)]
io_uring: optimise io_free_batch_list()

Delay reading the next node in io_free_batch_list(), allows the compiler
to load the value a bit later improving register spilling in some cases.
With gcc 11.1 it helped to move @task_refs variable from the stack to a
register and optimises out a couple of per request instructions.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cc9fdfb6f72a4e8bc9918a5e9f2d97869a263ae4.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: mark cold functions
Pavel Begunkov [Mon, 4 Oct 2021 19:02:54 +0000 (20:02 +0100)]
io_uring: mark cold functions

Attribute cold functions so compilers can optimise them for size. It
shrinks the binary by 2.5-3%

   text    data     bss     dec     hex filename
  90670   14002       8  104680   198e8 ./fs/io_uring.o
  88053   14002       8  102063   18eaf ./fs/io_uring.o

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b53d385f91dca45170b67d7f11c7abd787e821f6.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: optimise ctx referencing by requests
Pavel Begunkov [Mon, 4 Oct 2021 19:02:53 +0000 (20:02 +0100)]
io_uring: optimise ctx referencing by requests

Currenlty, we allocate one ctx reference per request at submission time
and put them at free. It's batched and not so expensive but it still
bloats the kernel, adds 2 function calls for rcu and adds some overhead
for request counting in io_free_batch_list().

Always keep one reference with a request, even when it's freed and in
io_uring request caches. There is extra work at ring exit / quiesce
paths, which now need to put all cached requests. io_ring_exit_work() is
already looping, so it's not a problem. Add hybrid-busy waiting to
io_ctx_quiesce() as well for now.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/99613fbe396e80777228cde39bbda1aa8938554e.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: merge CQ and poll waitqueues
Pavel Begunkov [Mon, 4 Oct 2021 19:02:52 +0000 (20:02 +0100)]
io_uring: merge CQ and poll waitqueues

->cq_wait and ->poll_wait and waken up in the same manner, use a single
waitqueue for both of them. CQ waiters are queued exclusively, so wake
up should first go over all pollers and that's what we need.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/00fe603e50000365774cf8435ef5fe03f049c1c9.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: don't wake sqpoll in io_cqring_ev_posted
Pavel Begunkov [Mon, 4 Oct 2021 19:02:51 +0000 (20:02 +0100)]
io_uring: don't wake sqpoll in io_cqring_ev_posted

io_cqring_ev_posted() doesn't need to wake SQPOLL, it's either done by
userspace or with task_work, but no action is required on request
completion. Rip off bits waking it up in io_cqring_ev_posted().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b49dab27b64cf11f4c50f2f90dcaac123430e05d.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: optimise INIT_WQ_LIST
Pavel Begunkov [Mon, 4 Oct 2021 19:02:50 +0000 (20:02 +0100)]
io_uring: optimise INIT_WQ_LIST

The invariant of io_wq_work_list is that it's empty IFF ->first is NULL,
so no need to initially set ->last. With now having more users of the
list it may play a role, i.e. used in each tw iteration and on every
completion flushing.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c464ab5cab6e46a858c6d39c107e92b3b5291f13.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: optimise request allocation
Pavel Begunkov [Mon, 4 Oct 2021 19:02:49 +0000 (20:02 +0100)]
io_uring: optimise request allocation

Even after fully inlining io_alloc_req() my compiler does a NULL check
in the path of successful allocation, no hacks like an empty dereference
help it. Restructure io_alloc_req() by splitting out refilling part, so
the compiler generate a slightly better binary.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/eda17571bdc7248d8e617b23e7132a5416e4680b.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: delay req queueing into compl-batch list
Pavel Begunkov [Mon, 4 Oct 2021 19:02:48 +0000 (20:02 +0100)]
io_uring: delay req queueing into compl-batch list

io_req_complete_state() is inlined and used in lots of places, so we
want to keep it concise. Move adding a request into a completion batch
list from io_req_complete_state() into the consumer, i.e.
__io_queue_sqe().

before vs after
   text    data     bss     dec     hex filename
  91894   14002       8  105904   19db0 ./fs/io_uring.o
  91046   14002       8  105056   19a60 ./fs/io_uring.o

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4afca4e11abfd4cc8e99777fdcaf4d34cf4d022d.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: add more likely/unlikely() annotations
Pavel Begunkov [Mon, 4 Oct 2021 19:02:47 +0000 (20:02 +0100)]
io_uring: add more likely/unlikely() annotations

Add two extra unlikely() in io_submit_sqes() and one around
io_req_needs_clean() to help the compiler to avoid extra jumps
in hot paths.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/88e087afe657e7660194353aada9b00f11d480f9.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: optimise kiocb layout
Pavel Begunkov [Mon, 4 Oct 2021 19:02:46 +0000 (20:02 +0100)]
io_uring: optimise kiocb layout

We want ->comp_list in the second cacheline, which is hotter comparing
to the 3rd. Swap the field with ->link, which is not as hot and
controlled by flags and so not accessed unless there is a link.

By the way add a couple of comments for io_kiocb fields.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9d9dde31f8f62279a5f48c575bbc27b8290edc0c.1633373302.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: add flag to not fail link after timeout
Pavel Begunkov [Sat, 2 Oct 2021 18:36:14 +0000 (19:36 +0100)]
io_uring: add flag to not fail link after timeout

For some reason non-off IORING_OP_TIMEOUT always fails links, it's
pretty inconvenient and unnecessary limits chaining after it to hard
linking, which is far from ideal, e.g. doesn't pair well with timeout
cancellation. Add a flag forcing it to not fail links on -ETIME.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/17c7ec0fb7a6113cc6be8cdaedcada0ba836ac0e.1633199723.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: clean up buffer select
Pavel Begunkov [Fri, 1 Oct 2021 17:07:03 +0000 (18:07 +0100)]
io_uring: clean up buffer select

Hiding a pointer to a struct io_buffer in rw.addr is error prone. We
have some place in io_kiocb, so keep kbuf's in a separate field
without aliasing and risks of it being misused.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3e63a6a953b04cad81d9ea827b12344dd57b37b4.1633107393.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: init opcode in io_init_req()
Pavel Begunkov [Fri, 1 Oct 2021 17:07:02 +0000 (18:07 +0100)]
io_uring: init opcode in io_init_req()

Move io_req_prep() call inside of io_init_req(), it simplifies a bit
error handling for callers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a0f59291fd52da4672c323542fd56fd899e23f8f.1633107393.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: don't return from io_drain_req()
Pavel Begunkov [Fri, 1 Oct 2021 17:07:01 +0000 (18:07 +0100)]
io_uring: don't return from io_drain_req()

Never return from io_drain_req() but punt to tw if we've got there but
it's a false positive and we shouldn't actually drain.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/93583cee51b8783706b76c73196c155b28d9e762.1633107393.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: extra a helper for drain init
Pavel Begunkov [Fri, 1 Oct 2021 17:07:00 +0000 (18:07 +0100)]
io_uring: extra a helper for drain init

Add a helper io_init_req_drain for initialising requests with
IOSQE_DRAIN set. Also move bits from preambule of io_drain_req() in
there, because we already modify all the bits needed inside the helper.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dcb412825b35b1cb8891245a387d7d69f8d14cef.1633107393.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: disable draining earlier
Pavel Begunkov [Fri, 24 Sep 2021 21:00:04 +0000 (22:00 +0100)]
io_uring: disable draining earlier

Clear ->drain_active in two more cases where we check for a need of
draining. It's not a bug, but still may lead to some extra requests
being punted to io-wq, and that may be not desirable.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d20b265f77bb4e8860b15b9987252c7c711dfcba.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: comment why inline complete calls io_clean_op()
Pavel Begunkov [Fri, 24 Sep 2021 21:00:03 +0000 (22:00 +0100)]
io_uring: comment why inline complete calls io_clean_op()

io_req_complete_state() calls io_clean_op() and it may be not entirely
obvious, leave a comment.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/21806f862151e223fdf439e5e8ed7178a8d66979.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: kill off ->inflight_entry field
Pavel Begunkov [Fri, 24 Sep 2021 21:00:02 +0000 (22:00 +0100)]
io_uring: kill off ->inflight_entry field

->inflight_entry is not used anymore after converting everything to
single linked lists, remove it. Also adjust io_kiocb layout, so all hot
bits are in first 3 cachelines.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/fd8d68087ede26c4e1707ce6b175aa1eb2381f2b.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: restructure submit sqes to_submit checks
Pavel Begunkov [Fri, 24 Sep 2021 21:00:01 +0000 (22:00 +0100)]
io_uring: restructure submit sqes to_submit checks

Put an explicit check for number of requests to submit. First,
we can turn while into do-while and it generates better code, and second
that if can be cheaper, e.g. by using CPU flags after sub in
io_sqring_entries().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5926baadd20c28feab7a5e1725fedf32e4553ff7.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: reshuffle queue_sqe completion handling
Pavel Begunkov [Fri, 24 Sep 2021 21:00:00 +0000 (22:00 +0100)]
io_uring: reshuffle queue_sqe completion handling

If a request completed inline the result should only be zero, it's a
grave error otherwise. So, when we see REQ_F_COMPLETE_INLINE it's not
even necessary to check the return code, and the flag check can be moved
earlier.

It's one "if" less for inline completions, and same two checks for it
normally completing (ret == 0). Those are two cases we care about the
most.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ebd4e397a9c26d96c99b24447acc309741041a83.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: inline hot path of __io_queue_sqe()
Pavel Begunkov [Fri, 24 Sep 2021 20:59:59 +0000 (21:59 +0100)]
io_uring: inline hot path of __io_queue_sqe()

Extract slow paths from __io_queue_sqe() into a function and inline the
hot path. With that we have everything completely inlined on the
submission path up until io_issue_sqe().

-> io_submit_sqes()
  -> io_submit_sqe() (inlined)
    -> io_queue_sqe() (inlined)
       -> __io_queue_sqe() (inlined)
         -> io_issue_sqe()

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f1606864d95d7f26dc28c7eec3dc6ed6ec32618a.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: split slow path from io_queue_sqe
Pavel Begunkov [Fri, 24 Sep 2021 20:59:58 +0000 (21:59 +0100)]
io_uring: split slow path from io_queue_sqe

We don't want the slow path of io_queue_sqe to be inlined, so extract a
function from it.

   text    data     bss     dec     hex filename
  91950   13986       8  105944   19dd8 ./fs/io_uring.o
  91758   13986       8  105752   19d18 ./fs/io_uring.o

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/fb01253911f8fb374268f65b1ba939b54ca6583f.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: remove drain_active check from hot path
Pavel Begunkov [Fri, 24 Sep 2021 20:59:57 +0000 (21:59 +0100)]
io_uring: remove drain_active check from hot path

req->ctx->active_drain is a bit too expensive, partially because of two
dereferences. Do a trick, if we see it set in io_init_req(), set
REQ_F_FORCE_ASYNC and it automatically goes through a slower path where
we can catch it. It's nearly free to do in io_init_req() because there
is already ->restricted check and it's in the same byte of a bitmask.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d7e7ddc63c15e8a300833132abb3eb8fd3918aef.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: deduplicate io_queue_sqe() call sites
Pavel Begunkov [Fri, 24 Sep 2021 20:59:56 +0000 (21:59 +0100)]
io_uring: deduplicate io_queue_sqe() call sites

There are two call sites of io_queue_sqe() in io_submit_sqe(), combine
them into one, because io_queue_sqe() is inline and we don't want to
bloat binary, and will become even bigger

   text    data     bss     dec     hex filename
  92126   13986       8  106120   19e88 ./fs/io_uring.o
  91966   13986       8  105960   19de8 ./fs/io_uring.o

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/506124b8e767f0a4576f7a459f6aea3d13fb4dda.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: don't pass state to io_submit_state_end
Pavel Begunkov [Fri, 24 Sep 2021 20:59:55 +0000 (21:59 +0100)]
io_uring: don't pass state to io_submit_state_end

Submission state and ctx and coupled together, no need to passs

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e22d77a5786ef77e0c49b933ad74bae55cfb6ca6.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: don't pass tail into io_free_batch_list
Pavel Begunkov [Fri, 24 Sep 2021 20:59:54 +0000 (21:59 +0100)]
io_uring: don't pass tail into io_free_batch_list

io_free_batch_list() iterates all requests in the passed in list,
so we don't really need to know the tail but can keep iterating until
meet NULL. Just pass the first node into it and it will be enough.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4a12c84b6d887d980e05f417ba4172d04c64acae.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: inline completion batching helpers
Pavel Begunkov [Fri, 24 Sep 2021 20:59:53 +0000 (21:59 +0100)]
io_uring: inline completion batching helpers

We now have a single function for batched put of requests, just inline
struct req_batch and all related helpers into it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/595a2917f80dd94288cd7203052c7934f5446580.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: optimise batch completion
Pavel Begunkov [Fri, 24 Sep 2021 20:59:52 +0000 (21:59 +0100)]
io_uring: optimise batch completion

First, convert rest of iopoll bits to single linked lists, and also
replace per-request list_add_tail() with splicing a part of slist.

With that, use io_free_batch_list() to put/free requests. The main
advantage of it is that it's now the only user of struct req_batch and
friends, and so they can be inlined. The main overhead there was
per-request call to not-inlined io_req_free_batch(), which is expensive
enough.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b37fc6d5954b241e025eead7ab92c6f44a42f229.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: convert iopoll_completed to store_release
Pavel Begunkov [Fri, 24 Sep 2021 20:59:51 +0000 (21:59 +0100)]
io_uring: convert iopoll_completed to store_release

Convert explicit barrier around iopoll_completed to smp_load_acquire()
and smp_store_release(). Similar on the callback side, but replaces a
single smp_rmb() with per-request smp_load_acquire(), neither imply any
extra CPU ordering for x86. Use READ_ONCE as usual where it doesn't
matter.

Use it to move filling CQEs by iopoll earlier, that will be necessary
to avoid traversing the list one extra time in the future.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8bd663cb15efdc72d6247c38ee810964e744a450.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: add a helper for batch free
Pavel Begunkov [Fri, 24 Sep 2021 20:59:50 +0000 (21:59 +0100)]
io_uring: add a helper for batch free

Add a helper io_free_batch_list(), which takes a single linked list and
puts/frees all requests from it in an efficient manner. Will be reused
later.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4fc8306b542c6b1dd1d08e8021ef3bdb0ad15010.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: use single linked list for iopoll
Pavel Begunkov [Fri, 24 Sep 2021 20:59:49 +0000 (21:59 +0100)]
io_uring: use single linked list for iopoll

Use single linked lists for keeping iopoll requests, takes less space,
may be faster, but mostly will be of benefit for further patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/314033676b100cd485518c3bc55e1b95a0dcd71f.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: split iopoll loop
Pavel Begunkov [Fri, 24 Sep 2021 20:59:48 +0000 (21:59 +0100)]
io_uring: split iopoll loop

The main loop of io_do_iopoll() iterates and does ->iopoll() until it
meets a first completed request, then it continues from that position
and splices requests to pass them through io_iopoll_complete().

Split the loop in two for clearness, iopolling and reaping completed
requests from the list.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a7f6fd27a94845e5dc925a47a4a9765a92e514fb.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: replace list with stack for req caches
Pavel Begunkov [Fri, 24 Sep 2021 20:59:47 +0000 (21:59 +0100)]
io_uring: replace list with stack for req caches

Replace struct list_head free_list serving for caching requests with
singly linked stack, which is faster.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1bc942b82422fb2624b8353bd93aca183a022846.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio-wq: add io_wq_work_node based stack
Pavel Begunkov [Fri, 24 Sep 2021 20:59:46 +0000 (21:59 +0100)]
io-wq: add io_wq_work_node based stack

Apart from just using lists (i.e. io_wq_work_list), we also want to have
stacks, which are a bit faster, and have some interoperability between
them. Add a stack implementation based on io_wq_work_node and some
helpers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5d3a412a5ac0d47e0f0499d70d2207d70a68925e.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: remove allocation cache array
Pavel Begunkov [Fri, 24 Sep 2021 20:59:45 +0000 (21:59 +0100)]
io_uring: remove allocation cache array

We have several of request allocation layers, remove the last one, which
is the submit->reqs array, and always use submit->free_reqs instead.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8547095c35f7a87bab14f6447ecd30a273ed7500.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: use slist for completion batching
Pavel Begunkov [Fri, 24 Sep 2021 20:59:44 +0000 (21:59 +0100)]
io_uring: use slist for completion batching

Currently we collect requests for completion batching in an array.
Replace them with a singly linked list. It's as fast as arrays but
doesn't take some much space in ctx, and will be used in future patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a666826f2854d17e9fb9417fb302edfeb750f425.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: make io_do_iopoll return number of reqs
Pavel Begunkov [Fri, 24 Sep 2021 20:59:43 +0000 (21:59 +0100)]
io_uring: make io_do_iopoll return number of reqs

Don't pass nr_events pointer around but return directly, it's less
expensive than pointer increments.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f771a8153a86f16f12ff4272524e9e549c5de40b.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: force_nonspin
Pavel Begunkov [Fri, 24 Sep 2021 20:59:42 +0000 (21:59 +0100)]
io_uring: force_nonspin

We don't really need to pass the number of requests to complete into
io_do_iopoll(), a flag whether to enforce non-spin mode is enough.

Should be straightforward, maybe except io_iopoll_check(). We pass !min
there, because we do never enter with the number of already reaped
requests is larger than the specified @min, apart from the first
iteration, where nr_events is 0 and so the final check should be
identical.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/782b39d1d8ec584eae15bca0a1feb6f0571fe5b8.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: mark having different creds unlikely
Pavel Begunkov [Fri, 24 Sep 2021 20:59:41 +0000 (21:59 +0100)]
io_uring: mark having different creds unlikely

Hint the compiler that it's not as likely to have creds different from
current attached to a request. The current code generation is far from
ideal, hopefully it can help to some compilers to remove duplicated jump
tables and so.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e7815251ac4bf5a4a23d298c752f029ae19f3837.1632516769.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: return boolean value for io_alloc_async_data
Hao Xu [Wed, 22 Sep 2021 10:15:22 +0000 (18:15 +0800)]
io_uring: return boolean value for io_alloc_async_data

boolean value is good enough for io_alloc_async_data.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210922101522.9179-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: optimise io_req_init() sqe flags checks
Pavel Begunkov [Wed, 15 Sep 2021 11:03:38 +0000 (12:03 +0100)]
io_uring: optimise io_req_init() sqe flags checks

IOSQE_IO_DRAIN is quite marginal and we don't care too much about
IOSQE_BUFFER_SELECT. Save to ifs and hide both of them under
SQE_VALID_FLAGS check. Now we first check whether it uses a "safe"
subset, i.e. without DRAIN and BUFFER_SELECT, and only if it's not
true we test the rest of the flags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dccfb9ab2ab0969a2d8dc59af88fa0ce44eeb1d5.1631703764.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: remove ctx referencing from complete_post
Pavel Begunkov [Wed, 15 Sep 2021 11:04:20 +0000 (12:04 +0100)]
io_uring: remove ctx referencing from complete_post

Now completions are done from task context, that means that it's either
the task itself, task_work or io-wq worker. In all those cases the ctx
will be staying alive by mutexing, explicit referencing or req references
by iowq. Remove extra ctx pinning from io_req_complete_post().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/60a0e96434c16ab4fe587651448290d61ec9a113.1631703756.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: add more uring info to fdinfo for debug
Hao Xu [Mon, 13 Sep 2021 13:08:54 +0000 (21:08 +0800)]
io_uring: add more uring info to fdinfo for debug

Developers may need some uring info to help themselves debug and address
issues in production. This includes sqring/cqring head/tail and the
detailed sqe/cqe info, which is very useful when an application is hung
on a ring.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20210913130854.38542-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: kill extra wake_up_process in tw add
Pavel Begunkov [Wed, 8 Sep 2021 15:40:53 +0000 (16:40 +0100)]
io_uring: kill extra wake_up_process in tw add

TWA_SIGNAL already wakes the thread, no need in wake_up_process() after
it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7e90cf643f633e857443e0c9e72471b221735c50.1631115443.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: dedup CQE flushing non-empty checks
Pavel Begunkov [Wed, 8 Sep 2021 15:40:52 +0000 (16:40 +0100)]
io_uring: dedup CQE flushing non-empty checks

We don't do io_submit_flush_completions() when there is no requests
enqueued, and every single caller checks for it. Hide that check into
the function not forgetting about inlining. That will make it much
easier for changing the empty check condition in the future.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d7ff8cef5da1b38e8ea648f5aad9a315ddfc7b57.1631115443.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: inline linked part of io_req_find_next
Pavel Begunkov [Wed, 8 Sep 2021 15:40:51 +0000 (16:40 +0100)]
io_uring: inline linked part of io_req_find_next

Inline part of __io_req_find_next() that returns a request but doesn't
need io_disarm_next(). It's just two places, but makes links a bit
faster.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4126d13f23d0e91b39b3558e16bd86cafa7fcef2.1631115443.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: inline io_dismantle_req
Pavel Begunkov [Wed, 8 Sep 2021 15:40:50 +0000 (16:40 +0100)]
io_uring: inline io_dismantle_req

io_dismantle_req() is hot, and not _too_ huge. Inline it, there are 3
call sites, which hopefully will turn into 2 in the future.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bdd2dc30716cac270c2403e99bccd6286e4ae201.1631115443.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: kill off ios_left
Pavel Begunkov [Wed, 8 Sep 2021 15:40:49 +0000 (16:40 +0100)]
io_uring: kill off ios_left

->ios_left is only used to decide whether to plug or not, kill it to
avoid this extra accounting, just use the initial submission number.
There is no much difference in regards of enabling plugging, where this
one does it in a few more cases, but all major ones should be covered
well.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f13993bcf5b477f9a7d52881fc49f9457ea9870a.1631115443.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio-wq: Remove duplicate code in io_workqueue_create()
Bixuan Cui [Sat, 11 Sep 2021 08:58:47 +0000 (16:58 +0800)]
io-wq: Remove duplicate code in io_workqueue_create()

While task_work_add() in io_workqueue_create() is true,
then duplicate code is executed:

  -> clear_bit_unlock(0, &worker->create_state);
  -> io_worker_release(worker);
  -> atomic_dec(&acct->nr_running);
  -> io_worker_ref_put(wq);
  -> return false;

  -> clear_bit_unlock(0, &worker->create_state); // back to io_workqueue_create()
  -> io_worker_release(worker);
  -> kfree(worker);

The io_worker_release() and clear_bit_unlock() are executed twice.

Fixes: 3146cba99aa2 ("io-wq: make worker creation resilient against signals")
Signed-off-by: Bixuan Cui <cuibixuan@huawei.com>
Link: https://lore.kernel.org/r/20210911085847.34849-1-cuibixuan@huawei.com
Reviwed-by: Hao Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: dump sqe contents if issue fails
Jens Axboe [Sat, 11 Sep 2021 22:04:50 +0000 (16:04 -0600)]
io_uring: dump sqe contents if issue fails

I recently had to look at a production problem where a request ended
up getting the dreaded -EINVAL error on submit. The most used and
hence useless of error codes, as it just tells you that something
was wrong with your request, but not more than that.

Let's dump the full sqe contents if we run into an issue failure,
that'll allow easier diagnosing of a wide variety of issues.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoblock: fix too broad elevator check in blk_mq_free_request()
Jens Axboe [Tue, 19 Oct 2021 02:54:39 +0000 (20:54 -0600)]
block: fix too broad elevator check in blk_mq_free_request()

We added RQF_ELV to tell whether there's an IO scheduler attached, and
RQF_ELVPRIV tells us whether there's an IO scheduler with private data
attached. Don't check RQF_ELV in blk_mq_free_request(), what we care
about here is just if we have scheduler private data attached.

This fixes a boot crash

Fixes: 2ff0682da6e0 ("block: store elevator state in request")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reported-by: syzbot+eb8104072aeab6cc1195@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agonvme: wire up completion batching for the IRQ path
Jens Axboe [Mon, 18 Oct 2021 14:45:39 +0000 (08:45 -0600)]
nvme: wire up completion batching for the IRQ path

Trivial to do now, just need our own io_comp_batch on the stack and pass
that in to the usual command completion handling.

I pondered making this dependent on how many entries we had to process,
but even for a single entry there's no discernable difference in
performance or latency. Running a sync workload over io_uring:

t/io_uring -b512 -d1 -s1 -c1 -p0 -F1 -B1 -n2 /dev/nvme1n1 /dev/nvme2n1

yields the below performance before the patch:

IOPS=254820, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
IOPS=251174, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)
IOPS=250806, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)

and the following after:

IOPS=255972, BW=124MiB/s, IOS/call=1/1, inflight=(1 1)
IOPS=251920, BW=123MiB/s, IOS/call=1/1, inflight=(1 1)
IOPS=251794, BW=122MiB/s, IOS/call=1/1, inflight=(1 1)

which definitely isn't slower, about the same if you factor in a bit of
variance. For peak performance workloads, benchmarking shows a 2%
improvement.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2 years agoio_uring: utilize the io batching infrastructure for more efficient polled IO
Jens Axboe [Tue, 12 Oct 2021 15:28:46 +0000 (09:28 -0600)]
io_uring: utilize the io batching infrastructure for more efficient polled IO

Wire up using an io_comp_batch for f_op->iopoll(). If the lower stack
supports it, we can handle high rates of polled IO more efficiently.

This raises the single core efficiency on my system from ~6.1M IOPS to
~6.6M IOPS running a random read workload at depth 128 on two gen2
Optane drives.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>