Pavel Begunkov [Thu, 16 Jun 2022 09:22:03 +0000 (10:22 +0100)]
io_uring: pass poll_find lock back
Instead of relying on implicit knowledge of what is locked after
io_poll_find() and co. return, pass back a pointer to the locked
bucket, if any. If it is set, the caller must unlock the spinlock.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dae1dc5749aa34367812ecf62f82fd3f053aae44.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
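The calling convention described in "io_uring: pass poll_find lock back" can be sketched in user space roughly as follows. This is not the kernel code: demo_find(), demo_cancel(), the demo_* structures and the C11 atomic_flag standing in for a spinlock are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for a hashed request and its bucket. */
struct demo_entry { unsigned long key; struct demo_entry *next; };
struct demo_bucket { atomic_flag lock; struct demo_entry *head; };

static void demo_lock(struct demo_bucket *b)
{
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void demo_unlock(struct demo_bucket *b)
{
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/*
 * Look up key in bucket b. On a hit, return the entry and store the
 * still-locked bucket in *out_bucket. On a miss, drop the lock and
 * store NULL, so the caller never has to guess the lock state.
 */
static struct demo_entry *demo_find(struct demo_bucket *b, unsigned long key,
                                    struct demo_bucket **out_bucket)
{
    *out_bucket = NULL;
    demo_lock(b);
    for (struct demo_entry *e = b->head; e; e = e->next) {
        if (e->key == key) {
            *out_bucket = b;   /* caller must unlock */
            return e;
        }
    }
    demo_unlock(b);
    return NULL;
}

/* Caller pattern: unlock only if a bucket was handed back. */
static int demo_cancel(struct demo_bucket *b, unsigned long key)
{
    struct demo_bucket *locked;
    struct demo_entry *e = demo_find(b, key, &locked);
    int found = (e != NULL);

    if (locked)
        demo_unlock(locked);
    return found;
}
```

The point of the out-parameter is that the unlock decision depends only on the returned pointer, not on which lookup helper was called or what it returned.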
Hao Xu [Thu, 16 Jun 2022 09:22:02 +0000 (10:22 +0100)]
io_uring: switch cancel_hash to use per entry spinlock
Add a new io_hash_bucket structure so that each bucket in cancel_hash
has a separate spinlock. Using a per-entry lock for cancel_hash removes
some completion lock invocations and removes contention between different
cancel_hash entries.
Signed-off-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/05d1e135b0c8bce9d1441e6346776589e5783e26.1655371007.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
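As a rough user-space illustration of the per-bucket locking idea in "io_uring: switch cancel_hash to use per entry spinlock" (not the kernel implementation), the sketch below gives each hash bucket its own lock. The names demo_bucket, demo_table and demo_insert(), the table size, and the use of a C11 atomic_flag as a stand-in spinlock are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_NR_BUCKETS 8   /* hypothetical table size, a power of two */

/* Hypothetical entry; the kernel links requests via hlist nodes instead. */
struct demo_entry {
    unsigned long key;
    struct demo_entry *next;
};

/* One lock per bucket, so unrelated buckets never contend. */
struct demo_bucket {
    atomic_flag lock;            /* stand-in for a spinlock */
    struct demo_entry *head;
};

struct demo_table {
    struct demo_bucket buckets[DEMO_NR_BUCKETS];
};

static void demo_table_init(struct demo_table *t)
{
    for (size_t i = 0; i < DEMO_NR_BUCKETS; i++) {
        atomic_flag_clear(&t->buckets[i].lock);
        t->buckets[i].head = NULL;
    }
}

static void demo_lock(struct demo_bucket *b)
{
    while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
        ;   /* spin */
}

static void demo_unlock(struct demo_bucket *b)
{
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Insert takes only the lock of the bucket the key hashes to. */
static void demo_insert(struct demo_table *t, struct demo_entry *e)
{
    struct demo_bucket *b = &t->buckets[e->key & (DEMO_NR_BUCKETS - 1)];

    demo_lock(b);
    e->next = b->head;
    b->head = e;
    demo_unlock(b);
}
```

With a single table-wide lock, two inserts into different buckets would serialize; here they only serialize when the keys hash to the same bucket.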
Hao Xu [Thu, 16 Jun 2022 09:22:01 +0000 (10:22 +0100)]
io_uring: poll: remove unnecessary req->ref set
We no longer need to set req->refcount for poll requests, since the
reworked poll code ensures there is no request release race.
Signed-off-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ec6fee45705890bdb968b0c175519242753c0215.1655371007.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:22:00 +0000 (10:22 +0100)]
io_uring: don't inline io_put_kbuf
io_put_kbuf() is huge, don't bloat the kernel with inlining.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2e21ccf0be471ffa654032914b9430813cae53f8.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:21:59 +0000 (10:21 +0100)]
io_uring: refactor io_req_task_complete()
Clean up io_req_task_complete() and deduplicate io_put_kbuf() calls.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ae3148ac7eb5cce3e06895cde306e9e959d6f6ae.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:21:58 +0000 (10:21 +0100)]
io_uring: kill REQ_F_COMPLETE_INLINE
REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
completion list to io_queue_sqe() as __io_req_complete() is inlined and
we don't want to bloat the kernel.
As now we complete in a more centralised fashion in io_issue_sqe() we
can get rid of the flag and queue to the list directly.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/600ba20a9338b8a39b249b23d3d177803613dde4.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 16 Jun 2022 09:21:57 +0000 (10:21 +0100)]
io_uring: rw: delegate sync completions to core io_uring
io_issue_sqe() from the io_uring core knows how to complete requests
based on the returned error code, so we can delegate io_read()/io_write()
completion to it. Make kiocb_done() return the right completion
code and propagate it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/32ef005b45d23bf6b5e6837740dc0331bb051bd4.1655371007.git.asml.silence@gmail.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 15 Jun 2022 22:28:17 +0000 (16:28 -0600)]
io_uring: remove unused IO_REQ_CACHE_SIZE defined
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:56 +0000 (17:33 +0100)]
io_uring: don't set REQ_F_COMPLETE_INLINE in tw
io_req_task_complete() enqueues requests for state completion itself, so
there is no need for REQ_F_COMPLETE_INLINE, which only serves the purpose
of not bloating the kernel.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/aca80f71464ad02c06f1311d998a2d6ee0b31573.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:55 +0000 (17:33 +0100)]
io_uring: remove check_cq checking from hot paths
All ctx->check_cq events are slow path, don't test every single flag one
by one in the hot path, but add a common guarding if.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dff026585cea7ff3a172a7c83894a3b0111bbf6a.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
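The "common guarding if" from "io_uring: remove check_cq checking from hot paths" might look roughly like the sketch below. The flag names, the demo_ctx structure and the counter are invented for the example, and __builtin_expect() assumes GCC or Clang.

```c
#include <assert.h>

/* Hypothetical slow-path event bits; the real kernel names differ. */
#define DEMO_CQ_OVERFLOW   (1u << 0)
#define DEMO_CQ_DROPPED    (1u << 1)

struct demo_ctx {
    unsigned int check_cq;       /* non-zero only when a slow event is pending */
    int slow_events_handled;     /* bookkeeping for the example only */
};

/* Slow path: look at each flag individually. */
static void demo_handle_slow(struct demo_ctx *ctx)
{
    if (ctx->check_cq & DEMO_CQ_OVERFLOW)
        ctx->slow_events_handled++;
    if (ctx->check_cq & DEMO_CQ_DROPPED)
        ctx->slow_events_handled++;
    ctx->check_cq = 0;
}

/* Hot path: a single test covers every flag at once. */
static int demo_hot_path(struct demo_ctx *ctx)
{
    if (__builtin_expect(ctx->check_cq != 0, 0))
        demo_handle_slow(ctx);
    return ctx->slow_events_handled;
}
```

In the common case the hot path pays for one comparison against zero rather than one test per flag.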
Pavel Begunkov [Wed, 15 Jun 2022 16:33:54 +0000 (17:33 +0100)]
io_uring: never defer-complete multi-apoll
Luckily, nobody completes multi-apoll requests outside the polling
functions, but don't set IO_URING_F_COMPLETE_DEFER in any case, as
nobody is catching REQ_F_COMPLETE_INLINE, so requests would leak
if it were used.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a65ed3f5effd9321ee06e6edea294a03be3e15a0.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:53 +0000 (17:33 +0100)]
io_uring: inline ->registered_rings
There can be only 16 registered rings, so there is no need to allocate an
array for them separately; store it in tctx instead.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/495f0b953c87994dd9e13de2134019054fa5830d.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
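The layout change in "io_uring: inline ->registered_rings" can be pictured with the following sketch. Only the limit of 16 comes from the commit message; the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_RINGS 16   /* the limit the commit message cites */

struct demo_file;            /* opaque stand-in for a ring file */

/* Before: the array lives in a separate allocation behind a pointer. */
struct demo_tctx_before {
    struct demo_file **registered_rings;
};

/* After: the fixed-size array is embedded, so no extra allocation. */
struct demo_tctx_after {
    struct demo_file *registered_rings[DEMO_MAX_RINGS];
};

/* Return the first free slot index, or -1 if all slots are in use. */
static int demo_first_free(const struct demo_tctx_after *tctx)
{
    for (int i = 0; i < DEMO_MAX_RINGS; i++)
        if (!tctx->registered_rings[i])
            return i;
    return -1;
}
```

Embedding the array removes one allocation, its failure path, and a pointer dereference on each access, at the cost of a larger containing structure.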
Pavel Begunkov [Wed, 15 Jun 2022 16:33:52 +0000 (17:33 +0100)]
io_uring: explain io_wq_work::cancel_seq placement
Add a comment on why we keep ->cancel_seq in struct io_wq_work instead
of struct io_kiocb, even though it is needed only by io_uring and not io-wq.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/988e87eec9dc700b5dae933df3aefef303502f6c.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:51 +0000 (17:33 +0100)]
io_uring: move small helpers to headers
A number of inline helpers will be useful outside the core of io_uring
as well; move them to headers.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/22df99c83723e44cba7e945e8519e64e3642c064.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:50 +0000 (17:33 +0100)]
io_uring: refactor ctx slow data placement
Shove all slow path data to the end of ctx and get rid of the extra
indentation.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bcaf200298dd469af20787650550efc66d89bef2.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:49 +0000 (17:33 +0100)]
io_uring: better caching for ctx timeout fields
Following the access patterns of the timeout fields, move all of them into a
separate cache line inside ctx so they don't interfere with normal
completion caching, especially since timeout removals and completions
are separated and the latter are done via tw.
It also sheds some bytes from io_ring_ctx: 1216B -> 1152B.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4b163793072840de53b3cb66e0c2995e7226ff78.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
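The idea behind "io_uring: better caching for ctx timeout fields" is to keep timeout state off the cache line used by the completion path. A minimal sketch, assuming a 64-byte cache line and using invented field names rather than the real io_ring_ctx layout:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define DEMO_CACHELINE 64   /* assumed cache line size */

struct demo_ctx {
    /* Completion-side fields, touched on every request. */
    unsigned int cq_head;
    unsigned int cq_tail;

    /* Timeout fields, grouped at the start of their own cache line. */
    alignas(DEMO_CACHELINE) unsigned int timeout_count;
    unsigned int timeout_flush_seq;
};

/* True if two byte offsets fall within the same cache line. */
static int demo_same_line(size_t a, size_t b)
{
    return a / DEMO_CACHELINE == b / DEMO_CACHELINE;
}
```

The kernel uses its own alignment annotations for this; alignas() is simply the closest standard C equivalent for a sketch.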
Pavel Begunkov [Wed, 15 Jun 2022 16:33:48 +0000 (17:33 +0100)]
io_uring: move defer_list to slow data
Draining is a slow path, so move defer_list to the end of the context,
where the slow data lives.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e16379391ca72b490afdd24e8944baab849b4a7b.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Wed, 15 Jun 2022 16:33:47 +0000 (17:33 +0100)]
io_uring: make reg buf init consistent
The default (i.e. empty) state of a registered buffer is dummy_ubuf, so set
it to dummy_ubuf on init instead of NULL.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c5456aecf03d9627fbd6e65e100e2b5293a6151e.1655310733.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 1 Jun 2022 18:36:42 +0000 (12:36 -0600)]
io_uring: deprecate epoll_ctl support
As far as we know, nobody ever adopted the epoll_ctl management via
io_uring. Deprecate it now with a warning, and plan on removing it in
a later kernel version. When we do remove it, we can revert the following
commits as well:
39220e8d4a2a ("eventpoll: support non-blocking do_epoll_ctl() calls")
58e41a44c488 ("eventpoll: abstract out epoll_ctl() handler")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/io-uring/CAHk-=wiTyisXBgKnVHAGYCNvkmjk=50agS2Uk6nr+n3ssLZg2w@mail.gmail.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Fri, 27 May 2022 16:55:07 +0000 (10:55 -0600)]
io_uring: add support for level triggered poll
By default, the POLL_ADD command does edge triggered poll - if we get
a non-zero mask on the initial poll attempt, we complete the request
successfully.
Support level triggered by always waiting for a notification, regardless
of whether or not the initial mask matches the file state.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 15 Jun 2022 22:27:42 +0000 (16:27 -0600)]
io_uring: move opcode table to opdef.c
We already have the declarations in opdef.h, move the rest into its own
file rather than in the main io_uring.c file.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 13 Jun 2022 13:27:03 +0000 (07:27 -0600)]
io_uring: move read/write related opcodes to its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 26 May 2022 15:44:31 +0000 (09:44 -0600)]
io_uring: move remaining file table manipulation to filetable.c
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 13 Jun 2022 13:12:45 +0000 (07:12 -0600)]
io_uring: move rsrc related data, core, and commands
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 13 Jun 2022 13:07:23 +0000 (07:07 -0600)]
io_uring: split provided buffers handling into its own file
Move both the opcodes related to it, and the internals code dealing with
it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 26 May 2022 02:36:47 +0000 (20:36 -0600)]
io_uring: move cancelation into its own file
This also helps cleanup the io_uring.h cancel parts, as we can make
things static in the cancel.c file, mostly.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Thu, 26 May 2022 02:31:09 +0000 (20:31 -0600)]
io_uring: move poll handling into its own file
Add an io_poll_issue() rather than exporting the general task_work locking
and io_issue_sqe(), and put the io_op_defs definition and structure into
a separate header file so that poll can use it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 17:57:03 +0000 (11:57 -0600)]
io_uring: add opcode name to io_op_defs
This kills the last per-op switch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 17:48:35 +0000 (11:48 -0600)]
io_uring: include and forward-declaration sanitation
Remove some dead headers we no longer need, and get rid of the
io_ring_ctx and io_uring_fops forward declarations.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 17:01:04 +0000 (11:01 -0600)]
io_uring: move io_uring_task (tctx) helpers into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 16:40:19 +0000 (10:40 -0600)]
io_uring: move fdinfo helpers to its own file
This also means moving a bit more of the fixed file handling to the
filetable side, which makes sense separately too.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 16:28:04 +0000 (10:28 -0600)]
io_uring: use io_is_uring_fops() consistently
Convert the last spots that check for io_uring_fops to use the provided
helper instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 15:13:39 +0000 (09:13 -0600)]
io_uring: move SQPOLL related handling into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 14:57:27 +0000 (08:57 -0600)]
io_uring: move timeout opcodes and handling into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 14:56:52 +0000 (08:56 -0600)]
io_uring: move our reference counting into a header
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 12:42:08 +0000 (06:42 -0600)]
io_uring: move msg_ring into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 12:25:13 +0000 (06:25 -0600)]
io_uring: split network related opcodes into its own file
While at it, convert the handlers to just use io_eopnotsupp_prep()
if CONFIG_NET isn't set.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 12:12:18 +0000 (06:12 -0600)]
io_uring: move statx handling to its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 12:09:18 +0000 (06:09 -0600)]
io_uring: move epoll handler to its own file
It would be nice to sort out Kconfig for this and not even compile
epoll.c if we don't have epoll configured.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 12:04:14 +0000 (06:04 -0600)]
io_uring: add a dummy -EOPNOTSUPP prep handler
Add it and use it for the epoll handling, if epoll isn't configured.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
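A shared stub of the kind described in "io_uring: add a dummy -EOPNOTSUPP prep handler" could be sketched as below. DEMO_HAVE_EPOLL is a hypothetical stand-in for the Kconfig symbol, and the demo_* names are not the kernel's.

```c
#include <assert.h>
#include <errno.h>

struct demo_req { int opcode; };

typedef int (*demo_prep_fn)(struct demo_req *req);

/* Shared stub for any opcode whose backing feature is compiled out. */
static int demo_eopnotsupp_prep(struct demo_req *req)
{
    (void)req;
    return -EOPNOTSUPP;
}

/* DEMO_HAVE_EPOLL is a hypothetical stand-in for a Kconfig symbol. */
#ifdef DEMO_HAVE_EPOLL
static int demo_epoll_prep(struct demo_req *req)
{
    (void)req;
    return 0;
}
static const demo_prep_fn demo_epoll_prep_handler = demo_epoll_prep;
#else
static const demo_prep_fn demo_epoll_prep_handler = demo_eopnotsupp_prep;
#endif
```

Because the stub is generic, every compiled-out opcode can point at the same function instead of carrying its own ifdef'd error return.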
Jens Axboe [Wed, 25 May 2022 11:59:19 +0000 (05:59 -0600)]
io_uring: move uring_cmd handling to its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 03:54:43 +0000 (21:54 -0600)]
io_uring: split out open/close operations
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 03:43:10 +0000 (21:43 -0600)]
io_uring: separate out file table handling code
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 03:28:33 +0000 (21:28 -0600)]
io_uring: split out fadvise/madvise operations
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 03:25:19 +0000 (21:25 -0600)]
io_uring: split out fs related sync/fallocate functions
This splits out sync_file_range, fsync, and fallocate.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 03:19:47 +0000 (21:19 -0600)]
io_uring: split out splice related operations
This splits out splice and tee support.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Wed, 25 May 2022 03:13:00 +0000 (21:13 -0600)]
io_uring: split out filesystem related operations
This splits out renameat, unlinkat, mkdirat, symlinkat, and linkat.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 17:56:42 +0000 (11:56 -0600)]
io_uring: move nop into its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 17:46:43 +0000 (11:46 -0600)]
io_uring: move xattr related opcodes to its own file
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 21:21:00 +0000 (15:21 -0600)]
io_uring: handle completions in the core
Normally request handlers complete requests themselves, if they don't
return an error. For the latter case, the core will complete it for
them.
This is inconvenient for pushing opcode handlers further out, as we don't
want a bunch of inline completion code and we don't want to make the
completion path slower than it is now.
Let the core handle any completion, unless the handler explicitly
asks us not to.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
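The contract described in "io_uring: handle completions in the core" can be sketched as follows. The return codes DEMO_OK and DEMO_SKIP_COMPLETE and all demo_* functions are hypothetical names chosen for the example; the error handling is simplified.

```c
#include <assert.h>

/* Hypothetical return codes for the sketch. */
enum { DEMO_OK = 0, DEMO_SKIP_COMPLETE = 1 };

struct demo_req { int result; int completed; };

static void demo_complete(struct demo_req *req)
{
    req->completed++;
}

/* Handler that finishes synchronously: it only sets the result. */
static int demo_sync_issue(struct demo_req *req)
{
    req->result = 7;
    return DEMO_OK;
}

/* Handler that will complete the request later on its own. */
static int demo_async_issue(struct demo_req *req)
{
    (void)req;
    return DEMO_SKIP_COMPLETE;
}

/* Core: complete on the handler's behalf unless explicitly told not to. */
static int demo_core_issue(struct demo_req *req,
                           int (*issue)(struct demo_req *))
{
    int ret = issue(req);

    if (ret == DEMO_SKIP_COMPLETE)
        return 0;             /* not an error; the handler owns completion */
    if (ret != DEMO_OK)
        req->result = ret;    /* negative value: report the error */
    demo_complete(req);
    return ret;
}
```

Handlers then contain no completion code at all in the common case, which is what makes it practical to move them out of the core file.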
Jens Axboe [Tue, 24 May 2022 18:45:38 +0000 (12:45 -0600)]
io_uring: set completion results upfront
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:56:14 +0000 (10:56 -0600)]
io_uring: add io_uring_types.h
This adds definitions of structs that both the core and the various
opcode handlers need to know about.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:26:28 +0000 (10:26 -0600)]
io_uring: define a request type cleanup handler
This can move request type specific cleanup into a private handler,
removing the need for the core io_uring parts to know what types
they are dealing with.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:19:47 +0000 (10:19 -0600)]
io_uring: unify struct io_symlink and io_hardlink
They are really just a subset of each other, just use the one type.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:09:32 +0000 (10:09 -0600)]
io_uring: convert iouring_cmd to io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:06:46 +0000 (10:06 -0600)]
io_uring: convert xattr to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:05:49 +0000 (10:05 -0600)]
io_uring: convert rsrc_update to io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:03:49 +0000 (10:03 -0600)]
io_uring: convert msg and nop to io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:01:47 +0000 (10:01 -0600)]
io_uring: convert splice to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 16:01:09 +0000 (10:01 -0600)]
io_uring: convert epoll to io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:59:28 +0000 (09:59 -0600)]
io_uring: convert file system request types to use io_cmd_type
This converts statx, rename, unlink, mkdir, symlink, and hardlink to
use io_cmd_type.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:51:05 +0000 (09:51 -0600)]
io_uring: convert madvise/fadvise to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:49:25 +0000 (09:49 -0600)]
io_uring: convert open/close path to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:45:22 +0000 (09:45 -0600)]
io_uring: convert timeout path to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:33:01 +0000 (09:33 -0600)]
io_uring: convert cancel path to use io_cmd_type
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:30:45 +0000 (09:30 -0600)]
io_uring: convert the sync and fallocate paths to use io_cmd_type
They all share the same struct io_sync, convert them to use the
io_cmd_type approach instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:27:38 +0000 (09:27 -0600)]
io_uring: convert net related opcodes to use io_cmd_type
This converts accept, connect, send/recv, sendmsg/recvmsg, shutdown, and
socket to use io_cmd_type.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:24:42 +0000 (09:24 -0600)]
io_uring: remove recvmsg knowledge from io_arm_poll_handler()
There's a special case for recvmsg with MSG_ERRQUEUE set. This is
problematic as it means the core needs to know about this special
request type.
For now, just add a generic flag for it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:16:40 +0000 (09:16 -0600)]
io_uring: convert poll_update path to use io_cmd_type
Remove struct io_poll_update from io_kiocb, and convert the poll path to
use the io_cmd_type approach instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 15:13:46 +0000 (09:13 -0600)]
io_uring: convert poll path to use io_cmd_type
Remove struct io_poll_iocb from io_kiocb, and convert the poll path to
use the io_cmd_type approach instead.
While at it, rename io_poll_iocb to io_poll which is consistent with the
other request type private structures.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 13 Jun 2022 12:57:44 +0000 (06:57 -0600)]
io_uring: convert read/write path to use io_cmd_type
Remove struct io_rw from io_kiocb, and convert the read/write path to
use the io_cmd_type approach instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Tue, 24 May 2022 14:32:05 +0000 (08:32 -0600)]
io_uring: add generic command payload type to struct io_kiocb
Each opcode generally has a command structure in io_kiocb which it can
use to store data associated with that request.
In preparation for having the core layer not know about what's inside
these fields, add a generic io_cmd_data type and put in the union as
well.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
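The preparation step in "io_uring: add generic command payload type to struct io_kiocb" amounts to adding an opaque view next to the opcode-specific ones. A minimal sketch, in which the payload size, the demo_sync layout and every name are assumptions:

```c
#include <assert.h>
#include <stddef.h>

struct demo_file;   /* opaque stand-in for struct file */

/* Generic payload: a file pointer plus opaque per-opcode storage. */
struct demo_cmd_data {
    struct demo_file *file;
    unsigned char data[56];
};

/* One opcode-private structure; it starts with the file pointer too. */
struct demo_sync {
    struct demo_file *file;
    long long off;
    long long len;
};

/* The request holds both views in one union. */
struct demo_req {
    union {
        struct demo_cmd_data cmd;   /* generic view used by the core */
        struct demo_sync sync;      /* view used by the sync handler */
    };
    int result;
};

/* Private types must fit inside the generic payload. */
_Static_assert(sizeof(struct demo_sync) <= sizeof(struct demo_cmd_data),
               "demo_sync does not fit in demo_cmd_data");
_Static_assert(offsetof(struct demo_sync, file) == 0,
               "file must be the first member");

/* The core can reach the file without knowing the opcode's type. */
static struct demo_file *demo_core_file(const struct demo_req *req)
{
    return req->cmd.file;
}
```

Once the core only touches the generic view, the opcode-private structures can later move out of the shared header without the core noticing.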
Jens Axboe [Mon, 23 May 2022 23:30:37 +0000 (17:30 -0600)]
io_uring: move req async preparation into opcode handler
Define an io_op_def->prep_async() handler and push the async preparation
to there. Since we now have that, we can drop ->needs_async_setup, as
they mean the same thing.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 23 May 2022 23:05:03 +0000 (17:05 -0600)]
io_uring: move to separate directory
In preparation for splitting io_uring up a bit, move it into its own
top level directory. It didn't really belong in fs/ anyway, as it's
not a file system only API.
This adds io_uring/ and moves the core files in there, and updates the
MAINTAINERS file for the new location.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Jens Axboe [Mon, 23 May 2022 22:56:21 +0000 (16:56 -0600)]
io_uring: define a 'prep' and 'issue' handler for each opcode
Rather than have two giant switches for doing request preparation and
then for doing request issue, add a prep and issue handler for each
of them in the io_op_defs[] request definition.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
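The table-driven dispatch introduced by "io_uring: define a 'prep' and 'issue' handler for each opcode" can be sketched as below. The opcodes, handlers and the demo_submit() helper are invented for illustration; only the idea of per-opcode prep and issue function pointers in a definition table comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_req { unsigned int opcode; int result; };

/* Per-opcode definition: handlers live in the table, not in a switch. */
struct demo_op_def {
    const char *name;
    int (*prep)(struct demo_req *req);
    int (*issue)(struct demo_req *req);
};

enum { DEMO_OP_NOP, DEMO_OP_ADD_ONE, DEMO_OP_LAST };

static int demo_nop_prep(struct demo_req *req)  { req->result = 0; return 0; }
static int demo_nop_issue(struct demo_req *req) { (void)req; return 0; }
static int demo_add_prep(struct demo_req *req)  { req->result = 41; return 0; }
static int demo_add_issue(struct demo_req *req) { req->result++; return 0; }

static const struct demo_op_def demo_op_defs[DEMO_OP_LAST] = {
    [DEMO_OP_NOP]     = { "NOP",     demo_nop_prep, demo_nop_issue },
    [DEMO_OP_ADD_ONE] = { "ADD_ONE", demo_add_prep, demo_add_issue },
};

/* Dispatch through the table instead of two per-opcode switches. */
static int demo_submit(struct demo_req *req)
{
    const struct demo_op_def *def;
    int ret;

    if (req->opcode >= DEMO_OP_LAST)
        return -EINVAL;
    def = &demo_op_defs[req->opcode];
    ret = def->prep(req);
    if (ret)
        return ret;
    return def->issue(req);
}
```

Adding an opcode then means adding one table entry and two functions, and the handlers can live in any source file that the table can reference.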
Linus Torvalds [Sun, 24 Jul 2022 20:26:27 +0000 (13:26 -0700)]
Linux 5.19-rc8
Adam Borowski [Mon, 18 Jul 2022 13:50:34 +0000 (15:50 +0200)]
certs: make system keyring depend on x509 parser
This code requires x509_load_certificate_list() to be built-in.
Fixes: 60050ffe3d77 ("certs: Move load_certificate_list() to be with the asymmetric keys code")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://lore.kernel.org/all/202206221515.DqpUuvbQ-lkp@intel.com/
Link: https://lore.kernel.org/all/20220712104554.408dbf42@gandalf.local.home/
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 24 Jul 2022 16:55:53 +0000 (09:55 -0700)]
Merge tag 'perf_urgent_for_v5.19_rc8' of git://git./linux/kernel/git/tip/tip
Pull perf fix from Borislav Petkov:
- Reorganize the perf LBR init code so that a TSX quirk is applied
early enough in order for the LBR MSR access to not #GP
* tag 'perf_urgent_for_v5.19_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/lbr: Fix unchecked MSR access error on HSW
Linus Torvalds [Sun, 24 Jul 2022 16:50:53 +0000 (09:50 -0700)]
Merge tag 'sched_urgent_for_v5.19_rc8' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
"A single fix to correct a wrong BUG_ON() condition for deboosted
tasks"
* tag 'sched_urgent_for_v5.19_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Fix BUG_ON condition for deboosted tasks
Linus Torvalds [Sun, 24 Jul 2022 16:40:17 +0000 (09:40 -0700)]
Merge tag 'x86_urgent_for_v5.19_rc8' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"A couple more retbleed fallout fixes.
It looks like their urgency is decreasing so it seems like we've
managed to catch whatever snafus the limited -rc testing has exposed.
Maybe we're getting ready... :)
- Make retbleed mitigations 64-bit only (32-bit will need a bit more
work if even needed, at all).
- Prevent return thunks patching of the LKDTM modules as it is not
needed there
- Avoid writing the SPEC_CTRL MSR on every kernel entry on eIBRS
parts
- Enhance error output of apply_returns() when it fails to patch a
return thunk
- A sparse fix to the sev-guest module
- Protect EFI fw calls by issuing an IBPB on AMD"
* tag 'x86_urgent_for_v5.19_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Make all RETbleed mitigations 64-bit only
lkdtm: Disable return thunks in rodata.c
x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS parts
x86/alternative: Report missing return thunk details
virt: sev-guest: Pass the appropriate argument type to iounmap()
x86/amd: Use IBPB for firmware calls
Linus Torvalds [Sun, 24 Jul 2022 16:33:13 +0000 (09:33 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fix from Stephen Boyd:
"One more fix to set the correct IO mapping for a clk gate in the
lan966x driver"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: lan966x: Fix the lan966x clock gate register address
Linus Torvalds [Sat, 23 Jul 2022 17:22:26 +0000 (10:22 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
- Check for invalid flags to KVM_CAP_X86_USER_SPACE_MSR
- Fix use of sched_setaffinity in selftests
- Sync kernel headers to tools
- Fix KVM_STATS_UNIT_MAX
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Protect the unused bits in MSR exiting flags
tools headers UAPI: Sync linux/kvm.h with the kernel sources
KVM: selftests: Fix target thread to be migrated in rseq_test
KVM: stats: Fix value for KVM_STATS_UNIT_MAX for boolean stats
Ben Hutchings [Sat, 23 Jul 2022 15:22:47 +0000 (17:22 +0200)]
x86/speculation: Make all RETbleed mitigations 64-bit only
The mitigations for RETBleed are currently ineffective on x86_32 since
entry_32.S does not use the required macros. However, for an x86_32
target, the kconfig symbols for them are still enabled by default and
/sys/devices/system/cpu/vulnerabilities/retbleed will wrongly report
that mitigations are in place.
Make all of these symbols depend on X86_64, and only enable RETHUNK by
default on X86_64.
Fixes: f43b9876e857 ("x86/retbleed: Add fine grained Kconfig knobs")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/YtwSR3NNsWp1ohfV@decadent.org.uk
Linus Torvalds [Fri, 22 Jul 2022 23:40:03 +0000 (16:40 -0700)]
Merge tag 'spi-fix-v5.19-rc7' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A few more small driver specific fixes"
* tag 'spi-fix-v5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi-rspi: Fix PIO fallback on RZ platforms
spi: spi-cadence: Fix SPI NO Slave Select macro definition
spi: bcm2835: bcm2835_spi_handle_err(): fix NULL pointer deref for non DMA transfers
Linus Torvalds [Fri, 22 Jul 2022 20:02:05 +0000 (13:02 -0700)]
Merge tag 'riscv-for-linus-5.19-rc8' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- Two kexec-related build fixes
- A DTS update to make the GPIO nodes match the upcoming dtschema
- A fix that passes -mno-relax directly to the assembler when building
modules, to work around compilers that fail to do so
* tag 'riscv-for-linus-5.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: add as-options for modules with assembly compontents
riscv: dts: align gpio-key node names with dtschema
RISC-V: kexec: Fix build error without CONFIG_KEXEC
RISCV: kexec: Fix build error without CONFIG_MODULES
Linus Torvalds [Fri, 22 Jul 2022 19:56:49 +0000 (12:56 -0700)]
Merge tag 'acpi-5.19-rc8' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix yet another piece of ACPI CPPC changes fallout on AMD platforms
(Mario Limonciello)"
* tag 'acpi-5.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: CPPC: Don't require flexible address space if X86_FEATURE_CPPC is supported
Linus Torvalds [Fri, 22 Jul 2022 19:47:09 +0000 (12:47 -0700)]
Merge tag 'io_uring-5.19-2022-07-21' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Fix for a bad kfree() introduced in this cycle, and a quick fix for
disabling buffer recycling for IORING_OP_READV.
The latter will get reworked for 5.20, but it gets the job done for
5.19"
* tag 'io_uring-5.19-2022-07-21' of git://git.kernel.dk/linux-block:
io_uring: do not recycle buffer in READV
io_uring: fix free of unallocated buffer list
Linus Torvalds [Fri, 22 Jul 2022 19:41:14 +0000 (12:41 -0700)]
Merge tag 'block-5.19-2022-07-21' of git://git.kernel.dk/linux-block
Pull block fix from Jens Axboe:
"Just a single fix for missing error propagation for an allocation
failure in raid5"
* tag 'block-5.19-2022-07-21' of git://git.kernel.dk/linux-block:
md/raid5: missing error code in setup_conf()
Linus Torvalds [Fri, 22 Jul 2022 19:36:59 +0000 (12:36 -0700)]
Merge tag 'i2c-for-5.19-rc8' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Two driver bugfixes and a typo fix"
* tag 'i2c-for-5.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: cadence: Change large transfer count reset logic to be unconditional
i2c: imx: fix typo in comment
i2c: mlxcpld: Fix register setting for 400KHz frequency
Linus Torvalds [Fri, 22 Jul 2022 19:28:47 +0000 (12:28 -0700)]
Merge tag 'gpio-fixes-for-v5.19-rc8' of git://git./linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix several regmap usage issues in gpio-pca953x
- fix out-of-tree build for GPIO selftests
- fix integer overflow in gpio-xilinx
* tag 'gpio-fixes-for-v5.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: gpio-xilinx: Fix integer overflow
selftests: gpio: fix include path to kernel headers for out of tree builds
gpio: pca953x: use the correct register address when regcache sync during init
gpio: pca953x: use the correct range when do regmap sync
gpio: pca953x: only use single read/write for No AI mode
Linus Torvalds [Fri, 22 Jul 2022 19:24:04 +0000 (12:24 -0700)]
Merge tag 'pinctrl-v5.19-3' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Only driver fixes:
- NULL check for the ralink and sunplus drivers
- Add Jacky Bai as maintainer for the Freescale pin controllers
- Fix pin config ops for the Ocelot LAN966x and SparX5
- Disallow AMD pin control to be a module: the GPIO lines need to be
active in early boot, so no can do
- Fix the Armada 37xx to use raw spinlocks in the interrupt handler
path to avoid wait context"
* tag 'pinctrl-v5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: armada-37xx: use raw spinlocks for regmap to avoid invalid wait context
pinctrl: armada-37xx: make irq_lock a raw spinlock to avoid invalid wait context
pinctrl: Don't allow PINCTRL_AMD to be a module
pinctrl: ocelot: Fix pincfg
pinctrl: ocelot: Fix pincfg for lan966x
MAINTAINERS: Update freescale pin controllers maintainer
pinctrl: sunplus: Add check for kcalloc
pinctrl: ralink: Check for null return of devm_kcalloc
Linus Torvalds [Fri, 22 Jul 2022 19:19:02 +0000 (12:19 -0700)]
Merge tag 'sound-5.19-rc8' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Only undoes the Rockchip BCLK changes to address a regression"
* tag 'sound-5.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: rockchip-i2s: Undo BCLK pinctrl changes
ASoC: rockchip: i2s: Fix NULL pointer dereference when pinctrl is not found
Linus Torvalds [Fri, 22 Jul 2022 19:14:13 +0000 (12:14 -0700)]
Merge tag 'mmc-v5.19-rc6' of git://git./linux/kernel/git/ulfh/mmc
Pull MMC fix from Ulf Hansson:
- sdhci-omap: Fix a lockdep warning while probing
* tag 'mmc-v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-omap: Fix a lockdep warning for PM runtime init
Linus Torvalds [Fri, 22 Jul 2022 19:03:19 +0000 (12:03 -0700)]
Merge tag 'drm-fixes-2022-07-22' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Fixes for this week.
The main one is the i915 firmware fix for the phoronix reported issue.
I've written some firmware guidelines as a result, should land in
-next soon. Otherwise a few amdgpu fixes, a scheduler fix, ttm fix and
two other minor ones.
scheduler:
- scheduling while atomic fix
ttm:
- locking fix
edp:
- variable typo fix
i915:
- add back support for v69 firmware on ADL-P
amdgpu:
- Drop redundant buffer cleanup that can lead to a segfault
- Add a bo_list mutex to avoid possible list corruption in CS
- dmub notification fix
imx:
- fix error path"
* tag 'drm-fixes-2022-07-22' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: Protect the amdgpu_bo_list list with a mutex v2
drm/imx/dcss: Add missing of_node_put() in fail path
drm/i915/guc: support v69 in parallel to v70
drm/i915/guc: Support programming the EU priority in the GuC descriptor
drm/panel-edp: Fix variable typo when saving hpd absent delay from DT
drm/amdgpu: Remove one duplicated ef removal
drm/ttm: fix locking in vmap/vunmap TTM GEM helpers
drm/scheduler: Don't kill jobs in interrupt context
drm/amd/display: Fix new dmub notification enabling in DM
Linus Torvalds [Fri, 22 Jul 2022 17:01:20 +0000 (10:01 -0700)]
Merge tag 'rcu-urgent.2022.07.21a' of git://git./linux/kernel/git/paulmck/linux-rcu
Pull RCU fix from Paul McKenney:
"This contains a pair of commits that fix 282d8998e997 ("srcu: Prevent
expedited GPs and blocking readers from consuming CPU"), which was
itself a fix to an SRCU expedited grace-period problem that could
prevent kernel live patching (KLP) from completing.
That SRCU fix for KLP introduced large (as in minutes) boot-time
delays to embedded Linux kernels running on qemu/KVM. These delays
were due to the emulation of certain MMIO operations controlling
memory layout, which were emulated with one expedited grace period per
access. Common configurations required thousands of boot-time MMIO
accesses, and thus thousands of boot-time expedited SRCU grace
periods.
In these configurations, the occasional sleeps that allowed KLP to
proceed caused excessive boot delays. These commits preserve enough
sleeps to permit KLP to proceed, but few enough that the virtual
embedded kernels still boot reasonably quickly.
This represents a regression introduced in the v5.19 merge window, and
the bug is causing significant inconvenience"
* tag 'rcu-urgent.2022.07.21a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
srcu: Make expedited RCU grace periods block even less frequently
srcu: Block less aggressively for expedited grace periods
Linus Torvalds [Fri, 22 Jul 2022 16:28:34 +0000 (09:28 -0700)]
mmu_gather: fix the CONFIG_MMU_GATHER_NO_RANGE case
Sudip reports that alpha doesn't build properly, with errors like
include/asm-generic/tlb.h:401:1: error: redefinition of 'tlb_update_vma_flags'
401 | tlb_update_vma_flags(struct mmu_gather *tlb, struct vm_area_struct *vma)
| ^~~~~~~~~~~~~~~~~~~~
include/asm-generic/tlb.h:372:1: note: previous definition of 'tlb_update_vma_flags' with type 'void(struct mmu_gather *, struct vm_area_struct *)'
372 | tlb_update_vma_flags(struct mmu_gather *tlb, struct vm_area_struct *vma) { }
the cause being that we have this odd situation where some architectures
were never converted to the newer TLB flushing interfaces that have a
range for the flush. Instead people left them alone, and we have them
select the MMU_GATHER_NO_RANGE config option to make the tlb header
files account for this.
Peter Zijlstra cleaned some of these nasty header file games up in
commits
1e9fdf21a433 ("mmu_gather: Remove per arch tlb_{start,end}_vma()")
18ba064e42df ("mmu_gather: Let there be one tlb_{start,end}_vma() implementation")
but tlb_update_vma_flags() was left alone, and then commit b67fbebd4cf9
("mmu_gather: Force tlb-flush VM_PFNMAP vmas") ended up removing only
_one_ of the two stale duplicate dummy inline functions.
This removes the other stale one.
Somebody braver than me should try to remove MMU_GATHER_NO_RANGE
entirely, but it requires fixing up the oddball architectures that use
it: alpha, m68k, microblaze, nios2 and openrisc.
The fixups should be fairly straightforward ("fix the build errors it
exposes by adding the appropriate range arguments"), but the reason this
wasn't done in the first place is that so few people end up working on
those architectures. But it could be done one architecture at a time,
hint, hint.
Reported-by: Sudip Mukherjee (Codethink) <sudipm.mukherjee@gmail.com>
Fixes: b67fbebd4cf9 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas")
Link: https://lore.kernel.org/all/YtpXh0QHWwaEWVAY@debian/
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
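To illustrate the class of build error described above, here is a minimal sketch of a config-guarded header that provides exactly one definition of tlb_update_vma_flags() per configuration. It is not the actual include/asm-generic/tlb.h: the struct layouts, the DEMO_MMU_GATHER_NO_RANGE macro and the DEMO_VM_PFNMAP value are simplified stand-ins assumed for the example.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
struct mmu_gather {
	bool vma_pfn;
};

struct vm_area_struct {
	unsigned long vm_flags;
};

/* Illustrative flag value, assumed for this sketch only. */
#define DEMO_VM_PFNMAP 0x400UL

#ifdef DEMO_MMU_GATHER_NO_RANGE
/* No-range configuration: a single empty stub. */
static inline void
tlb_update_vma_flags(struct mmu_gather *tlb, struct vm_area_struct *vma)
{
	(void)tlb;
	(void)vma;
}
#else
/* Range-aware configuration: record whether the VMA is a PFN mapping. */
static inline void
tlb_update_vma_flags(struct mmu_gather *tlb, struct vm_area_struct *vma)
{
	tlb->vma_pfn = (vma->vm_flags & DEMO_VM_PFNMAP) != 0;
}
#endif

/*
 * Any further definition of tlb_update_vma_flags() that is visible in the
 * selected configuration would collide with the one chosen above and
 * produce a "redefinition" error like the one quoted in the report.
 */
```

Keeping the stub and the real implementation in the two arms of one #ifdef/#else pair makes it hard for a stale duplicate to survive a later cleanup.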
Srinivas Neeli [Thu, 21 Jul 2022 07:39:09 +0000 (13:09 +0530)]
gpio: gpio-xilinx: Fix integer overflow
The current implementation cannot configure more than 32 pins because
of an incorrect data type. Cast the value to unsigned long before
shifting to avoid the overflow.
Fixes: 02b3f84d9080 ("xilinx: Switch to use bitmap APIs")
Signed-off-by: Srinivas Neeli <srinivas.neeli@xilinx.com>
Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
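The overflow described above can be sketched outside the kernel as follows. This is not the driver code: place_narrow() and place_wide() are hypothetical helpers, and uint64_t stands in for a 64-bit unsigned long so that the example behaves the same on any host.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the shift is carried out in 32-bit arithmetic, so
 * any bits pushed past bit 31 are lost before the result is widened.
 * offset must be less than 32.
 */
static inline uint64_t place_narrow(uint32_t v, unsigned int offset)
{
	return (uint32_t)(v << offset);
}

/*
 * Hypothetical helper: the value is widened first, so the shift happens
 * in 64-bit arithmetic and the upper bits survive.
 * offset must be less than 64.
 */
static inline uint64_t place_wide(uint32_t v, unsigned int offset)
{
	return (uint64_t)v << offset;
}
```

With v = 0x3 and offset = 31, place_narrow() keeps only bit 31, while place_wide() keeps both bit 31 and bit 32. Widening before the shift is the same idea as the cast to unsigned long that the commit message describes.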
Dave Airlie [Fri, 22 Jul 2022 02:16:15 +0000 (12:16 +1000)]
Merge tag 'drm-misc-fixes-2022-07-21' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
A scheduling-while-atomic fix for drm/scheduler, a locking fix for TTM,
a typo fix for panel-edp and a resource removal fix for imx/dcss
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220721085550.hrwbukj34y56rzva@houat
Ben Dooks [Sun, 29 May 2022 15:22:00 +0000 (16:22 +0100)]
riscv: add as-options for modules with assembly compontents
When trying to load modules built for RISC-V which include assembly files
the kernel loader errors with "unexpected relocation type 'R_RISCV_ALIGN'"
due to R_RISCV_ALIGN relocations being generated by the assembler.
The R_RISCV_ALIGN relocations can be removed at the expense of code space
by adding -mno-relax to gcc and as. In commit 7a8e7da42250138
("RISC-V: Fixes to module loading") -mno-relax is added to the build
variable KBUILD_CFLAGS_MODULE. See [1] for more info.
The issue is that when kbuild builds a .S file, it invokes gcc with
the -mno-relax flag, but this is not being passed through to the
assembler. Adding -Wa,-mno-relax to KBUILD_AFLAGS_MODULE ensures that
the assembler is invoked correctly. This may have now been fixed in
gcc[2] and this addition should not stop newer gcc and as from working.
[1] https://github.com/riscv/riscv-elf-psabi-doc/issues/183
[2] https://github.com/gcc-mirror/gcc/commit/3b0a7d624e64eeb81e4d5e8c62c46d86ef521857
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Link: https://lore.kernel.org/r/20220529152200.609809-1-ben.dooks@codethink.co.uk
Fixes: ab1ef68e5401 ("RISC-V: Add sections of PLT and GOT for kernel module")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Linus Torvalds [Thu, 21 Jul 2022 18:28:26 +0000 (11:28 -0700)]
Merge tag 'mtd/fixes-for-5.19-final' of git://git./linux/kernel/git/mtd/linux
Pull MTD fix from Richard Weinberger:
"A single NAND controller fix:
- gpmi: Fix busy timeout setting (wrong calculation, yes again)"
* tag 'mtd/fixes-for-5.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: gpmi: Set WAIT_FOR_READY timeout based on program/erase times