platform/kernel/linux-rpi.git
3 years agoio_uring: don't change sqpoll creds if not needed
Pavel Begunkov [Thu, 24 Jun 2021 14:09:55 +0000 (15:09 +0100)]
io_uring: don't change sqpoll creds if not needed

SQPOLL doesn't need to change creds if it's not submitting requests.
Move creds overriding into __io_sq_thread() after checking if there are
SQEs pending.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c54368da2357ac539e0a333f7cfff70d5fb045b2.1624543113.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: Create define to modify a SQPOLL parameter
Olivier Langlois [Wed, 23 Jun 2021 18:50:18 +0000 (11:50 -0700)]
io_uring: Create define to modify a SQPOLL parameter

The magic number used to cap the number of entries extracted from an
io_uring instance SQ before moving to the other instances is an
interesting parameter to experiment with.

A define has been created to make it easy to change its value from a
single location.

Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b401640063e77ad3e9f921e09c9b3ac10a8bb923.1624473200.git.olivier@trillion01.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: Fix race condition when sqp thread goes to sleep
Olivier Langlois [Wed, 23 Jun 2021 18:50:11 +0000 (11:50 -0700)]
io_uring: Fix race condition when sqp thread goes to sleep

If an asynchronous completion happens before the task is preparing
itself to wait and set its state to TASK_INTERRUPTIBLE, the completion
will not wake up the sqp thread.

Cc: stable@vger.kernel.org
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d1419dc32ec6a97b453bee34dc03fa6a02797142.1624473200.git.olivier@trillion01.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: improve in tctx_task_work() resubmission
Pavel Begunkov [Thu, 17 Jun 2021 17:14:10 +0000 (18:14 +0100)]
io_uring: improve in tctx_task_work() resubmission

If task_state is cleared, io_req_task_work_add() will go the slow path
adding a task_work, setting the task_state, waking up the task and so
on. Not to mention it's expensive. tctx_task_work() first clears the
state and then executes all the work items queued, so if any of them
resubmits or adds new task_work items, it would unnecessarily go through
the slow path of io_req_task_work_add().

Let's clear the ->task_state at the end. We still have to check
->task_list for emptiness afterward to synchronise with
io_req_task_work_add(), do that, and set the state back if we're going
to retry, because clearing not-ours task_state on the next iteration
would be buggy.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1ef72cdac7022adf0cd7ce4bfe3bb5c82a62eb93.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: don't resched with empty task_list
Pavel Begunkov [Thu, 17 Jun 2021 17:14:09 +0000 (18:14 +0100)]
io_uring: don't resched with empty task_list

Entering tctx_task_work() with empty task_list is a strange scenario,
that can happen only on rare occasion during task exit, so let's not
check for task_list emptiness in advance and do it do-while style. The
code still correct for the empty case, just would do extra work about
which we don't care.

Do extra step and do the check before cond_resched(), so we don't
resched if have nothing to execute.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c4173e288e69793d03c7d7ce826f9d28afba718a.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor tctx task_work list splicing
Pavel Begunkov [Thu, 17 Jun 2021 17:14:08 +0000 (18:14 +0100)]
io_uring: refactor tctx task_work list splicing

We don't need a full copy of tctx->task_list in tctx_task_work(), but
only a first one, so just assign node directly.

Taking into account that task_works are run in a context of a task,
it's very unlikely to first see non-empty tctx->task_list and then
splice it empty, can only happen with task_work cancellations that is
not-normal slow path anyway. Hence, get rid of the check in the end,
it's there not for validity but "performance" purposes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d076c83fedb8253baf43acb23b8fafd7c5da1714.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: optimise task_work submit flushing
Pavel Begunkov [Thu, 17 Jun 2021 17:14:07 +0000 (18:14 +0100)]
io_uring: optimise task_work submit flushing

tctx_task_work() tries to fetch a next batch of requests, but before it
would flush completions from the previous batch that may be sub-optimal.
E.g. io_req_task_queue() executes a head of the link where all the
linked may be enqueued through the same io_req_task_queue(). And there
are more cases for that.

Do the flushing at the end, so it can cache completions of several waves
of a single tctx_task_work(), and do the flush at the very end.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3cac83934e4fbce520ff8025c3524398b3ae0270.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: inline __tctx_task_work()
Pavel Begunkov [Thu, 17 Jun 2021 17:14:06 +0000 (18:14 +0100)]
io_uring: inline __tctx_task_work()

Inline __tctx_task_work() into tctx_task_work() in preparation for
further optimisations.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f9c05c4bc9763af7bd8e25ebc3c5f7b6f69148f8.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor io_get_sequence()
Pavel Begunkov [Thu, 17 Jun 2021 17:14:05 +0000 (18:14 +0100)]
io_uring: refactor io_get_sequence()

Clean up io_get_sequence() and add a comment describing the magic around
sequence correction.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f55dc409936b8afa4698d24b8677a34d31077ccb.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: clean all flags in io_clean_op() at once
Pavel Begunkov [Thu, 17 Jun 2021 17:14:04 +0000 (18:14 +0100)]
io_uring: clean all flags in io_clean_op() at once

Clean all flags in io_clean_op() in the end in one operation, will save
us a couple of operation and binary size.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b8efe1f022a037f74e7fe497c69fb554d59bfeaf.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: simplify iovec freeing in io_clean_op()
Pavel Begunkov [Thu, 17 Jun 2021 17:14:03 +0000 (18:14 +0100)]
io_uring: simplify iovec freeing in io_clean_op()

We don't get REQ_F_NEED_CLEANUP for rw unless there is ->free_iovec set,
so remove the optimisation of NULL checking it inline, kfree() will take
care if that would ever be the case.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a233dc655d3d45bd4f69b73d55a61de46d914415.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: track request creds with a flag
Pavel Begunkov [Thu, 17 Jun 2021 17:14:02 +0000 (18:14 +0100)]
io_uring: track request creds with a flag

Currently, if req->creds is not NULL, then there are creds assigned.
Track the invariant with a new flag in req->flags. No need to clear the
field at init, and also cleanup can be efficiently moved into
io_clean_op().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5f8baeb8d3b909487f555542350e2eac97005556.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: move creds from io-wq work to io_kiocb
Pavel Begunkov [Thu, 17 Jun 2021 17:14:01 +0000 (18:14 +0100)]
io_uring: move creds from io-wq work to io_kiocb

io-wq now doesn't have anything to do with creds now, so move ->creds
from struct io_wq_work into request (aka struct io_kiocb).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8520c72ab8b8f4b96db12a228a2ab4c094ae64e1.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor io_submit_flush_completions()
Pavel Begunkov [Thu, 17 Jun 2021 17:14:00 +0000 (18:14 +0100)]
io_uring: refactor io_submit_flush_completions()

struct io_comp_state is always contained in struct io_ring_ctx, don't
pass them into io_submit_flush_completions() separately, it makes the
interface cleaner and simplifies it for the compiler.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/44d6ca57003a82484338e95197024dbd65a1b376.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: fix false WARN_ONCE
Pavel Begunkov [Thu, 17 Jun 2021 17:13:59 +0000 (18:13 +0100)]
io_uring: fix false WARN_ONCE

WARNING: CPU: 1 PID: 11749 at fs/io-wq.c:244 io_wqe_wake_worker fs/io-wq.c:244 [inline]
WARNING: CPU: 1 PID: 11749 at fs/io-wq.c:244 io_wqe_enqueue+0x7f6/0x910 fs/io-wq.c:751

A WARN_ON_ONCE() in io_wqe_wake_worker() can be triggered by a valid
userspace setup. Replace it with pr_warn.

Reported-by: syzbot+ea2f1484cffe5109dc10@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f7ede342c3342c4c26668f5168e2993e38bbd99c.1623949695.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: allow user configurable IO thread CPU affinity
Jens Axboe [Thu, 17 Jun 2021 16:19:54 +0000 (10:19 -0600)]
io_uring: allow user configurable IO thread CPU affinity

io-wq defaults to per-node masks for IO workers. This works fine by
default, but isn't particularly handy for workloads that prefer more
specific affinities, for either performance or isolation reasons.

This adds IORING_REGISTER_IOWQ_AFF that allows the user to pass in a CPU
mask that is then applied to IO thread workers, and an
IORING_UNREGISTER_IOWQ_AFF that simply resets the masks back to the
default of per-node.

Note that no care is given to existing IO threads, they will need to go
through a reschedule before the affinity is correct if they are already
running or sleeping.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: use private CPU mask
Jens Axboe [Thu, 17 Jun 2021 16:08:11 +0000 (10:08 -0600)]
io-wq: use private CPU mask

In preparation for allowing user specific CPU masks for IO thread
creation, switch to using a mask embedded in the per-node wqe
structure.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: remove header files not needed anymore
Olivier Langlois [Mon, 31 May 2021 06:54:59 +0000 (02:54 -0400)]
io-wq: remove header files not needed anymore

mm related header files are not needed for io-wq module.
remove them for a small clean-up.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: minor clean up in trace events definition
Olivier Langlois [Mon, 31 May 2021 06:54:15 +0000 (02:54 -0400)]
io_uring: minor clean up in trace events definition

Fix tabulation to make nice columns

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: Add to traces the req pointer when available
Olivier Langlois [Mon, 31 May 2021 06:36:37 +0000 (02:36 -0400)]
io_uring: Add to traces the req pointer when available

The req pointer uniquely identify a specific request.
Having it in traces can provide valuable insights that is not possible
to have if the calling process is reusing the same user_data value.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: optimise io_commit_cqring()
Pavel Begunkov [Tue, 15 Jun 2021 15:47:58 +0000 (16:47 +0100)]
io_uring: optimise io_commit_cqring()

In most cases io_commit_cqring() is just an smp_store_release(), and
it's hot enough, especially for IRQ rw, to want it to save on a function
call. Mark it inline and extract a non-inlined slow path doing drain
and timeout flushing. The inlined part is pretty slim to not cause
binary bloating.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7350f8b6b92caa50a48a80be39909f0d83eddd93.1623772051.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: shove more drain bits out of hot path
Pavel Begunkov [Tue, 15 Jun 2021 15:47:57 +0000 (16:47 +0100)]
io_uring: shove more drain bits out of hot path

Place all drain_next logic into io_drain_req(), so it's never executed
if there was no drained requests before. The only thing we need is to
set ->drain_active if we see a request with IOSQE_IO_DRAIN, do that in
io_init_req() where flags are definitely in registers.

Also, all drain-related code is encapsulated in io_drain_req(), makes it
cleaner.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/68bf4f7395ddaafbf1a26bd97b57d57d45a9f900.1623772051.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: switch !DRAIN fast path when possible
Pavel Begunkov [Tue, 15 Jun 2021 15:47:56 +0000 (16:47 +0100)]
io_uring: switch !DRAIN fast path when possible

->drain_used is one way, which is not optimal if users use DRAIN but
very rarely. However, we can just clear it in io_drain_req() when all
drained before requests are gone. Also rename the flag to reflect the
change and be more clear about it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7f37a240857546a94df6348507edddacab150460.1623772051.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: fix min types mismatch in table alloc
Pavel Begunkov [Tue, 15 Jun 2021 12:20:13 +0000 (13:20 +0100)]
io_uring: fix min types mismatch in table alloc

fs/io_uring.c: In function 'io_alloc_page_table':
include/linux/minmax.h:20:28: warning: comparison of distinct pointer
types lacks a cast

Cast everything to size_t using min_t.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 9123c8ffce16 ("io_uring: add helpers for 2 level table alloc")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/50f420a956bca070a43810d4a805293ed54f39d8.1623759527.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: Fix comment of io_get_sqe
Fam Zheng [Fri, 4 Jun 2021 16:42:56 +0000 (17:42 +0100)]
io_uring: Fix comment of io_get_sqe

The sqe_ptr argument has been gone since 709b302faddf (io_uring:
simplify io_get_sqring, 2020-04-08), made the return value of the
function. Update the comment accordingly.

Signed-off-by: Fam Zheng <fam.zheng@bytedance.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20210604164256.12242-1-fam.zheng@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: optimise non-drain path
Pavel Begunkov [Mon, 14 Jun 2021 22:37:31 +0000 (23:37 +0100)]
io_uring: optimise non-drain path

Replace drain checks with one-way flag set upon seeing the first
IOSQE_IO_DRAIN request. There are several places where it cuts cycles
well:

1) It's much faster than the fast check with two
conditions in io_drain_req() including pretty complex
list_empty_careful().

2) We can mark io_queue_sqe() inline now, that's a huge win.

3) It replaces timeout and drain checks in io_commit_cqring() with a
single flags test. Also great not touching ->defer_list there without a
reason so limiting cache bouncing.

It adds a small amount of overhead to drain path, but it's negligible.
The main nuisance is that once it meets any DRAIN request in io_uring
instance lifetime it will _always_ go through a slower path, so
drain-less and offset-mode timeout less applications are preferable.
The overhead in that case would be not big, but it's worth to bear in
mind.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/98d2fff8c4da5144bb0d08499f591d4768128ea3.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor io_req_defer()
Pavel Begunkov [Mon, 14 Jun 2021 22:37:30 +0000 (23:37 +0100)]
io_uring: refactor io_req_defer()

Rename io_req_defer() into io_drain_req() and refactor it uncoupling it
from io_queue_sqe() error handling and preparing for coming
optimisations. Also, prioritise non IOSQE_ASYNC path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4f17dd56e7fbe52d1866f8acd8efe3284d2bebcb.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: move uring_lock location
Pavel Begunkov [Mon, 14 Jun 2021 22:37:29 +0000 (23:37 +0100)]
io_uring: move uring_lock location

->uring_lock is prevalently used for submission, even though it protects
many other things like iopoll, registeration, selected bufs, and more.
And it's placed together with ->cq_wait poked on completion and CQ
waiting sides. Move them apart, ->uring_lock goes to the submission
data, and cq_wait to completion related chunk. The last one requires
some reshuffling so everything needed by io_cqring_ev_posted*() is in
one cacheline.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dea5e845caee4c98aa0922b46d713154d81f7bd8.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: wait heads renaming
Pavel Begunkov [Mon, 14 Jun 2021 22:37:28 +0000 (23:37 +0100)]
io_uring: wait heads renaming

We use several wait_queue_head's for different purposes, but namings are
confusing. First rename ctx->cq_wait into ctx->poll_wait, because this
one is used for polling an io_uring instance. Then rename ctx->wait into
ctx->cq_wait, which is responsible for CQE waiting.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/47b97a097780c86c67b20b6ccc4e077523dce682.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: clean up check_overflow flag
Pavel Begunkov [Mon, 14 Jun 2021 22:37:27 +0000 (23:37 +0100)]
io_uring: clean up check_overflow flag

There are no users of ->sq_check_overflow, only ->cq_check_overflow is
used. Combine it and move out of completion related part of struct
io_ring_ctx.

A not so obvious benefit of it is fitting all completion side fields
into a single cacheline. It was taking 2 lines before with 56B padding,
and io_cqring_ev_posted*() were still touching both of them.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/25927394964df31d113e3c729416af573afff5f5.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: small io_submit_sqe() optimisation
Pavel Begunkov [Mon, 14 Jun 2021 22:37:26 +0000 (23:37 +0100)]
io_uring: small io_submit_sqe() optimisation

submit_state.link is used only to assemble a link and not used for
actual submission, so clear it before io_queue_sqe() in io_submit_sqe(),
awhile it's hot and in caches and queueing doesn't spoil it. May also
potentially help compiler with spilling or to do other optimisations.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1579939426f3ad6b55af3005b1389bbbed7d780d.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: optimise completion timeout flushing
Pavel Begunkov [Mon, 14 Jun 2021 22:37:25 +0000 (23:37 +0100)]
io_uring: optimise completion timeout flushing

io_commit_cqring() might be very hot and we definitely don't want to
touch ->timeout_list there, because 1) it's shared with the submission
side so might lead to cache bouncing and 2) may need to load an extra
cache line, especially for IRQ completions.

We're interested in it at the completion side only when there are
offset-mode timeouts, which are not so popular. Replace
list_empty(->timeout_list) hot path check with a new one-way flag, which
is set when we prepare the first offset-mode timeout.

note: the flag sits in the same line as briefly used after ->rings

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e4892ec68b71a69f92ffbea4a1499be3ec0d463b.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: don't cache number of dropped SQEs
Pavel Begunkov [Mon, 14 Jun 2021 22:37:24 +0000 (23:37 +0100)]
io_uring: don't cache number of dropped SQEs

Kill ->cached_sq_dropped and wire DRAIN sequence number correction via
->cq_extra, which is there exactly for that purpose. User visible
dropped counter will be populated by incrementing it instead of keeping
a copy, similarly as it was done not so long ago with cq_overflow.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/088aceb2707a534d531e2770267c4498e0507cc1.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor io_get_sqe()
Pavel Begunkov [Mon, 14 Jun 2021 22:37:23 +0000 (23:37 +0100)]
io_uring: refactor io_get_sqe()

The line of io_get_sqe() evaluating @head consists of too many
operations including READ_ONCE(), it's not convenient for probing.
Refactor it also improving readability.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/866ad6e4ef4851c7c61f6b0e08dbd0a8d1abce84.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: shuffle more fields into SQ ctx section
Pavel Begunkov [Mon, 14 Jun 2021 22:37:22 +0000 (23:37 +0100)]
io_uring: shuffle more fields into SQ ctx section

Since moving locked_free_* out of struct io_submit_state
ctx->submit_state is accessed on submission side only, so move it into
the submission section. Same goes for rsrc table pointers/nodes/etc.,
they must be taken and checked during submission because sync'ed by
uring_lock, so move them there as well.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8a5899a50afc6ccca63249e716f580b246f3dec6.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: move ctx->flags from SQ cacheline
Pavel Begunkov [Mon, 14 Jun 2021 22:37:21 +0000 (23:37 +0100)]
io_uring: move ctx->flags from SQ cacheline

ctx->flags are heavily used by both, completion and submission sides, so
move it out from the ctx fields related to submissions. Instead, place
it together with ctx->refs, because it's already cacheline-aligned and
so pads lots of space, and both almost never change. Also, in most
occasions they are accessed together as refs are taken at submission
time and put back during completion.

Do same with ctx->rings, where the pointer itself is never modified
apart from ring init/free.

Note: in percpu mode, struct percpu_ref doesn't modify the struct itself
but takes indirection with ref->percpu_count_ptr.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4c48c173e63d35591383ba2b87e8b8e8dfdbd23d.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: keep SQ pointers in a single cacheline
Pavel Begunkov [Mon, 14 Jun 2021 22:37:20 +0000 (23:37 +0100)]
io_uring: keep SQ pointers in a single cacheline

sq_array and sq_sqes are always used together, however they are in
different cachelines, where the borderline is right before
cq_overflow_list is rather rarely touched. Move the fields together so
it loads only one cacheline.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3ef2411a94874da06492506a8897eff679244f49.1623709150.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: remove redundant initialization of variable ret
Colin Ian King [Tue, 15 Jun 2021 14:34:24 +0000 (15:34 +0100)]
io-wq: remove redundant initialization of variable ret

The variable ret is being initialized with a value that is never read, the
assignment is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210615143424.60449-1-colin.king@canonical.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: Fix incorrect sizeof operator for copy_from_user call
Colin Ian King [Tue, 15 Jun 2021 13:00:11 +0000 (14:00 +0100)]
io_uring: Fix incorrect sizeof operator for copy_from_user call

Static analysis is warning that the sizeof being used is should be
of *data->tags[i] and not data->tags[i]. Although these are the same
size on 64 bit systems it is not a portable assumption to assume
this is true for all cases.  Fix this by using a temporary pointer
tag_slot to make the code a clearer.

Addresses-Coverity: ("Sizeof not portable")
Fixes: d878c81610e1 ("io_uring: hide rsrc tag copy into generic helpers")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20210615130011.57387-1-colin.king@canonical.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: inline io_iter_do_read()
Pavel Begunkov [Mon, 14 Jun 2021 01:36:24 +0000 (02:36 +0100)]
io_uring: inline io_iter_do_read()

There are only two calls in source code of io_iter_do_read(), the
function is small and pretty hot though is failed to get inlined.
Makr it as inline.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/25a26dae7660da73fbc2244b361b397ef43d3caf.1623634182.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: unify SQPOLL and user task cancellations
Pavel Begunkov [Mon, 14 Jun 2021 01:36:23 +0000 (02:36 +0100)]
io_uring: unify SQPOLL and user task cancellations

Merge io_uring_cancel_sqpoll() and __io_uring_cancel() as it's easier to
have a conditional ctx traverse inside than keeping them in sync.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/adfe24d6dad4a3883a40eee54352b8b65ac851bb.1623634181.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: cache task struct refs
Pavel Begunkov [Mon, 14 Jun 2021 01:36:22 +0000 (02:36 +0100)]
io_uring: cache task struct refs

tctx in submission part is always synchronised because is executed from
the task's context, so we can batch allocate tctx/task references and
store them across syscall boundaries. It avoids enough of operations,
including an atomic for getting task ref and a percpu_counter_add()
function call, which still fallback to spinlock for large batching
cases (around >=32). Should be good for SQPOLL submitting in small
portions and coming at some moment bpf submissions.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/14b327b973410a3eec1f702ecf650e100513aca9.1623634181.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: don't vmalloc rsrc tags
Pavel Begunkov [Mon, 14 Jun 2021 01:36:21 +0000 (02:36 +0100)]
io_uring: don't vmalloc rsrc tags

We don't really need vmalloc for keeping tags, it's not a hot path and
is there out of convenience, so replace it with two level tables to not
litter kernel virtual memory mappings.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/241a3422747113a8909e7e1030eb585d4a349e0d.1623634181.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: add helpers for 2 level table alloc
Pavel Begunkov [Mon, 14 Jun 2021 01:36:20 +0000 (02:36 +0100)]
io_uring: add helpers for 2 level table alloc

Some parts like fixed file table use 2 level tables, factor out helpers
for allocating/deallocating them as more users are to come.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1709212359cd82eb416d395f86fc78431ccfc0aa.1623634181.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: remove rsrc put work irq save/restore
Pavel Begunkov [Mon, 14 Jun 2021 01:36:19 +0000 (02:36 +0100)]
io_uring: remove rsrc put work irq save/restore

io_rsrc_put_work() is executed by workqueue in non-irq context, so no
need for irqsave/restore variants of spinlocking.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2a7f77220735f4ad404ac885b4d73bdf42d2f836.1623634181.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: hide rsrc tag copy into generic helpers
Pavel Begunkov [Mon, 14 Jun 2021 01:36:18 +0000 (02:36 +0100)]
io_uring: hide rsrc tag copy into generic helpers

Make io_rsrc_data_alloc() taking care of rsrc tags loading on
registration, so we don't need to repeat it for each new rsrc type.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5609680697bd09735de10561b75edb95283459da.1623634181.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: simplify worker exiting
Pavel Begunkov [Mon, 14 Jun 2021 01:36:17 +0000 (02:36 +0100)]
io-wq: simplify worker exiting

io_worker_handle_work() already takes care of the empty list case and
releases spinlock, so get rid of ugly conditional unlocking and
unconditionally call handle_work()

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7521e485677f381036676943e876a0afecc23017.1623634181.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: don't repeat IO_WQ_BIT_EXIT check by worker
Pavel Begunkov [Mon, 14 Jun 2021 01:36:16 +0000 (02:36 +0100)]
io-wq: don't repeat IO_WQ_BIT_EXIT check by worker

io_wqe_worker()'s main loop does check IO_WQ_BIT_EXIT flag, so no need
for a second test_bit at the end as it will immediately jump to the
first check afterwards.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d6af4a51c86523a527fb5417c9fbc775c4b26497.1623634181.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: rename function *task_file
Pavel Begunkov [Mon, 14 Jun 2021 01:36:15 +0000 (02:36 +0100)]
io_uring: rename function *task_file

What at some moment was references to struct file used to control
lifetimes of task/ctx is now just internal tctx structures/nodes,
so rename outdated *task_file() routines into something more sensible.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e2fbce42932154c2631ce58ffbffaa232afe18d5.1623634181.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: refactor io_iopoll_req_issued
Pavel Begunkov [Mon, 14 Jun 2021 01:36:14 +0000 (02:36 +0100)]
io_uring: refactor io_iopoll_req_issued

A simple refactoring of io_iopoll_req_issued(), move in_async inside so
we don't pass it around and save on double checking it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1513bfde4f0c835be25ac69a82737ab0668d7665.1623634181.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: remove unused io-wq refcounting
Pavel Begunkov [Mon, 14 Jun 2021 01:36:13 +0000 (02:36 +0100)]
io-wq: remove unused io-wq refcounting

iowq->refs is initialised to one and killed on exit, so it's not used
and we can kill it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/401007393528ea7c102360e69a29b64498e15db2.1623634181.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: embed wqe ptr array into struct io_wq
Pavel Begunkov [Mon, 14 Jun 2021 01:36:12 +0000 (02:36 +0100)]
io-wq: embed wqe ptr array into struct io_wq

io-wq keeps an array of pointers to struct io_wqe, allocate this array
as a part of struct io-wq, it's easier to code and saves an extra
indirection for nearly each io-wq call.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1482c6a001923bbed662dc38a8a580fb08b1ed8c.1623634181.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: fix blocking inline submission
Pavel Begunkov [Wed, 9 Jun 2021 11:07:25 +0000 (12:07 +0100)]
io_uring: fix blocking inline submission

There is a complaint against sys_io_uring_enter() blocking if it submits
stdin reads. The problem is in __io_file_supports_async(), which
sees that it's a cdev and allows it to be processed inline.

Punt char devices using generic rules of io_file_supports_async(),
including checking for presence of *_iter() versions of rw callbacks.
Apparently, it will affect most of cdevs with some exceptions like
null and zero devices.

Cc: stable@vger.kernel.org
Reported-by: Birk Hirdman <lonjil@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d60270856b8a4560a639ef5f76e55eb563633599.1623236455.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: enable shmem/memfd memory registration
Pavel Begunkov [Wed, 9 Jun 2021 14:26:54 +0000 (15:26 +0100)]
io_uring: enable shmem/memfd memory registration

Relax buffer registration restictions, which filters out file backed
memory, and allow shmem/memfd as they have normal anonymous pages
underneath.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: don't bounce submit_state cachelines
Pavel Begunkov [Sun, 16 May 2021 21:58:12 +0000 (22:58 +0100)]
io_uring: don't bounce submit_state cachelines

struct io_submit_state contains struct io_comp_state and so
locked_free_*, that renders cachelines around ->locked_free* being
invalidated on most non-inline completions, that may terrorise caches if
submissions and completions are done by different tasks.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/290cb5412b76892e8631978ee8ab9db0c6290dd5.1621201931.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: rename io_get_cqring
Pavel Begunkov [Sun, 16 May 2021 21:58:11 +0000 (22:58 +0100)]
io_uring: rename io_get_cqring

Rename io_get_cqring() into io_get_cqe() for consistency with SQ, and
just because the old name is not as clear.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a46a53e3f781de372f5632c184e61546b86515ce.1621201931.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: kill cached_cq_overflow
Pavel Begunkov [Sun, 16 May 2021 21:58:10 +0000 (22:58 +0100)]
io_uring: kill cached_cq_overflow

There are two copies of cq_overflow, shared with userspace and internal
cached one. It was needed for DRAIN accounting, but now we have yet
another knob to tune the accounting, i.e. cq_extra, and we can throw
away the internal counter and just increment the one in the shared ring.

If user modifies it as so never gets the right overflow value ever
again, it's its problem, even though before we would have restored it
back by next overflow.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8427965f5175dd051febc63804909861109ce859.1621201931.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: deduce cq_mask from cq_entries
Pavel Begunkov [Sun, 16 May 2021 21:58:09 +0000 (22:58 +0100)]
io_uring: deduce cq_mask from cq_entries

No need to cache cq_mask, it's exactly cq_entries - 1, so just deduce
it to not carry it around.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d439efad0503c8398451dae075e68a04362fbc8d.1621201931.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: remove dependency on ring->sq/cq_entries
Pavel Begunkov [Sun, 16 May 2021 21:58:08 +0000 (22:58 +0100)]
io_uring: remove dependency on ring->sq/cq_entries

We have numbers of {sq,cq} entries cached in ctx, don't look up them in
user-shared rings as 1) it may fetch additional cacheline 2) user may
change it and so it's always error prone.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/745d31bc2da41283ddd0489ef784af5c8d6310e9.1621201931.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: better locality for rsrc fields
Pavel Begunkov [Sun, 16 May 2021 21:58:07 +0000 (22:58 +0100)]
io_uring: better locality for rsrc fields

ring has two types of resource-related fields: used for request
submission, and field needed for update/registration. Reshuffle them
into these two groups for better locality and readability. The second
group is not in the hot path, so it's natural to place them somewhere in
the end. Also update an outdated comment.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/05b34795bb4440f4ec4510f08abd5a31830f8ca0.1621201931.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: shuffle rarely used ctx fields
Pavel Begunkov [Sun, 16 May 2021 21:58:06 +0000 (22:58 +0100)]
io_uring: shuffle rarely used ctx fields

There is a bunch of scattered around ctx fields that are almost never
used, e.g. only on ring exit, plunge them to the end, better locality,
better aesthetically.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/782ff94b00355923eae757d58b1a47821b5b46d4.1621201931.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: make fail flag not link specific
Pavel Begunkov [Sun, 16 May 2021 21:58:05 +0000 (22:58 +0100)]
io_uring: make fail flag not link specific

The main difference is in req_set_fail_links() renamed into
req_set_fail(), which now sets REQ_F_FAIL_LINK/REQ_F_FAIL flag
unconditional on whether it has been a link or not. It only matters in
io_disarm_next(), which already handles it well, and all calls to it
have a fast path checking REQ_F_LINK/HARDLINK.

It looks cleaner, and sheds binary size
   text    data     bss     dec     hex filename
  84235   12390       8   96633   17979 ./fs/io_uring.o
  84151   12414       8   96573   1793d ./fs/io_uring.o

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e2224154dd6e53b665ac835d29436b177872fa10.1621201931.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: get rid of files in exit cancel
Pavel Begunkov [Sun, 16 May 2021 21:58:04 +0000 (22:58 +0100)]
io_uring: get rid of files in exit cancel

We don't match against files on cancellation anymore, so no need to drag
around files_struct anymore, just pass a flag telling whether only
inflight or all requests should be killed.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7bfc5409a78f8e2d6b27dec3293ec2d248677348.1621201931.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: simplify waking sqo_sq_wait
Pavel Begunkov [Sun, 16 May 2021 21:58:03 +0000 (22:58 +0100)]
io_uring: simplify waking sqo_sq_wait

Going through submission in __io_sq_thread() and still having a full SQ
is rather unexpected, so remove a check for SQ fullness and just wake up
whoever wait on sqo_sq_wait. Also skip if it doesn't do submission in
the first place, likely may to happen for SQPOLL sharing and/or IOPOLL.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e2e91751e87b1a39f8d63ef884aaff578123f61e.1621201931.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: remove unused park_task_work
Pavel Begunkov [Sun, 16 May 2021 21:58:02 +0000 (22:58 +0100)]
io_uring: remove unused park_task_work

As sqpoll cancel via task_work is killed, remove everything related to
park_task_work as it's not used anymore.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/310d8b76a2fbbf3e139373500e04ad9af7ee3dbb.1621201931.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: improve sq_thread waiting check
Pavel Begunkov [Sun, 16 May 2021 21:58:01 +0000 (22:58 +0100)]
io_uring: improve sq_thread waiting check

If SQPOLL task finds a ring requesting it to continue running, no need
to set wake flag to rest of the rings as it will be cleared in a moment
anyway, so hide it in a single sqd->ctx_list loop.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1ee5a696d9fd08645994c58ee147d149a8957d94.1621201931.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio_uring: improve sqpoll event/state handling
Pavel Begunkov [Sun, 16 May 2021 21:58:00 +0000 (22:58 +0100)]
io_uring: improve sqpoll event/state handling

As sqd->state changes rarely, don't check every event one by one but
look them all at once. Add a helper function. Also don't go into event
waiting sleeping with STOP flag set.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/645025f95c7eeec97f88ff497785f4f1d6f3966f.1621201931.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoLinux 5.13-rc6
Linus Torvalds [Sun, 13 Jun 2021 21:43:10 +0000 (14:43 -0700)]
Linux 5.13-rc6

3 years agoMerge tag 'perf-tools-fixes-for-v5.13-2021-06-13' of git://git.kernel.org/pub/scm...
Linus Torvalds [Sun, 13 Jun 2021 19:41:47 +0000 (12:41 -0700)]
Merge tag 'perf-tools-fixes-for-v5.13-2021-06-13' of git://git./linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Correct buffer copying when peeking events

 - Sync cpufeatures/disabled-features.h header with the kernel sources

* tag 'perf-tools-fixes-for-v5.13-2021-06-13' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  tools headers cpufeatures: Sync with the kernel sources
  perf session: Correct buffer copying when peeking events

3 years agoMerge tag 'nfs-for-5.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds [Sun, 13 Jun 2021 19:32:59 +0000 (12:32 -0700)]
Merge tag 'nfs-for-5.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

  Stable fixes:

   - Fix use-after-free in nfs4_init_client()

  Bugfixes:

   - Fix deadlock between nfs4_evict_inode() and nfs4_opendata_get_inode()

   - Fix second deadlock in nfs4_evict_inode()

   - nfs4_proc_set_acl should not change the value of NFS_CAP_UIDGID_NOMAP

   - Fix setting of the NFS_CAP_SECURITY_LABEL capability"

* tag 'nfs-for-5.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix second deadlock in nfs4_evict_inode()
  NFSv4: Fix deadlock between nfs4_evict_inode() and nfs4_opendata_get_inode()
  NFS: FMODE_READ and friends are C macros, not enum types
  NFS: Fix a potential NULL dereference in nfs_get_client()
  NFS: Fix use-after-free in nfs4_init_client()
  NFS: Ensure the NFS_CAP_SECURITY_LABEL capability is set when appropriate
  NFSv4: nfs4_proc_set_acl needs to restore NFS_CAP_UIDGID_NOMAP on error.

3 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sun, 13 Jun 2021 19:25:33 +0000 (12:25 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Four reasonably small fixes to the core for scsi host allocation
  failure paths.

  The root problem is that we're not freeing the memory allocated by
  dev_set_name(), which involves a rejig of may of the free on error
  paths to do put_device() instead of kfree which, in turn, has several
  other knock on ramifications and inspection turned up a few other
  lurking bugs"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: core: Only put parent device if host state differs from SHOST_CREATED
  scsi: core: Put .shost_dev in failure path if host state changes to RUNNING
  scsi: core: Fix failure handling of scsi_add_host_with_dma()
  scsi: core: Fix error handling of scsi_host_alloc()

3 years agoMerge tag 'riscv-for-linus-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 12 Jun 2021 20:57:49 +0000 (13:57 -0700)]
Merge tag 'riscv-for-linus-5.13-rc6' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - A pair of XIP fixes: one to fix alternatives, and one to turn off the
   rest of the features that require code modification

 - A fix to a type that was causing some alternatives to break

 - A build fix for BUILTIN_DTB

* tag 'riscv-for-linus-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix BUILTIN_DTB for sifive and microchip soc
  riscv: alternative: fix typo in macro name
  riscv: code patching only works on !XIP_KERNEL
  riscv: xip: support runtime trap patching

3 years agomm: relocate 'write_protect_seq' in struct mm_struct
Feng Tang [Fri, 11 Jun 2021 01:54:42 +0000 (09:54 +0800)]
mm: relocate 'write_protect_seq' in struct mm_struct

0day robot reported a 9.2% regression for will-it-scale mmap1 test
case[1], caused by commit 57efa1fe5957 ("mm/gup: prevent gup_fast from
racing with COW during fork").

Further debug shows the regression is due to that commit changes the
offset of hot fields 'mmap_lock' inside structure 'mm_struct', thus some
cache alignment changes.

From the perf data, the contention for 'mmap_lock' is very severe and
takes around 95% cpu cycles, and it is a rw_semaphore

        struct rw_semaphore {
                atomic_long_t count; /* 8 bytes */
                atomic_long_t owner; /* 8 bytes */
                struct optimistic_spin_queue osq; /* spinner MCS lock */
                ...

Before commit 57efa1fe5957 adds the 'write_protect_seq', it happens to
have a very optimal cache alignment layout, as Linus explained:

 "and before the addition of the 'write_protect_seq' field, the
  mmap_sem was at offset 120 in 'struct mm_struct'.

  Which meant that count and owner were in two different cachelines,
  and then when you have contention and spend time in
  rwsem_down_write_slowpath(), this is probably *exactly* the kind
  of layout you want.

  Because first the rwsem_write_trylock() will do a cmpxchg on the
  first cacheline (for the optimistic fast-path), and then in the
  case of contention, rwsem_down_write_slowpath() will just access
  the second cacheline.

  Which is probably just optimal for a load that spends a lot of
  time contended - new waiters touch that first cacheline, and then
  they queue themselves up on the second cacheline."

After the commit, the rw_semaphore is at offset 128, which means the
'count' and 'owner' fields are now in the same cacheline, and causes
more cache bouncing.

Currently there are 3 "#ifdef CONFIG_XXX" before 'mmap_lock' which will
affect its offset:

  CONFIG_MMU
  CONFIG_MEMBARRIER
  CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES

The layout above is on 64 bits system with 0day's default kernel config
(similar to RHEL-8.3's config), in which all these 3 options are 'y'.
And the layout can vary with different kernel configs.

Relayouting a structure is usually a double-edged sword, as sometimes it
can helps one case, but hurt other cases.  For this case, one solution
is, as the newly added 'write_protect_seq' is a 4 bytes long seqcount_t
(when CONFIG_DEBUG_LOCK_ALLOC=n), placing it into an existing 4 bytes
hole in 'mm_struct' will not change other fields' alignment, while
restoring the regression.

Link: https://lore.kernel.org/lkml/20210525031636.GB7744@xsang-OptiPlex-9020/
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoMerge tag 'usb-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sat, 12 Jun 2021 19:34:49 +0000 (12:34 -0700)]
Merge tag 'usb-5.13-rc6' of git://git./linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are a number of tiny USB fixes for 5.13-rc6.

  There are more than I would normally like, but there's been a bunch of
  people banging on the gadget and dwc3 and typec code recently for I
  think an Android release, which has resulted in a number of small
  fixes. It's nice to see companies send fixes upstream for this type of
  work, a notable change from years ago.

  Anyway, fixes in here are:

   - usb-serial device id updates

   - usb-serial cp210x driver fixes for broken firmware versions

   - typec fixes for crazy charging devices and other reported problems

   - dwc3 fixes for reported problems found

   - gadget fixes for reported problems

   - tiny xhci fixes

   - other small fixes for reported issues.

   - revert of a problem fix found by linux-next testing

  All of these have passed 0-day and linux-next testing with no reported
  problems (the revert for the found linux-next build problem included)"

* tag 'usb-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (44 commits)
  Revert "usb: gadget: fsl: Re-enable driver for ARM SoCs"
  usb: typec: mux: Fix copy-paste mistake in typec_mux_match
  usb: typec: ucsi: Clear PPM capability data in ucsi_init() error path
  usb: gadget: fsl: Re-enable driver for ARM SoCs
  usb: typec: wcove: Use LE to CPU conversion when accessing msg->header
  USB: serial: cp210x: fix CP2102N-A01 modem control
  USB: serial: cp210x: fix alternate function for CP2102N QFN20
  usb: misc: brcmstb-usb-pinmap: check return value after calling platform_get_resource()
  usb: dwc3: ep0: fix NULL pointer exception
  usb: gadget: eem: fix wrong eem header operation
  usb: typec: intel_pmc_mux: Put ACPI device using acpi_dev_put()
  usb: typec: intel_pmc_mux: Add missed error check for devm_ioremap_resource()
  usb: typec: intel_pmc_mux: Put fwnode in error case during ->probe()
  usb: typec: tcpm: Do not finish VDM AMS for retrying Responses
  usb: fix various gadget panics on 10gbps cabling
  usb: fix various gadgets null ptr deref on 10gbps cabling.
  usb: pci-quirks: disable D3cold on xhci suspend for s2idle on AMD Renoir
  usb: f_ncm: only first packet of aggregate needs to start timer
  USB: f_ncm: ncm_bitrate (speed) is unsigned
  MAINTAINERS: usb: add entry for isp1760
  ...

3 years agoMerge tag 'tty-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Linus Torvalds [Sat, 12 Jun 2021 19:27:05 +0000 (12:27 -0700)]
Merge tag 'tty-5.13-rc6' of git://git./linux/kernel/git/gregkh/tty

Pull serial driver fix from Greg KH:
 "A single 8250_exar serial driver fix for a reported problem with a
  change that happened in 5.13-rc1.

  It has been in linux-next with no reported problems"

* tag 'tty-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  serial: 8250_exar: Avoid NULL pointer dereference at ->exit()

3 years agoMerge tag 'staging-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sat, 12 Jun 2021 19:23:54 +0000 (12:23 -0700)]
Merge tag 'staging-5.13-rc6' of git://git./linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
 "Two tiny staging driver fixes:

   - ralink-gdma driver authorship information fixed up

   - rtl8723bs driver fix for reported regression

  Both have been in linux-next for a while with no reported problems"

* tag 'staging-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: ralink-gdma: Remove incorrect author information
  staging: rtl8723bs: Fix uninitialized variables

3 years agoMerge tag 'driver-core-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 12 Jun 2021 19:18:49 +0000 (12:18 -0700)]
Merge tag 'driver-core-5.13-rc6' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core fix from Greg KH:
 "A single debugfs fix for 5.13-rc6, fixing a bug in
  debugfs_read_file_str() that showed up in 5.13-rc1.

  It has been in linux-next for a full week with no
  reported problems"

* tag 'driver-core-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  debugfs: Fix debugfs_read_file_str()

3 years agoMerge tag 'char-misc-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sat, 12 Jun 2021 19:13:55 +0000 (12:13 -0700)]
Merge tag 'char-misc-5.13-rc6' of git://git./linux/kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are some small misc driver fixes for 5.13-rc6 that fix some
  reported problems:

   - Tiny phy driver fixes for reported issues

   - rtsx regression for when the device suspended

   - mhi driver fix for a use-after-free

  All of these have been in linux-next for a few days with no reported
  issues"

* tag 'char-misc-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  misc: rtsx: separate aspm mode into MODE_REG and MODE_CFG
  bus: mhi: pci-generic: Fix hibernation
  bus: mhi: pci_generic: Fix possible use-after-free in mhi_pci_remove()
  bus: mhi: pci_generic: T99W175: update channel name from AT to DUN
  phy: Sparx5 Eth SerDes: check return value after calling platform_get_resource()
  phy: ralink: phy-mt7621-pci: drop 'of_match_ptr' to fix -Wunused-const-variable
  phy: ti: Fix an error code in wiz_probe()
  phy: phy-mtk-tphy: Fix some resource leaks in mtk_phy_init()
  phy: cadence: Sierra: Fix error return code in cdns_sierra_phy_probe()
  phy: usb: Fix misuse of IS_ENABLED

3 years agoMerge tag 'pinctrl-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Sat, 12 Jun 2021 19:06:24 +0000 (12:06 -0700)]
Merge tag 'pinctrl-v5.13-2' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:

 - Fix some documentation warnings for Allwinner

 - Fix duplicated GPIO groups on Qualcomm SDX55

 - Fix a double enablement bug in the Ralink driver

 - Fix the Qualcomm SC8180x Kconfig so the driver can be selected.

* tag 'pinctrl-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: qcom: Make it possible to select SC8180x TLMM
  pinctrl: ralink: rt2880: avoid to error in calls is pin is already enabled
  pinctrl: qcom: Fix duplication in gpio_groups
  pinctrl: aspeed: Fix minor documentation error

3 years agoMerge tag 'block-5.13-2021-06-12' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 12 Jun 2021 18:59:58 +0000 (11:59 -0700)]
Merge tag 'block-5.13-2021-06-12' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A few fixes that should go into 5.13:

   - Fix a regression deadlock introduced in this release between open
     and remove of a bdev (Christoph)

   - Fix an async_xor md regression in this release (Xiao)

   - Fix bcache oversized read issue (Coly)"

* tag 'block-5.13-2021-06-12' of git://git.kernel.dk/linux-block:
  block: loop: fix deadlock between open and remove
  async_xor: check src_offs is not NULL before updating it
  bcache: avoid oversized read request in cache missing code path
  bcache: remove bcache device self-defined readahead

3 years agoMerge tag 'io_uring-5.13-2021-06-12' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 12 Jun 2021 18:53:20 +0000 (11:53 -0700)]
Merge tag 'io_uring-5.13-2021-06-12' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Just an API change for the registration changes that went into this
  release. Better to get it sorted out now than before it's too late"

* tag 'io_uring-5.13-2021-06-12' of git://git.kernel.dk/linux-block:
  io_uring: add feature flag for rsrc tags
  io_uring: change registration/upd/rsrc tagging ABI

3 years agoMerge tag 'sched-urgent-2021-06-12' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 12 Jun 2021 18:41:28 +0000 (11:41 -0700)]
Merge tag 'sched-urgent-2021-06-12' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
 "Misc fixes:

   - Fix performance regression caused by lack of intended batching of
     RCU callbacks by over-eager NOHZ-full code.

   - Fix cgroups related corruption of load_avg and load_sum metrics.

   - Three fixes to fix blocked load, util_sum/runnable_sum and util_est
     tracking bugs"

* tag 'sched-urgent-2021-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix util_est UTIL_AVG_UNCHANGED handling
  sched/pelt: Ensure that *_sum is always synced with *_avg
  tick/nohz: Only check for RCU deferred wakeup on user/guest entry when needed
  sched/fair: Make sure to update tg contrib for blocked load
  sched/fair: Keep load_avg and load_sum synced

3 years agoMerge tag 'perf-urgent-2021-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 12 Jun 2021 18:34:49 +0000 (11:34 -0700)]
Merge tag 'perf-urgent-2021-06-12' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "Misc fixes:

   - Fix the NMI watchdog on ancient Intel CPUs

   - Remove a misguided, NMI-unsafe KASAN callback from the NMI-safe
     irq_work path used by perf.

   - Fix uncore events on Ice Lake servers.

   - Someone booted maxcpus=1 on an SNB-EP, and the uncore driver
     emitted warnings and was probably buggy. Fix it.

   - KCSAN found a genuine data race in the core perf code. Somewhat
     ironically the bug was introduced through a recent race fix. :-/
     In our defense, the new race window was much more narrow. Fix it"

* tag 'perf-urgent-2021-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/nmi_watchdog: Fix old-style NMI watchdog regression on old Intel CPUs
  irq_work: Make irq_work_queue() NMI-safe again
  perf/x86/intel/uncore: Fix M2M event umask for Ice Lake server
  perf/x86/intel/uncore: Fix a kernel WARNING triggered by maxcpus=1
  perf: Fix data race between pin_count increment/decrement

3 years agoMerge tag 'objtool-urgent-2021-06-12' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 12 Jun 2021 18:10:28 +0000 (11:10 -0700)]
Merge tag 'objtool-urgent-2021-06-12' of git://git./linux/kernel/git/tip/tip

Pull objtool fixes from Ingo Molnar:
 "Two objtool fixes:

   - fix a bug that corrupts the code by mistakenly rewriting
     conditional jumps

   - fix another bug generating an incorrect ELF symbol table
     during retpoline rewriting"

* tag 'objtool-urgent-2021-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Only rewrite unconditional retpoline thunk calls
  objtool: Fix .symtab_shndx handling for elf_create_undef_symbol()

3 years agoriscv: Fix BUILTIN_DTB for sifive and microchip soc
Alexandre Ghiti [Fri, 4 Jun 2021 12:06:39 +0000 (14:06 +0200)]
riscv: Fix BUILTIN_DTB for sifive and microchip soc

Fix BUILTIN_DTB config which resulted in a dtb that was actually not
built into the Linux image: in the same manner as Canaan soc does,
create an object file from the dtb file that will get linked into the
Linux image.

Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoMerge tag 'trace-v5.13-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rosted...
Linus Torvalds [Sat, 12 Jun 2021 00:05:03 +0000 (17:05 -0700)]
Merge tag 'trace-v5.13-rc5-2' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Fix the length check in the temp buffer filter

 - Fix build failure in bootconfig tools for "fallthrough" macro

 - Fix error return of bootconfig apply_xbc() routine

* tag 'trace-v5.13-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Correct the length check which causes memory corruption
  ftrace: Do not blindly read the ip address in ftrace_bug()
  tools/bootconfig: Fix a build error accroding to undefined fallthrough
  tools/bootconfig: Fix error return code in apply_xbc()

3 years agoMerge tag 'clang-features-v5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 11 Jun 2021 23:29:53 +0000 (16:29 -0700)]
Merge tag 'clang-features-v5.13-rc6' of git://git./linux/kernel/git/kees/linux

Pull clang LTO fix from Kees Cook:
 "Clang 13 fixed some IR behavior for LTO, but this broke work-arounds
  used in the kernel.

  Handle changes to needed LTO flags in Clang 13 (Tor Vic)"

* tag 'clang-features-v5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  x86, lto: Pass -stack-alignment only on LLD < 13.0.0

3 years agoMerge tag 'gpio-fixes-for-v5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 11 Jun 2021 23:27:18 +0000 (16:27 -0700)]
Merge tag 'gpio-fixes-for-v5.13-rc6' of git://git./linux/kernel/git/brgl/linux

Pull gpio fix from Bartosz Golaszewski:
 "Fix a shift-out-of-bounds error in gpio-wcd934x"

* tag 'gpio-fixes-for-v5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: wcd934x: Fix shift-out-of-bounds error

3 years agoMerge tag 'drm-fixes-2021-06-11' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 11 Jun 2021 19:33:38 +0000 (12:33 -0700)]
Merge tag 'drm-fixes-2021-06-11' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Another week of fixes, nothing too crazy, but a few all over the
  place.

  Two locking fixes in the core/ttm area, a couple of small driver fixes
  (radeon, sun4i, mcde, vc4). Then msm and amdgpu have a set of fixes
  each, mostly for smaller things, though the msm has a DSI fix for a
  black screen.

  I haven't seen any intel fixes this week so they may have a few that
  may or may not wait for next week.

  drm:
   - auth locking fix

  ttm:
   - locking fix

  amdgpu:
   - Use kvzmalloc in amdgu_bo_create
   - Use drm_dbg_kms for reporting failure to get a GEM FB
   - Fix some register offsets for Sienna Cichlid
   - Fix fall-through warning

  radeon:
   - memcpy_to/from_io fixes

  msm:
   - NULL ptr deref fix
   - CP_PROTECT reg programming fix
   - incorrect register shift fix
   - DSI blank screen fix

  sun4i:
   - hdmi output probing fix

  mcde:
   - DSI pipeline calc fix

  vc4:
   - out of bounds fix"

* tag 'drm-fixes-2021-06-11' of git://anongit.freedesktop.org/drm/drm:
  drm/msm/dsi: Stash away calculated vco frequency on recalc
  drm: Lock pointer access in drm_master_release()
  drm/mcde: Fix off by 10^3 in calculation
  drm/msm/a6xx: avoid shadow NULL reference in failure path
  drm/msm/a6xx: fix incorrectly set uavflagprd_inv field for A650
  drm/msm/a6xx: update/fix CP_PROTECT initialization
  radeon: use memcpy_to/fromio for UVD fw upload
  drm/amd/pm: Fix fall-through warning for Clang
  drm/amdgpu: Fix incorrect register offsets for Sienna Cichlid
  drm/amdgpu: Use drm_dbg_kms for reporting failure to get a GEM FB
  drm/amdgpu: switch kzalloc to kvzalloc in amdgpu_bo_create
  drm/msm: Init mm_list before accessing it for use_vram path
  drm: Fix use-after-free read in drm_getunique()
  drm/vc4: fix vc4_atomic_commit_tail() logic
  drm/ttm: fix deref of bo->ttm without holding the lock v2
  drm/sun4i: dw-hdmi: Make HDMI PHY into a platform device

3 years agoMerge tag 'devicetree-fixes-for-5.13-3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 11 Jun 2021 18:02:56 +0000 (11:02 -0700)]
Merge tag 'devicetree-fixes-for-5.13-3' of git://git./linux/kernel/git/robh/linux

Pull devicetree fix from Rob Herring:
 "A single fix for broken media/renesas,drif.yaml binding schema"

* tag 'devicetree-fixes-for-5.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  media: dt-bindings: media: renesas,drif: Fix fck definition

3 years agoMerge branch 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md...
Jens Axboe [Fri, 11 Jun 2021 17:56:08 +0000 (11:56 -0600)]
Merge branch 'md-fixes' of https://git./linux/kernel/git/song/md into block-5.13

Pull MD related fix from Song.

* 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  async_xor: check src_offs is not NULL before updating it

3 years agoMerge tag 'acpi-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 11 Jun 2021 17:53:43 +0000 (10:53 -0700)]
Merge tag 'acpi-5.13-rc6' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These revert a problematic recent commit and fix a regression
  introduced during the 5.12 development cycle.

  Specifics:

   - Revert recent commit that attempted to fix the FACS table reference
     counting but introduced a problem with accessing the hardware
     signature after hibernation (Zhang Rui).

   - Fix regression in the _OSC handling that broke the loading of ACPI
     tables on some systems (Mika Westerberg)"

* tag 'acpi-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: Pass the same capabilities to the _OSC regardless of the query flag
  Revert "ACPI: sleep: Put the FACS table after using it"

3 years agoblock: loop: fix deadlock between open and remove
Christoph Hellwig [Sat, 5 Jun 2021 14:09:50 +0000 (17:09 +0300)]
block: loop: fix deadlock between open and remove

Commit c76f48eb5c08 ("block: take bd_mutex around delete_partitions in
del_gendisk") adds disk->part0->bd_mutex in del_gendisk(), this way
causes the following AB/BA deadlock between removing loop and opening
loop:

 1) loop_control_ioctl(LOOP_CTL_REMOVE)
     -> mutex_lock(&loop_ctl_mutex)
     -> del_gendisk
         -> mutex_lock(&disk->part0->bd_mutex)

 2) blkdev_get_by_dev
     -> mutex_lock(&disk->part0->bd_mutex)
     -> lo_open
         -> mutex_lock(&loop_ctl_mutex)

Add a new Lo_deleting state to remove the need for clearing
->private_data and thus holding loop_ctl_mutex in the ioctl
LOOP_CTL_REMOVE path.

Based on an analysis and earlier patch from
Ming Lei <ming.lei@redhat.com>.

Reported-by: Colin Ian King <colin.king@canonical.com>
Fixes: c76f48eb5c08 ("block: take bd_mutex around delete_partitions in del_gendisk")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210605140950.5800-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoMerge tag 'sound-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 11 Jun 2021 17:47:10 +0000 (10:47 -0700)]
Merge tag 'sound-5.13-rc6' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A bit more commits than expected at this time, but likely it's the
  last shot before the final.

  Many of changes are device-specific fix-ups for various ASoC drivers,
  while a few usual HD-audio quirks and a FireWire fix, as well as a
  couple of ALSA / ASoC core fixes.

  All look nice and small, and nothing to scare much"

* tag 'sound-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: seq: Fix race of snd_seq_timer_open()
  ALSA: hda/realtek: fix mute/micmute LEDs for HP ZBook Power G8
  ALSA: hda/realtek: headphone and mic don't work on an Acer laptop
  ASoC: qcom: lpass-cpu: Fix pop noise during audio capture begin
  ALSA: firewire-lib: fix the context to call snd_pcm_stop_xrun()
  ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook 840 Aero G8
  ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP EliteBook x360 1040 G8
  ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Elite Dragonfly G2
  ASoC: rt5682: Fix the fast discharge for headset unplugging in soundwire mode
  ASoC: tas2562: Fix TDM_CFG0_SAMPRATE values
  ASoC: meson: gx-card: fix sound-dai dt schema
  ASoC: AMD Renoir: Remove fix for DMI entry on Lenovo 2020 platforms
  ASoC: AMD Renoir - add DMI entry for Lenovo 2020 AMD platforms
  ASoC: SOF: reset enabled_cores state at suspend
  ASoC: fsl-asoc-card: Set .owner attribute when registering card.
  ASoC: topology: Fix spelling mistake "vesion" -> "version"
  ASoC: rt5659: Fix the lost powers for the HDA header
  ASoC: core: Fix Null-point-dereference in fmt_single_name()

3 years agox86, lto: Pass -stack-alignment only on LLD < 13.0.0
Tor Vic [Thu, 10 Jun 2021 20:58:06 +0000 (20:58 +0000)]
x86, lto: Pass -stack-alignment only on LLD < 13.0.0

Since LLVM commit 3787ee4, the '-stack-alignment' flag has been dropped
[1], leading to the following error message when building a LTO kernel
with Clang-13 and LLD-13:

    ld.lld: error: -plugin-opt=-: ld.lld: Unknown command line argument
    '-stack-alignment=8'.  Try 'ld.lld --help'
    ld.lld: Did you mean '--stackrealign=8'?

It also appears that the '-code-model' flag is not necessary anymore
starting with LLVM-9 [2].

Drop '-code-model' and make '-stack-alignment' conditional on LLD < 13.0.0.

These flags were necessary because these flags were not encoded in the
IR properly, so the link would restart optimizations without them. Now
there are properly encoded in the IR, and these flags exposing
implementation details are no longer necessary.

[1] https://reviews.llvm.org/D103048
[2] https://reviews.llvm.org/D52322

Cc: stable@vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1377
Signed-off-by: Tor Vic <torvic9@mailbox.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/f2c018ee-5999-741e-58d4-e482d5246067@mailbox.org
3 years agoMerge tag 'hwmon-for-v5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 11 Jun 2021 17:07:50 +0000 (10:07 -0700)]
Merge tag 'hwmon-for-v5.13-rc6' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
 "Fixes for tps23861, scpi-hwmon, and corsair-psu drivers, plus a
  bindings fix for TI ADS7828"

* tag 'hwmon-for-v5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (tps23861) correct shunt LSB values
  hwmon: (tps23861) set current shunt value
  hwmon: (tps23861) define regmap max register
  hwmon: (scpi-hwmon) shows the negative temperature properly
  hwmon: (corsair-psu) fix suspend behavior
  dt-bindings: hwmon: Fix typo in TI ADS7828 bindings

3 years agoMerge tag 'mmc-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Linus Torvalds [Fri, 11 Jun 2021 17:02:30 +0000 (10:02 -0700)]
Merge tag 'mmc-v5.13-rc3' of git://git./linux/kernel/git/ulfh/mmc

Pull MMC fixes from Ulf Hansson:
 "A couple of MMC fixes to the Renesas SDHI driver:

   - Fix HS400 on R-Car M3-W+

   - Abort tuning when timeout detected"

* tag 'mmc-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: renesas_sdhi: Fix HS400 on R-Car M3-W+
  mmc: renesas_sdhi: abort tuning when timeout detected

3 years agoMerge branch 'acpi-bus'
Rafael J. Wysocki [Fri, 11 Jun 2021 15:57:24 +0000 (17:57 +0200)]
Merge branch 'acpi-bus'

* acpi-bus:
  ACPI: Pass the same capabilities to the _OSC regardless of the query flag

3 years agotools headers cpufeatures: Sync with the kernel sources
Arnaldo Carvalho de Melo [Tue, 8 Jun 2021 16:46:18 +0000 (13:46 -0300)]
tools headers cpufeatures: Sync with the kernel sources

To pick the changes in:

  fb35d30fe5b06cc2 ("x86/cpufeatures: Assign dedicated feature word for CPUID_0x8000001F[EAX]")
  e7b6385b01d8e9fb ("x86/cpufeatures: Add Intel SGX hardware bits")
  1478b99a76534b6c ("x86/cpufeatures: Mark ENQCMD as disabled when configured out")

That don't cause any change in the tools, just silences this perf build
warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
  diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h

Cc: Borislav Petkov <bp@suse.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoperf session: Correct buffer copying when peeking events
Leo Yan [Sat, 5 Jun 2021 05:29:57 +0000 (13:29 +0800)]
perf session: Correct buffer copying when peeking events

When peeking an event, it has a short path and a long path.  The short
path uses the session pointer "one_mmap_addr" to directly fetch the
event; and the long path needs to read out the event header and the
following event data from file and fill into the buffer pointer passed
through the argument "buf".

The issue is in the long path that it copies the event header and event
data into the same destination address which pointer "buf", this means
the event header is overwritten.  We are just lucky to run into the
short path in most cases, so we don't hit the issue in the long path.

This patch adds the offset "hdr_sz" to the pointer "buf" when copying
the event data, so that it can reserve the event header which can be
used properly by its caller.

Fixes: 5a52f33adf02 ("perf session: Add perf_session__peek_event()")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210605052957.1070720-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>