Peter Zijlstra [Thu, 12 Jan 2023 19:43:19 +0000 (20:43 +0100)]
cpuidle, riscv: Push RCU-idle into driver
Doing RCU-idle outside the driver, only to then temporarily enable it
again, at least twice, before going idle is suboptimal.
That is, once implicitly through the cpu_pm_*() calls and once
explicitly doing ct_irq_*_irqon().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230112195539.637185846@infradead.org
Peter Zijlstra [Thu, 12 Jan 2023 19:43:18 +0000 (20:43 +0100)]
cpuidle: Move IRQ state validation
Make cpuidle_enter_state() consistent with the s2idle variant and
verify ->enter() always returns with interrupts disabled.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195539.576412812@infradead.org
Peter Zijlstra [Thu, 12 Jan 2023 19:43:17 +0000 (20:43 +0100)]
cpuidle/poll: Ensure IRQs stay disabled after cpuidle_state::enter() calls
Make cpuidle_state::enter() methods IRQ state invariant on exit.
Additionally make sure to use raw_local_irq_*() methods since this
cpuidle callback will be called with RCU already disabled.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195539.515253662@infradead.org
Peter Zijlstra [Thu, 12 Jan 2023 19:43:16 +0000 (20:43 +0100)]
x86/idle: Replace 'x86_idle' function pointer with a static_call
Typical boot time setup; no need to suffer an indirect call for that.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/20230112195539.453613251@infradead.org
Peter Zijlstra [Thu, 12 Jan 2023 19:43:15 +0000 (20:43 +0100)]
x86/perf/amd: Remove tracing from perf_lopwr_cb()
The perf_lopwr_cb() function is called from the idle routines; there
is no RCU there, we must not enter tracing.
Use __always_inline, noidle annotations and existing no-trace methods.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195539.392862891@infradead.org
Mathieu Desnoyers [Wed, 4 Jan 2023 19:20:54 +0000 (14:20 -0500)]
rseq: Increase AT_VECTOR_SIZE_BASE to match rseq auxvec entries
Two new auxiliary vector entries are introduced for rseq without
matching increment of the AT_VECTOR_SIZE_BASE, which causes failures
with CONFIG_HARDENED_USERCOPY=y.
Fixes: 317c8194e6ae ("rseq: Introduce feature size and alignment ELF auxiliary vector entries")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230104192054.34046-1-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Wed, 4 Jan 2023 16:35:42 +0000 (11:35 -0500)]
selftests/rseq: Revert "selftests/rseq: Add mm_numa_cid to test script"
The mm_numa_cid related rseq patches from the series were not picked up
into the tip tree, so enabling the mm_numa_cid test needs to be
reverted.
This reverts commit b344b8f2d88dbf095caf97ac57fd3645843fa70f.
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/oe-lkp/202301040903.2dd1e25b-oliver.sang@intel.com
Ingo Molnar [Wed, 11 Jan 2023 09:25:34 +0000 (10:25 +0100)]
sched/cputime: Fix IA64 build error of missing arch_vtime_task_switch() prototype
The following commit:
c89970202a11 ("cputime: remove cputime_to_nsecs fallback")
Removed an <asm/cputime.h> inclusion from <linux/sched/cputime.h>, but this
broke the IA64 build:
arch/ia64/kernel/time.c:110:6: warning: no previous prototype for 'arch_vtime_task_switch' [-Wmissing-prototypes]
Add in the missing <asm/cputime.h> header to fix it.
Fixes: c89970202a11 ("cputime: remove cputime_to_nsecs fallback")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Michal Clapinski [Wed, 7 Dec 2022 16:43:38 +0000 (17:43 +0100)]
selftests/membarrier: Test MEMBARRIER_CMD_GET_REGISTRATIONS
Keep track of previously issued registrations and compare the result
with MEMBARRIER_CMD_GET_REGISTRATIONS return value.
Signed-off-by: Michal Clapinski <mclapinski@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20221207164338.1535591-3-mclapinski@google.com
Michal Clapinski [Wed, 7 Dec 2022 16:43:37 +0000 (17:43 +0100)]
sched/membarrier: Introduce MEMBARRIER_CMD_GET_REGISTRATIONS
Provide a method to query previously issued registrations.
Signed-off-by: Michal Clapinski <mclapinski@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20221207164338.1535591-2-mclapinski@google.com
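For illustration only, a user-space query of the registrations could look roughly like the sketch below. The sketch_* names are hypothetical, and the command value (1 << 9) is an assumption taken from the uapi header of kernels that carry this commit; it is defined locally so the sketch also builds against older headers. On kernels without the command the call is expected to fail, which the wrapper reports as a negative errno value.

```c
#include <assert.h>
#include <errno.h>
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: value of MEMBARRIER_CMD_GET_REGISTRATIONS in newer uapi
 * headers; defined under a local name for older build environments. */
#define SKETCH_MEMBARRIER_CMD_GET_REGISTRATIONS (1 << 9)

/* Thin wrapper around the raw system call. */
static int sketch_membarrier(int cmd, unsigned int flags, int cpu_id)
{
	return (int)syscall(SYS_membarrier, cmd, flags, cpu_id);
}

/* Returns the bitmask of MEMBARRIER_CMD_REGISTER_* commands previously
 * issued by this process, or -errno on failure. */
static int sketch_get_registrations(void)
{
	int ret = sketch_membarrier(SKETCH_MEMBARRIER_CMD_GET_REGISTRATIONS, 0, 0);

	return ret < 0 ? -errno : ret;
}

/* True if register_cmd is set in a mask returned by the query. */
static int sketch_is_registered(int mask, int register_cmd)
{
	return mask >= 0 && (mask & register_cmd) != 0;
}
```

A caller would typically issue a MEMBARRIER_CMD_REGISTER_* command first and then check the corresponding bit with sketch_is_registered().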
Lukasz Luba [Thu, 8 Dec 2022 16:02:56 +0000 (16:02 +0000)]
cpufreq, sched/util: Optimize operations with single CPU capacity lookup
The max CPU capacity is the same for all CPUs sharing a frequency domain.
We can use this to avoid heavy operations in a loop over each CPU.
Simplify the looping code in sugov_next_freq_shared() and drop the heavy
multiplications; instead, use a simple max() to get the highest
utilization from these CPUs.
This is useful for platforms with many (4 or 6) little CPUs. We avoid
heavy 2*PD_CPU_NUM multiplications in that loop, which is called billions
of times because it is not limited by the schedutil time delta filter in
sugov_should_update_freq(). When there is no need to change the frequency,
the code bails out without updating sg_policy::last_freq_update_time.
Every later call made after a delta_ns longer than
sg_policy::freq_update_delay_ns then passes the filter and triggers the
next frequency calculation. If the resulting frequency is again the same
as the current one, sg_policy::last_freq_update_time is still not updated
and the story repeats (after a very short period, sometimes a few
microseconds).
The max CPU capacity must be fetched every time we are called, due to
difficulties during the policy setup, where we are not able to get the
normalized CPU capacity at the right time.
The fetched CPU capacity value is then used in sugov_iowait_apply() to
calculate the right boost. This required a few changes in the local
functions and arguments. The capacity value should hopefully be fetched
once when needed and then passed over CPU registers to those functions.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20221208160256.859-2-lukasz.luba@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
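The idea can be sketched in user-space C as follows. This is a simplified model, not the schedutil code: struct sketch_policy and the sketch_* functions are hypothetical, and the 1.25 headroom factor is an assumption based on schedutil's documented frequency formula.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a frequency domain whose CPUs all share the
 * same maximum capacity. */
struct sketch_policy {
	unsigned long max_freq;	/* kHz */
	unsigned long max_cap;	/* capacity shared by every CPU in the domain */
};

/* Highest utilization among the CPUs of the domain: a plain max(), with
 * no per-CPU multiplication. */
static unsigned long sketch_max_util(const unsigned long *util, size_t nr_cpus)
{
	unsigned long max = 0;

	for (size_t i = 0; i < nr_cpus; i++)
		if (util[i] > max)
			max = util[i];
	return max;
}

/* One capacity lookup and one scaling step per call, done after the loop. */
static unsigned long sketch_next_freq(const struct sketch_policy *p,
				      const unsigned long *util, size_t nr_cpus)
{
	unsigned long max_util = sketch_max_util(util, nr_cpus);
	unsigned long freq = p->max_freq + (p->max_freq >> 2);	/* 1.25 * max_freq */

	return freq * max_util / p->max_cap;
}
```

The point of the design is that the multiplication and division happen once per call rather than once per CPU in the loop.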
Chengming Zhou [Fri, 23 Dec 2022 10:32:57 +0000 (18:32 +0800)]
sched/core: Reorganize ttwu_do_wakeup() and ttwu_do_activate()
ttwu_do_activate() is used for a complete wakeup: it calls
activate_task() and then uses ttwu_do_wakeup() to mark the task runnable,
perform wakeup-preemption, call the class->task_woken() callback
and update rq->idle_stamp.
Since ttwu_runnable() is not a complete wakeup, it doesn't need everything
done in ttwu_do_wakeup(). Move those steps to ttwu_do_activate()
to simplify ttwu_do_wakeup(), making it only mark the task runnable
so that it can be reused in ttwu_runnable() and try_to_wake_up().
This patch should not have any functional changes.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20221223103257.4962-2-zhouchengming@bytedance.com
Chengming Zhou [Fri, 23 Dec 2022 10:32:56 +0000 (18:32 +0800)]
sched/core: Micro-optimize ttwu_runnable()
ttwu_runnable() is used as a fast wakeup path when the wakee task
is running on a CPU or runnable on a runqueue. In both cases we can just
set its state to TASK_RUNNING to prevent a sleep.
If the wakee task is running (on_cpu), we don't need to update_rq_clock()
or check_preempt_curr().
But if the wakee task is on_rq && !on_cpu (e.g. an IRQ hit before
the task got to schedule() and the task has been preempted), we should
check_preempt_curr() to see if it can preempt the currently running task.
This also removes the class->task_woken() callback from ttwu_runnable(),
which wasn't required per the RT/DL implementations: any required push
operation would have been queued during class->set_next_task() when p
got preempted.
ttwu_runnable() also loses the update to rq->idle_stamp, as by definition
the rq cannot be idle in this scenario.
Suggested-by: Valentin Schneider <vschneid@redhat.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221223103257.4962-1-zhouchengming@bytedance.com
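The decision described above can be modelled in a few lines of user-space C. This is only a sketch: the sketch_* types are hypothetical and heavily reduced, locking is left out, and the two counters merely stand in for update_rq_clock() and check_preempt_curr().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced task and runqueue models. */
struct sketch_task {
	bool on_rq;	/* queued on a runqueue */
	bool on_cpu;	/* currently executing */
	bool running;	/* state == TASK_RUNNING */
};

struct sketch_rq {
	int clock_updates;	/* stands in for update_rq_clock() calls */
	int preempt_checks;	/* stands in for check_preempt_curr() calls */
};

/* Returns true if the fast path handled the wakeup. */
static bool sketch_ttwu_runnable(struct sketch_rq *rq, struct sketch_task *p)
{
	if (!p->on_rq)
		return false;	/* not queued: needs the complete wakeup path */

	if (!p->on_cpu) {
		/* Queued but preempted: it may now preempt the current task. */
		rq->clock_updates++;
		rq->preempt_checks++;
	}
	p->running = true;	/* prevent the pending sleep */
	return true;
}
```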
Qais Yousef [Fri, 16 Dec 2022 23:57:16 +0000 (23:57 +0000)]
sched/documentation: Document the util clamp feature
Add a document explaining the util clamp feature: what it is and
how to use it. The new document hopefully covers everything one needs to
know about uclamp.
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20221216235716.201923-1-qyousef@layalina.io
Cc: Jonathan Corbet <corbet@lwn.net>
Bing Huang [Thu, 5 Jan 2023 01:49:43 +0000 (09:49 +0800)]
sched/topology: Add __init for sched_init_domains()
sched_init_domains() is only used in initialization.
Signed-off-by: Bing Huang <huangbing@kylinos.cn>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230105014943.9857-1-huangbing775@126.com
Mathieu Desnoyers [Mon, 2 Jan 2023 15:12:16 +0000 (10:12 -0500)]
sched/rseq: Fix concurrency ID handling of usermodehelper kthreads
sched_mm_cid_after_execve() does not expect NULL t->mm, but it may happen
if a usermodehelper kthread fails when attempting to execute a binary.
sched_mm_cid_fork() can be issued from a usermodehelper kthread, which
has t->flags PF_KTHREAD set.
Fixes: af7f588d8f73 ("sched: Introduce per-memory-map concurrency ID")
Reported-by: kernel test robot <yujie.liu@intel.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/oe-lkp/202212301353.5c959d72-yujie.liu@intel.com
Nicholas Piggin [Tue, 20 Dec 2022 07:07:05 +0000 (17:07 +1000)]
cputime: remove cputime_to_nsecs fallback
The archs that use cputime_to_nsecs() internally provide their own
definition and don't need the fallback. cputime_to_usecs() is unused except
in this fallback, and is not defined anywhere.
This removes the final remnant of the cputime_t code from the kernel.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20221220070705.2958959-1-npiggin@gmail.com
Hao Jia [Fri, 16 Dec 2022 06:24:06 +0000 (14:24 +0800)]
sched/core: Adjusting the order of scanning CPU
When select_idle_capacity() starts scanning for an idle CPU, it starts
with the target CPU, which has already been checked in
select_idle_sibling(). So start checking from the next CPU and try the
target CPU at the end.
Similarly for task_numa_assign(): we have just checked numa_migrate_on
of dst_cpu, so start from the next CPU. The same applies to
steal_cookie_task(): the first scan must fail, so start directly
from the next CPU.
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20221216062406.7812-3-jiahao.os@bytedance.com
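As a rough illustration of the new scan order, the sketch below lists the CPUs starting at target + 1 and wrapping around, so that the already-checked target comes last. sketch_scan_order() is a hypothetical helper over a dense range 0..nr_cpus-1; the kernel iterates over a cpumask instead.

```c
#include <assert.h>

/* Fill order[] with the CPUs 0..nr_cpus-1, starting at target + 1 and
 * wrapping around, so that target itself is visited last. */
static void sketch_scan_order(int target, int nr_cpus, int *order)
{
	for (int i = 0; i < nr_cpus; i++)
		order[i] = (target + 1 + i) % nr_cpus;
}
```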
Hao Jia [Fri, 16 Dec 2022 06:24:05 +0000 (14:24 +0800)]
sched/numa: Stop an exhaustive search if an idle core is found
In update_numa_stats() we try to find an idle cpu on the NUMA node,
preferably an idle core. We can stop looking for the next idle core
or idle cpu after finding an idle core. But we can't stop the
whole loop of scanning the CPU, because we need to calculate
approximate NUMA stats at a point in time. For example,
the src and dst nr_running is needed by task_numa_find_cpu().
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20221216062406.7812-2-jiahao.os@bytedance.com
Matthew Wilcox (Oracle) [Mon, 12 Dec 2022 14:49:46 +0000 (14:49 +0000)]
sched: Make const-safe
With a modified container_of() that preserves constness, the compiler
finds some pointers which should have been marked as const. task_of()
also needs to become const-preserving for the !FAIR_GROUP_SCHED case so
that cfs_rq_of() can take a const argument. No change to generated code.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221212144946.2657785-1-willy@infradead.org
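One way such a const-preserving container_of() can be built is with C11 _Generic and the GNU typeof extension. The sketch below is an assumption about the general technique, not a copy of the kernel macro, and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_inner { int x; };
struct sketch_outer { int id; struct sketch_inner in; };

/* Classic form: always yields a non-const pointer. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Const-preserving form: a pointer to a const member yields a pointer to
 * a const container, anything else falls through to the classic form. */
#define sketch_container_of_const(ptr, type, member)			\
	_Generic((ptr),							\
		const __typeof__(*(ptr)) *:				\
			(const type *)sketch_container_of(ptr, type, member), \
		default:						\
			sketch_container_of(ptr, type, member))

/* Helper used only to observe which pointer type the macro produced. */
#define sketch_is_const_outer(p) \
	_Generic((p), const struct sketch_outer *: 1, default: 0)
```

With such a macro, passing a const pointer and assigning the result to a non-const pointer makes the compiler complain, which is how the missing const annotations are found.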
Mathieu Desnoyers [Fri, 16 Dec 2022 14:53:32 +0000 (09:53 -0500)]
selftests/rseq: Add mm_numa_cid to test script
Add mm_numa_cid tests to the run_param_test.sh test script.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221216145332.205095-1-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:23 +0000 (15:39 -0500)]
tracing/rseq: Add mm_cid field to rseq_update
Add the mm_cid field to the rseq_update event, allowing tracers to
follow which mm_cid is observed by user-space, and whether negative
mm_cid values are visible in case of internal scheduler implementation
issues.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-22-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:22 +0000 (15:39 -0500)]
selftests/rseq: parametrized test: Report/abort on negative concurrency ID
Report and abort when a negative concurrency ID value is observed by the
spinlock test.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-21-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:21 +0000 (15:39 -0500)]
selftests/rseq: Implement parametrized mm_cid test
Adapt to the rseq.h API changes introduced by commits
"selftests/rseq: <arch>: Template memory ordering and percpu access mode".
Build a new param_test_mm_cid, param_test_mm_cid_benchmark, and
param_test_mm_cid_compare_twice executables to test the new "mm_cid"
rseq field.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-20-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:20 +0000 (15:39 -0500)]
selftests/rseq: Implement basic percpu ops mm_cid test
Adapt to the rseq.h API changes introduced by commits
"selftests/rseq: <arch>: Template memory ordering and percpu access mode".
Build a new basic_percpu_ops_mm_cid_test to test the new "mm_cid" rseq
field.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-19-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:19 +0000 (15:39 -0500)]
selftests/rseq: riscv: Template memory ordering and percpu access mode
Introduce a rseq-riscv-bits.h template header which is internally included
to generate the static inline functions covering:
- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-18-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:18 +0000 (15:39 -0500)]
selftests/rseq: s390: Template memory ordering and percpu access mode
Introduce a rseq-s390-bits.h template header which is internally included
to generate the static inline functions covering:
- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-17-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:17 +0000 (15:39 -0500)]
selftests/rseq: ppc: Template memory ordering and percpu access mode
Introduce a rseq-ppc-bits.h template header which is internally included
to generate the static inline functions covering:
- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-16-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:16 +0000 (15:39 -0500)]
selftests/rseq: mips: Template memory ordering and percpu access mode
Introduce a rseq-mips-bits.h template header which is internally
included to generate the static inline functions covering:
- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-15-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:15 +0000 (15:39 -0500)]
selftests/rseq: arm64: Template memory ordering and percpu access mode
Introduce a rseq-arm64-bits.h template header which is internally
included to generate the static inline functions covering:
- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-14-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:14 +0000 (15:39 -0500)]
selftests/rseq: arm: Template memory ordering and percpu access mode
Introduce a rseq-arm-bits.h template header which is internally included
to generate the static inline functions covering:
- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-13-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:13 +0000 (15:39 -0500)]
selftests/rseq: x86: Template memory ordering and percpu access mode
Introduce a rseq-x86-bits.h template header which is internally included
to generate the static inline functions covering:
- relaxed and release memory ordering,
- per-cpu-id and per-mm-cid per-cpu data access.
This introduces changes to the rseq.h selftests API which require
updating the rseq selftest programs. Similar API/templating changes need
to be done for other architectures.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-12-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:12 +0000 (15:39 -0500)]
selftests/rseq: Implement rseq mm_cid field support
Add support for the mm_cid field (per-memory-map concurrency ID) of
struct rseq to rseq selftests.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-11-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:11 +0000 (15:39 -0500)]
selftests/rseq: Remove RSEQ_SKIP_FASTPATH code
This code is not currently built by the test Makefile, adds complexity,
and is not overall useful considering that the abort handling loops to
retry the fast-path.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-10-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:10 +0000 (15:39 -0500)]
rseq: Extend struct rseq with per-memory-map concurrency ID
If a memory map has fewer threads than there are cores on the system, or
is limited to run on few cores concurrently through sched affinity or
cgroup cpusets, the concurrency IDs will be values close to 0, thus
allowing efficient use of user-space memory for per-cpu data structures.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-9-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:09 +0000 (15:39 -0500)]
sched: Introduce per-memory-map concurrency ID
This feature allows the scheduler to expose a per-memory-map concurrency
ID to user-space. This concurrency ID is within the possible cpus range,
and is temporarily (and uniquely) assigned while threads are actively
running within a memory map. If a memory map has fewer threads than
cores, or is limited to run on few cores concurrently through sched
affinity or cgroup cpusets, the concurrency IDs will be values close
to 0, thus allowing efficient use of user-space memory for per-cpu
data structures.
This feature is meant to be exposed by a new rseq thread area field.
The primary purpose of this feature is to do the heavy-lifting needed
by memory allocators to allow them to use per-cpu data structures
efficiently in the following situations:
- Single-threaded applications,
- Multi-threaded applications on large systems (many cores) with limited
cpu affinity mask,
- Multi-threaded applications on large systems (many cores) with
restricted cgroup cpuset per container.
One of the key concerns from scheduler maintainers is the overhead
associated with additional spin locks or atomic operations in the
scheduler fast-path. This is why the following optimization is
implemented.
On context switch between threads belonging to the same memory map,
transfer the mm_cid from prev to next without any atomic ops. This
takes care of use-cases involving frequent context switch between
threads belonging to the same memory map.
Additional optimizations can be done if the spin locks added when
context switching between threads belonging to different memory maps end
up being a performance bottleneck. Those are left out of this patch
though. A performance impact would have to be clearly demonstrated to
justify the added complexity.
The credit goes to Paul Turner (Google) for the original virtual cpu id
idea. This feature is implemented based on the discussions with Paul
Turner and Peter Oskolkov (Google), but I took the liberty to implement
scheduler fast-path optimizations and my own NUMA-awareness scheme.
Rumor has it that Google has been running a rseq vcpu_id extension
internally in production for a year. The tcmalloc source code indeed has
comments hinting at a vcpu_id prototype extension to the rseq system
call [1].
The following benchmarks do not show any significant overhead added to
the scheduler context switch by this feature:
* perf bench sched messaging (process)
    Baseline:    86.5±0.3 ms
    With mm_cid: 86.7±2.6 ms
* perf bench sched messaging (threaded)
    Baseline:    84.3±3.0 ms
    With mm_cid: 84.7±2.6 ms
* hackbench (process)
    Baseline:    82.9±2.7 ms
    With mm_cid: 82.9±2.9 ms
* hackbench (threaded)
    Baseline:    85.2±2.6 ms
    With mm_cid: 84.4±2.9 ms
[1] https://github.com/google/tcmalloc/blob/master/tcmalloc/internal/linux_syscall_support.h#L26
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-8-mathieu.desnoyers@efficios.com
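To make the allocation scheme more concrete, here is a single-threaded user-space model. It is a sketch under stated assumptions: the sketch_* names are hypothetical, IDs are tracked in a 64-bit mask, and the locking the kernel needs is left out entirely. It shows the lowest free ID being handed out (keeping IDs close to 0) and the ID being transferred from prev to next when both threads share a memory map.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 64

/* Hypothetical per-memory-map state: one bit per concurrency ID in use. */
struct sketch_mm {
	uint64_t cid_mask;
};

/* Hand out the lowest free ID, so IDs stay close to 0. Returns -1 when
 * all SKETCH_NR_CPUS IDs are taken. */
static int sketch_mm_cid_get(struct sketch_mm *mm)
{
	for (int cid = 0; cid < SKETCH_NR_CPUS; cid++) {
		uint64_t bit = UINT64_C(1) << cid;

		if (!(mm->cid_mask & bit)) {
			mm->cid_mask |= bit;
			return cid;
		}
	}
	return -1;
}

/* Release an ID when its thread stops running. */
static void sketch_mm_cid_put(struct sketch_mm *mm, int cid)
{
	if (cid >= 0)
		mm->cid_mask &= ~(UINT64_C(1) << cid);
}

/* Context switch: threads sharing a memory map simply hand the ID over;
 * otherwise prev's ID is released and next gets one from its own map. */
static int sketch_switch_mm_cid(struct sketch_mm *prev_mm, int prev_cid,
				struct sketch_mm *next_mm)
{
	if (prev_mm == next_mm)
		return prev_cid;
	sketch_mm_cid_put(prev_mm, prev_cid);
	return sketch_mm_cid_get(next_mm);
}
```

A user-space allocator could then index its per-cpu data by this ID and size the array by the number of IDs actually in use rather than by the number of possible CPUs.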
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:08 +0000 (15:39 -0500)]
selftests/rseq: Implement rseq numa node id field selftest
Test the NUMA node id extension rseq field. Compare it against the value
returned by the getcpu(2) system call while pinned on a specific core.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-7-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:07 +0000 (15:39 -0500)]
selftests/rseq: Use ELF auxiliary vector for extensible rseq
Use the ELF auxiliary vector AT_RSEQ_FEATURE_SIZE to detect the RSEQ
features supported by the kernel.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-6-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:06 +0000 (15:39 -0500)]
rseq: Extend struct rseq with numa node id
Adding the NUMA node id to struct rseq is a straightforward thing to do,
and a good way to figure out if anything in the user-space ecosystem
prevents extending struct rseq.
This NUMA node id field allows memory allocators such as tcmalloc to
take advantage of fast access to the current NUMA node id to perform
NUMA-aware memory allocation.
It can also be useful for implementing fast-paths for NUMA-aware
user-space mutexes.
It also allows implementing getcpu(2) purely in user-space.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-5-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:05 +0000 (15:39 -0500)]
rseq: Introduce extensible rseq ABI
Introduce the extensible rseq ABI, where the feature size supported by
the kernel and the required alignment are communicated to user-space
through ELF auxiliary vectors.
This allows user-space to call rseq registration with a rseq_len of
either 32 bytes for the original struct rseq size (which includes
padding), or larger.
If rseq_len is larger than 32 bytes, then it must be large enough to
contain the feature size communicated to user-space through ELF
auxiliary vectors.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-4-mathieu.desnoyers@efficios.com
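From user-space, the discovery step and the length rule described above might be sketched as follows. The sketch_* helpers are hypothetical, the auxv key values 27 and 28 are assumptions taken from the uapi headers of kernels with this series (defined locally for older headers), and the check only mirrors the rule as stated in this commit message, not every validation the kernel may perform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/auxv.h>

/* Assumption: auxv keys as defined by kernels that carry this series. */
#ifndef AT_RSEQ_FEATURE_SIZE
#define AT_RSEQ_FEATURE_SIZE 27
#endif
#ifndef AT_RSEQ_ALIGN
#define AT_RSEQ_ALIGN 28
#endif

#define SKETCH_ORIG_RSEQ_SIZE 32u	/* original struct rseq, padding included */

/* Feature size advertised by the kernel, or 0 when the entry is absent. */
static size_t sketch_rseq_feature_size(void)
{
	return (size_t)getauxval(AT_RSEQ_FEATURE_SIZE);
}

/* Required alignment of the rseq area, or 0 when the entry is absent. */
static size_t sketch_rseq_align(void)
{
	return (size_t)getauxval(AT_RSEQ_ALIGN);
}

/* Length rule from the commit message: either exactly the original 32
 * bytes, or larger and big enough to hold the advertised feature size. */
static bool sketch_rseq_len_ok(size_t rseq_len, size_t feature_size)
{
	if (rseq_len == SKETCH_ORIG_RSEQ_SIZE)
		return true;
	return rseq_len > SKETCH_ORIG_RSEQ_SIZE && rseq_len >= feature_size;
}
```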
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:04 +0000 (15:39 -0500)]
rseq: Introduce feature size and alignment ELF auxiliary vector entries
Export the rseq feature size supported by the kernel as well as the
required allocation alignment for the rseq per-thread area to user-space
through ELF auxiliary vector entries.
This is part of the extensible rseq ABI.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-3-mathieu.desnoyers@efficios.com
Mathieu Desnoyers [Tue, 22 Nov 2022 20:39:03 +0000 (15:39 -0500)]
selftests/rseq: Fix: Fail thread registration when CONFIG_RSEQ=n
When linking the selftests against a libc which does not handle rseq
registration (before 2.35), rseq thread registration silently succeeds
even with CONFIG_RSEQ=n because it erroneously thinks that libc is
handling rseq registration.
This is caused by setting the rseq ownership flag only after the
rseq_available() check. It should rather be set before the
rseq_available() check.
Set the rseq_size to 0 (error value) immediately after the
rseq_available() check fails rather than in the thread registration
functions.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221122203932.231377-2-mathieu.desnoyers@efficios.com
Josh Don [Thu, 17 Nov 2022 00:54:18 +0000 (16:54 -0800)]
sched: Async unthrottling for cfs bandwidth
CFS bandwidth currently distributes new runtime and unthrottles cfs_rq's
inline in an hrtimer callback. Runtime distribution is a per-cpu
operation, and unthrottling is a per-cgroup operation, since a tg walk
is required. On machines with a large number of cpus and large cgroup
hierarchies, this cpus*cgroups work can be too much to do in a single
hrtimer callback: since IRQs are disabled, hard lockups may easily occur.
Specifically, we've found this scalability issue on configurations with
256 cpus, O(1000) cgroups in the hierarchy being throttled, and high
memory bandwidth usage.
To fix this, we can instead unthrottle cfs_rq's asynchronously via a
CSD. Each cpu is responsible for unthrottling itself, thus sharding the
total work more fairly across the system, and avoiding hard lockups.
Signed-off-by: Josh Don <joshdon@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221117005418.3499691-1-joshdon@google.com
Bing Huang [Fri, 18 Nov 2022 03:42:08 +0000 (11:42 +0800)]
sched/topology: Add __init for init_defrootdomain
init_defrootdomain is only used in initialization.
Signed-off-by: Bing Huang <huangbing@kylinos.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://lkml.kernel.org/r/20221118034208.267330-1-huangbing775@126.com
Linus Torvalds [Sun, 25 Dec 2022 21:41:39 +0000 (13:41 -0800)]
Linux 6.2-rc1
Steven Rostedt (Google) [Tue, 20 Dec 2022 18:45:19 +0000 (13:45 -0500)]
treewide: Convert del_timer*() to timer_shutdown*()
Due to several bugs caused by timers being re-armed after they are
shut down and just before they are freed, a new state of timers was added
called "shutdown". After a timer is set to this state, then it can no
longer be re-armed.
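The difference between deleting a timer and shutting it down can be sketched with a small state model. This is a toy Python illustration under assumed semantics, not the kernel timer code: `ToyTimer` and its methods are hypothetical names, with `delete()` standing in for del_timer*() and `shut_down()` for timer_shutdown*().

```python
class ToyTimer:
    """Toy model of a timer with an extra 'shutdown' state."""

    def __init__(self):
        self.armed = False
        self.is_shutdown = False

    def arm(self):
        """Try to (re-)arm the timer; refused once the timer is shut down."""
        if self.is_shutdown:
            return False
        self.armed = True
        return True

    def delete(self):
        """Like del_timer*(): deactivates, but the timer can be re-armed later."""
        self.armed = False

    def shut_down(self):
        """Like timer_shutdown*(): deactivates and blocks any further arming."""
        self.armed = False
        self.is_shutdown = True
```

In this model a deleted timer can still be re-armed just before its owner is freed, which is the class of bug described above; after `shut_down()` every later `arm()` call is refused.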
The following script was run to find all the trivial locations where
del_timer() or del_timer_sync() is called in the same function that the
object holding the timer is freed. It also ignores any locations where
the timer->function is modified between the del_timer*() and the free(),
as that is not considered a "trivial" case.
This was created by using a coccinelle script and the following
commands:
$ cat timer.cocci
@@
expression ptr, slab;
identifier timer, rfield;
@@
(
- del_timer(&ptr->timer);
+ timer_shutdown(&ptr->timer);
|
- del_timer_sync(&ptr->timer);
+ timer_shutdown_sync(&ptr->timer);
)
... when strict
when != ptr->timer
(
kfree_rcu(ptr, rfield);
|
kmem_cache_free(slab, ptr);
|
kfree(ptr);
)
$ spatch timer.cocci . > /tmp/t.patch
$ patch -p1 < /tmp/t.patch
Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Pavel Machek <pavel@ucw.cz> [ LED ]
Acked-by: Kalle Valo <kvalo@kernel.org> [ wireless ]
Acked-by: Paolo Abeni <pabeni@redhat.com> [ networking ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 23 Dec 2022 22:44:08 +0000 (14:44 -0800)]
Merge tag 'spi-fix-v6.2-rc1' of git://git./linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
"One driver specific change here which handles the case where a SPI
device for some reason tries to change the bus speed during a message
on fsl_spi hardware, this should be very unusual"
* tag 'spi-fix-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: fsl_spi: Don't change speed while chipselect is active
Linus Torvalds [Fri, 23 Dec 2022 22:38:00 +0000 (14:38 -0800)]
Merge tag 'regulator-fix-v6.2-rc1' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"Two core fixes here, one for a long standing race which some Qualcomm
systems have started triggering with their UFS driver and another
fixing a problem with supply lookup introduced by the fixes for devm
related use after free issues that were introduced in this merge
window"
* tag 'regulator-fix-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: core: fix deadlock on regulator enable
regulator: core: Fix resolve supply lookup issue
Linus Torvalds [Fri, 23 Dec 2022 21:56:41 +0000 (13:56 -0800)]
Merge tag 'coccinelle-6.2' of git://git./linux/kernel/git/jlawall/linux
Pull coccicheck update from Julia Lawall:
"Modernize use of grep in coccicheck:
Use 'grep -E' instead of 'egrep'"
* tag 'coccinelle-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux:
scripts: coccicheck: use "grep -E" instead of "egrep"
Linus Torvalds [Fri, 23 Dec 2022 20:00:24 +0000 (12:00 -0800)]
Merge tag 'hardening-v6.2-rc1-fixes' of git://git./linux/kernel/git/kees/linux
Pull kernel hardening fixes from Kees Cook:
- Fix CFI failure with KASAN (Sami Tolvanen)
- Fix LKDTM + CFI under GCC 7 and 8 (Kristina Martsenko)
- Limit CONFIG_ZERO_CALL_USED_REGS to Clang > 15.0.6 (Nathan
Chancellor)
- Ignore "contents" argument in LoadPin's LSM hook handling
- Fix paste-o in /sys/kernel/warn_count API docs
- Use READ_ONCE() consistently for oops/warn limit reading
* tag 'hardening-v6.2-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
cfi: Fix CFI failure with KASAN
exit: Use READ_ONCE() for all oops/warn limit reads
security: Restrict CONFIG_ZERO_CALL_USED_REGS to gcc or clang > 15.0.6
lkdtm: cfi: Make PAC test work with GCC 7 and 8
docs: Fix path paste-o for /sys/kernel/warn_count
LoadPin: Ignore the "contents" argument of the LSM hooks
Linus Torvalds [Fri, 23 Dec 2022 19:55:54 +0000 (11:55 -0800)]
Merge tag 'pstore-v6.2-rc1-fixes' of git://git./linux/kernel/git/kees/linux
Pull pstore fixes from Kees Cook:
- Switch pmsg_lock to an rt_mutex to avoid priority inversion (John
Stultz)
- Correctly assign mem_type property (Luca Stefani)
* tag 'pstore-v6.2-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore: Properly assign mem_type property
pstore: Make sure CONFIG_PSTORE_PMSG selects CONFIG_RT_MUTEXES
pstore: Switch pmsg_lock to an rt_mutex to avoid priority inversion
Linus Torvalds [Fri, 23 Dec 2022 19:44:20 +0000 (11:44 -0800)]
Merge tag 'dma-mapping-2022-12-23' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
"Fix up the sound code to not pass __GFP_COMP to the non-coherent DMA
allocator, as it copes with that just as badly as the coherent
allocator, and then add a check to make sure no one passes the flag
ever again"
* tag 'dma-mapping-2022-12-23' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: reject GFP_COMP for noncoherent allocations
ALSA: memalloc: don't use GFP_COMP for non-coherent dma allocations
Linus Torvalds [Fri, 23 Dec 2022 19:39:18 +0000 (11:39 -0800)]
Merge tag '9p-for-6.2-rc1' of https://github.com/martinetd/linux
Pull 9p updates from Dominique Martinet:
- improve p9_check_errors to check buffer size instead of msize when
possible (e.g. not zero-copy)
- some more syzbot and KCSAN fixes
- minor headers include cleanup
* tag '9p-for-6.2-rc1' of https://github.com/martinetd/linux:
9p/client: fix data race on req->status
net/9p: fix response size check in p9_check_errors()
net/9p: distinguish zero-copy requests
9p/xen: do not memcpy header into req->rc
9p: set req refcount to zero to avoid uninitialized usage
9p/net: Remove unneeded idr.h #include
9p/fs: Remove unneeded idr.h #include
Linus Torvalds [Fri, 23 Dec 2022 19:15:48 +0000 (11:15 -0800)]
Merge tag 'sound-6.2-rc1-2' of git://git./linux/kernel/git/tiwai/sound
Pull more sound updates from Takashi Iwai:
"A few more updates for 6.2: most of changes are about ASoC
device-specific fixes.
- Lots of ASoC Intel AVS extensions and refactoring
- Quirks for ASoC Intel SOF as well as regression fixes
- ASoC Mediatek and Rockchip fixes
- Intel HD-audio HDMI workarounds
- Usual HD- and USB-audio device-specific quirks"
* tag 'sound-6.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (54 commits)
ALSA: usb-audio: Add new quirk FIXED_RATE for JBL Quantum810 Wireless
ALSA: azt3328: Remove the unused function snd_azf3328_codec_outl()
ASoC: lochnagar: Fix unused lochnagar_of_match warning
ASoC: Intel: Add HP Stream 8 to bytcr_rt5640.c
ASoC: SOF: mediatek: initialize panic_info to zero
ASoC: rt5670: Remove unbalanced pm_runtime_put()
ASoC: Intel: bytcr_rt5640: Add quirk for the Advantech MICA-071 tablet
ASoC: Intel: soc-acpi: update codec addr on 0C11/0C4F product
ASoC: rockchip: spdif: Add missing clk_disable_unprepare() in rk_spdif_runtime_resume()
ASoC: wm8994: Fix potential deadlock
ASoC: mediatek: mt8195: add sof be ops to check audio active
ASoC: SOF: Revert: "core: unregister clients and machine drivers in .shutdown"
ASoC: SOF: Intel: pci-tgl: unblock S5 entry if DMA stop has failed"
ALSA: hda/hdmi: fix stream-id config keep-alive for rt suspend
ALSA: hda/hdmi: set default audio parameters for KAE silent-stream
ALSA: hda/hdmi: fix i915 silent stream programming flow
ALSA: hda: Error out if invalid stream is being setup
ASoC: dt-bindings: fsl-sai: Reinstate i.MX93 SAI compatible string
ASoC: soc-pcm.c: Clear DAIs parameters after stream_active is updated
ASoC: codecs: wcd-clsh: Remove the unused function
...
Linus Torvalds [Fri, 23 Dec 2022 19:09:44 +0000 (11:09 -0800)]
Merge tag 'drm-next-2022-12-23' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Holiday fixes!
Two batches from amd, and one group of i915 changes.
amdgpu:
- Spelling fix
- BO pin fix
- Properly handle polaris 10/11 overlap asics
- GMC9 fix
- SR-IOV suspend fix
- DCN 3.1.4 fix
- KFD userptr locking fix
- SMU13.x fixes
- GDS/GWS/OA handling fix
- Reserved VMID handling fixes
- FRU EEPROM fix
- BO validation fixes
- Avoid large variable on the stack
- S0ix fixes
- SMU 13.x fixes
- VCN fix
- Add missing fence reference
amdkfd:
- Fix init vm error handling
- Fix double release of compute pasid
i915:
- Documentation fixes
- OA-perf related fix
- VLV/CHV HDMI/DP audio fix
- Display DDI/Transcoder fix
- Migrate fixes"
* tag 'drm-next-2022-12-23' of git://anongit.freedesktop.org/drm/drm: (39 commits)
drm/amdgpu: grab extra fence reference for drm_sched_job_add_dependency
drm/amdgpu: enable VCN DPG for GC IP v11.0.4
drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0
drm/amd/pm: correct the fan speed retrieving in PWM for some SMU13 asics
drm/amd/pm: bump SMU13.0.0 driver_if header to version 0x34
drm/amdgpu: skip MES for S0ix as well since it's part of GFX
drm/amd/pm: avoid large variable on kernel stack
drm/amdkfd: Fix double release compute pasid
drm/amdkfd: Fix kfd_process_device_init_vm error handling
drm/amd/pm: update SMU13.0.0 reported maximum shader clock
drm/amd/pm: correct SMU13.0.0 pstate profiling clock settings
drm/amd/pm: enable GPO dynamic control support for SMU13.0.7
drm/amd/pm: enable GPO dynamic control support for SMU13.0.0
drm/amdgpu: revert "generally allow over-commit during BO allocation"
drm/amdgpu: Remove unnecessary domain argument
drm/amdgpu: Fix size validation for non-exclusive domains (v4)
drm/amdgpu: Check if fru_addr is not NULL (v2)
drm/i915/ttm: consider CCS for backup objects
drm/i915/migrate: fix corner case in CCS aux copying
drm/amdgpu: rework reserved VMID handling
...
Linus Torvalds [Fri, 23 Dec 2022 18:49:45 +0000 (10:49 -0800)]
Merge tag 'mips_6.2_1' of git://git./linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
"Fixes due to DT changes"
* tag 'mips_6.2_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: dts: bcm63268: Add missing properties to the TWD node
MIPS: ralink: mt7621: avoid to init common ralink reset controller
Linus Torvalds [Fri, 23 Dec 2022 18:45:00 +0000 (10:45 -0800)]
Merge tag 'mm-hotfixes-stable-2022-12-22-14-34' of git://git./linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"Eight fixes, all cc:stable. One is for gcov and the remainder are MM"
* tag 'mm-hotfixes-stable-2022-12-22-14-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
gcov: add support for checksum field
test_maple_tree: add test for mas_spanning_rebalance() on insufficient data
maple_tree: fix mas_spanning_rebalance() on insufficient data
hugetlb: really allocate vma lock for all sharable vmas
kmsan: export kmsan_handle_urb
kmsan: include linux/vmalloc.h
mm/mempolicy: fix memory leak in set_mempolicy_home_node system call
mm, mremap: fix mremap() expanding vma with addr inside vma
Luca Stefani [Thu, 22 Dec 2022 13:10:49 +0000 (14:10 +0100)]
pstore: Properly assign mem_type property
If mem-type is specified in the device tree
it would end up overriding the record_size
field instead of populating mem_type.
As record_size is currently parsed after the
improper assignment with default size 0 it
continued to work as expected regardless of the
value found in the device tree.
Simply changing the target field of the struct
is enough to get mem-type working as expected.
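Why the bug went unnoticed can be shown with a toy model of the parsing order. This is a hedged Python sketch, not the pstore driver code: `parse_dt` and the dictionary layout are hypothetical, and the "record-size" key is assumed to be the property parsed later with a default of 0.

```python
def parse_dt(props, buggy=False):
    """Toy model: copy device-tree style properties into platform data."""
    pdata = {"mem_type": 0, "record_size": 0}

    if "mem-type" in props:
        # The bug stored the value in record_size instead of mem_type.
        target = "record_size" if buggy else "mem_type"
        pdata[target] = props["mem-type"]

    # record_size is parsed afterwards (default 0), so it overwrites
    # whatever the wrong assignment above may have left behind.
    pdata["record_size"] = props.get("record-size", 0)
    return pdata
```

With the buggy path, `mem_type` silently stays 0 while `record_size` still ends up correct, which matches the description that everything "continued to work as expected".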
Fixes: 9d843e8fafc7 ("pstore: Add mem_type property DT parsing support")
Cc: stable@vger.kernel.org
Signed-off-by: Luca Stefani <luca@osomprivacy.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221222131049.286288-1-luca@osomprivacy.com
John Stultz [Wed, 21 Dec 2022 05:18:55 +0000 (05:18 +0000)]
pstore: Make sure CONFIG_PSTORE_PMSG selects CONFIG_RT_MUTEXES
In commit 76d62f24db07 ("pstore: Switch pmsg_lock to an rt_mutex
to avoid priority inversion") I changed a lock to an rt_mutex.
However, it's possible that CONFIG_RT_MUTEXES is not enabled,
which then results in a build failure, as the 0day bot detected:
https://lore.kernel.org/linux-mm/202212211244.TwzWZD3H-lkp@intel.com/
Thus this patch changes CONFIG_PSTORE_PMSG to select
CONFIG_RT_MUTEXES, which ensures the build will not fail.
Cc: Wei Wang <wvw@google.com>
Cc: Midas Chien <midaschieh@google.com>
Cc: Connor O'Brien <connoro@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: kernel test robot <lkp@intel.com>
Cc: kernel-team@android.com
Fixes: 76d62f24db07 ("pstore: Switch pmsg_lock to an rt_mutex to avoid priority inversion")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221221051855.15761-1-jstultz@google.com
Sami Tolvanen [Thu, 22 Dec 2022 22:57:47 +0000 (22:57 +0000)]
cfi: Fix CFI failure with KASAN
When CFI_CLANG and KASAN are both enabled, LLVM doesn't generate a
CFI type hash for asan.module_ctor functions in translation units
where CFI is disabled, which leads to a CFI failure during boot when
do_ctors calls the affected constructors:
CFI failure at do_basic_setup+0x64/0x90 (target:
asan.module_ctor+0x0/0x28; expected type: 0xa540670c)
Specifically, this happens because CFI is disabled for
kernel/cfi.c. There's no reason to keep CFI disabled here anymore, so
fix the failure by not filtering out CC_FLAGS_CFI for the file.
Note that https://reviews.llvm.org/rG3b14862f0a96 fixed the issue
where LLVM didn't emit CFI type hashes for any sanitizer constructors,
but now type hashes are emitted correctly for TUs that use CFI.
Link: https://github.com/ClangBuiltLinux/linux/issues/1742
Fixes: 89245600941e ("cfi: Switch to -fsanitize=kcfi")
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221222225747.3538676-1-samitolvanen@google.com
Linus Torvalds [Thu, 22 Dec 2022 19:22:31 +0000 (11:22 -0800)]
Merge tag 'scsi-misc' of git://git./linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"Mostly small bug fixes and small updates.
The only things of note are a qla2xxx fix for crash on hotplug and
timeout and the addition of a user exposed abstraction layer for
persistent reservation error return handling (which necessitates the
conversion of nvme.c as well as SCSI)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Fix crash when I/O abort times out
nvme: Convert NVMe errors to PR errors
scsi: sd: Convert SCSI errors to PR errors
scsi: core: Rename status_byte to sg_status_byte
block: Add error codes for common PR failures
scsi: sd: sd_zbc: Trace zone append emulation
scsi: libfc: Include the correct header
Linus Torvalds [Thu, 22 Dec 2022 19:17:34 +0000 (11:17 -0800)]
Merge tag 'afs-next-20221222' of git://git./linux/kernel/git/dhowells/linux-fs
Pull afs update from David Howells:
"A fix for a couple of missing resource counter decrements, two small
cleanups of now-unused bits of code and a patch to remove writepage
support from afs"
* tag 'afs-next-20221222' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Stop implementing ->writepage()
afs: remove afs_cache_netfs and afs_zap_permits() declarations
afs: remove variable nr_servers
afs: Fix lost servers_outstanding count
Linus Torvalds [Thu, 22 Dec 2022 19:07:29 +0000 (11:07 -0800)]
Merge tag 'perf-tools-for-v6.2-2-2022-12-22' of git://git./linux/kernel/git/acme/linux
Pull more perf tools updates from Arnaldo Carvalho de Melo:
"perf tools fixes and improvements:
- Don't stop building perf if python setuptools isn't installed, just
disable the affected perf feature.
- Remove explicit reference to python 2.x devel files, that warning
is about python-devel, no matter what version, being unavailable
and thus disabling the linking with libpython.
- Don't use -Werror=switch-enum when building the python support that
handles libtraceevent enumerations, as there is no good way to test
if some specific enum entry is available with the libtraceevent
installed on the system.
- Introduce 'perf lock contention' --type-filter and --lock-filter,
to filter by lock type and lock name:
$ sudo ./perf lock record -a -- ./perf bench sched messaging
$ sudo ./perf lock contention -E 5 -Y spinlock
contended total wait max wait avg wait type caller
802 1.26 ms 11.73 us 1.58 us spinlock __wake_up_common_lock+0x62
13 787.16 us 105.44 us 60.55 us spinlock remove_wait_queue+0x14
12 612.96 us 78.70 us 51.08 us spinlock prepare_to_wait+0x27
114 340.68 us 12.61 us 2.99 us spinlock try_to_wake_up+0x1f5
83 226.38 us 9.15 us 2.73 us spinlock folio_lruvec_lock_irqsave+0x5e
$ sudo ./perf lock contention -l
contended total wait max wait avg wait address symbol
57 1.11 ms 42.83 us 19.54 us ffff9f4140059000
15 280.88 us 23.51 us 18.73 us ffffffff9d007a40 jiffies_lock
1 20.49 us 20.49 us 20.49 us ffffffff9d0d50c0 rcu_state
1 9.02 us 9.02 us 9.02 us ffff9f41759e9ba0
$ sudo ./perf lock contention -L jiffies_lock,rcu_state
contended total wait max wait avg wait type caller
15 280.88 us 23.51 us 18.73 us spinlock tick_sched_do_timer+0x93
1 20.49 us 20.49 us 20.49 us spinlock __softirqentry_text_start+0xeb
$ sudo ./perf lock contention -L ffff9f4140059000
contended total wait max wait avg wait type caller
38 779.40 us 42.83 us 20.51 us spinlock worker_thread+0x50
11 216.30 us 39.87 us 19.66 us spinlock queue_work_on+0x39
8 118.13 us 20.51 us 14.77 us spinlock kthread+0xe5
- Fix splitting CC into compiler and options when checking if a
option is present in clang to build the python binding, needed in
systems such as yocto that set CC to, e.g.: "gcc --sysroot=/a/b/c".
- Refresh metrics and events for Intel systems: alderlake,
alderlake-n, bonnell, broadwell, broadwellde, broadwellx,
cascadelakex, elkhartlake, goldmont, goldmontplus, haswell,
haswellx, icelake, icelakex, ivybridge, ivytown, jaketown,
knightslanding, meteorlake, nehalemep, nehalemex, sandybridge,
sapphirerapids, silvermont, skylake, skylakex, snowridgex,
tigerlake, westmereep-dp, westmereep-sp, westmereex.
- Add vendor events files (JSON) for AMD Zen 4, from sections
2.1.15.4 "Core Performance Monitor Counters", 2.1.15.5 "L3 Cache
Performance Monitor Counters" and Section 7.1 "Fabric Performance
Monitor Counter (PMC) Events" in the Processor Programming
Reference (PPR) for AMD Family 19h Model 11h Revision B1
processors.
This constitutes events which capture op dispatch, execution and
retirement, branch prediction, L1 and L2 cache activity, TLB
activity, L3 cache activity and data bandwidth for various links
and interfaces in the Data Fabric.
- Also, from the same PPR are metrics taken from Section 2.1.15.2
"Performance Measurement", including pipeline utilization, which
are new to Zen 4 processors and useful for finding performance
bottlenecks by analyzing activity at different stages of the
pipeline.
- Greatly improve the 'srcline', 'srcline_from', 'srcline_to' and
'srcfile' sort keys performance by postponing calling the external
addr2line utility to the collapse phase of histogram bucketing.
- Fix 'perf test' "all PMU test" to skip parametrized events, which
require setup and are not supported by this test.
- Update tools/ copies of kernel headers: features,
disabled-features, fscrypt.h, i915_drm.h, msr-index.h, power pc
syscall table and kvm.h.
- Add .DELETE_ON_ERROR special Makefile target to clean up partially
updated files on error.
- Simplify the mksyscalltbl script for arm64 by no longer running the
host compiler to create the syscall table; do it all with just the
shell script.
- Further fixes to honour quiet mode (-q)"
* tag 'perf-tools-for-v6.2-2-2022-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (67 commits)
perf python: Fix splitting CC into compiler and options
perf scripting python: Don't be strict at handling libtraceevent enumerations
perf arm64: Simplify mksyscalltbl
perf build: Remove explicit reference to python 2.x devel files
perf vendor events amd: Add Zen 4 mapping
perf vendor events amd: Add Zen 4 metrics
perf vendor events amd: Add Zen 4 uncore events
perf vendor events amd: Add Zen 4 core events
perf vendor events intel: Refresh westmereex events
perf vendor events intel: Refresh westmereep-sp events
perf vendor events intel: Refresh westmereep-dp events
perf vendor events intel: Refresh tigerlake metrics and events
perf vendor events intel: Refresh snowridgex events
perf vendor events intel: Refresh skylakex metrics and events
perf vendor events intel: Refresh skylake metrics and events
perf vendor events intel: Refresh silvermont events
perf vendor events intel: Refresh sapphirerapids metrics and events
perf vendor events intel: Refresh sandybridge metrics and events
perf vendor events intel: Refresh nehalemex events
perf vendor events intel: Refresh nehalemep events
...
Arnaldo Carvalho de Melo [Thu, 22 Dec 2022 13:56:25 +0000 (10:56 -0300)]
perf python: Fix splitting CC into compiler and options
Noticed this build failure on archlinux:base when building with clang:
clang-14: error: optimization flag '-ffat-lto-objects' is not supported [-Werror,-Wignored-optimization-argument]
In tools/perf/util/setup.py we check if clang supports that option, but
since commit 3cad53a6f9cdbafa ("perf python: Account for multiple words
in CC") this got broken as in the common case where CC="clang":
>>> cc="clang"
>>> print(cc.split()[0])
clang
>>> option="-ffat-lto-objects"
>>> print(str(cc.split()[1:]) + option)
[]-ffat-lto-objects
>>>
And then the Popen will call clang with that bogus option name that in
turn will not produce the b"unknown argument" or b"is not supported"
that this function uses to detect if the option is not available and
thus later on clang will be called with an unknown/unsupported option.
Fix it by checking whether there really are options in the provided CC
variable, and if so override 'cc' with the first token and append the
options to the 'option' variable.
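The splitting step can be sketched as follows. This is a minimal illustration, not the code in tools/perf/util/setup.py: `split_cc` and `probe_argv` are hypothetical helpers, the "test-hello.c" file name is an assumed placeholder, and unlike the fix described above the sketch keeps the extra CC options as separate argv entries rather than appending them to the option string.

```python
import shlex


def split_cc(cc_value):
    """Split a CC string into (compiler, list of extra options)."""
    tokens = shlex.split(cc_value)
    if not tokens:
        raise ValueError("CC is empty")
    return tokens[0], tokens[1:]


def probe_argv(cc_value, option, test_source="test-hello.c"):
    """Build the argv for asking the compiler whether it accepts `option`."""
    compiler, extra = split_cc(cc_value)
    # With CC="clang" the extra list is empty, so the probed option is
    # passed through unchanged instead of becoming "[]-ffat-lto-objects".
    return [compiler, *extra, option, test_source]
```

For CC="clang" this yields `["clang", "-ffat-lto-objects", "test-hello.c"]`, and for CC="gcc --sysroot=/a/b/c" the sysroot option is carried along as its own argument.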
Fixes: 3cad53a6f9cdbafa ("perf python: Account for multiple words in CC")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Khem Raj <raj.khem@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Link: http://lore.kernel.org/lkml/Y6Rq5F5NI0v1QQHM@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David Howells [Fri, 18 Nov 2022 07:57:27 +0000 (07:57 +0000)]
afs: Stop implementing ->writepage()
We're trying to get rid of the ->writepage() hook[1]. Stop afs from using
it by unlocking the page and calling afs_writepages_region() rather than
folio_write_one().
A flag is passed to afs_writepages_region() to indicate that it should only
write a single region so that we don't flush the entire file in
->write_begin(), but do add other dirty data to the region being written to
try and reduce the number of RPC ops.
This requires ->migrate_folio() to be implemented, so point that at
filemap_migrate_folio() for files and also for symlinks and directories.
This can be tested by turning on the afs_folio_dirty tracepoint and then
doing something like:
xfs_io -c "w 2223 7000" -c "w 15000 22222" -c "w 23 7" /afs/my/test/foo
and then looking in the trace to see if the write at position 15000 gets
stored before page 0 gets dirtied for the write at position 23.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20221113162902.883850-1-hch@lst.de/
Link: https://lore.kernel.org/r/166876785552.222254.4403222906022558715.stgit@warthog.procyon.org.uk/
Gaosheng Cui [Fri, 9 Sep 2022 07:03:53 +0000 (15:03 +0800)]
afs: remove afs_cache_netfs and afs_zap_permits() declarations
afs_zap_permits() has been removed since commit be080a6f43c4 ("afs:
Overhaul permit caching").
afs_cache_netfs has been removed since commit 523d27cda149 ("afs:
Convert afs to use the new fscache API").
So remove the declarations for them from the header file.
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20220909070353.1160228-1-cuigaosheng1@huawei.com/
Colin Ian King [Thu, 20 Oct 2022 17:39:23 +0000 (18:39 +0100)]
afs: remove variable nr_servers
Variable nr_servers is no longer being used; the last reference
to it was removed in commit 45df8462730d ("afs: Fix server list
handling"), so clean up the code by removing it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20221020173923.21342-1-colin.i.king@gmail.com/
David Howells [Wed, 21 Dec 2022 14:30:48 +0000 (14:30 +0000)]
afs: Fix lost servers_outstanding count
The afs_fs_probe_dispatcher() work function is passed a count on
net->servers_outstanding when it is scheduled (which may come via its
timer). This is passed back to the work_item, passed to the timer or
dropped at the end of the dispatcher function.
But, at the top of the dispatcher function, there are two checks which
skip the rest of the function: if the network namespace is being destroyed
or if there are no fileservers to probe. These two return paths, however,
do not drop the count passed to the dispatcher, and so, sometimes, the
destruction of a network namespace, such as induced by rmmod of the kafs
module, may get stuck in afs_purge_servers(), waiting for
net->servers_outstanding to become zero.
Fix this by adding the missing decrements in afs_fs_probe_dispatcher().
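The leak pattern can be modelled in a few lines. This is a toy Python sketch under simplifying assumptions, not the afs code: `ToyNet`, `schedule_dispatcher` and `dispatcher` are hypothetical names, and the normal path simply drops the count, whereas the real dispatcher may instead pass it on to the work item or the timer.

```python
class ToyNet:
    """Toy stand-in for the per-namespace state."""

    def __init__(self, live=True, fileservers=0):
        self.live = live
        self.fileservers = fileservers
        self.servers_outstanding = 0


def schedule_dispatcher(net):
    """Scheduling hands one count on servers_outstanding to the dispatcher."""
    net.servers_outstanding += 1


def dispatcher(net, fixed=True):
    """Model of the dispatcher; `fixed` toggles the missing decrements."""
    if not net.live or net.fileservers == 0:
        # Early-return paths: before the fix the count was not dropped here.
        if fixed:
            net.servers_outstanding -= 1
        return
    # ... probing of fileservers would happen here ...
    net.servers_outstanding -= 1
```

Without the decrement on the early-return paths the counter never returns to zero, so anything waiting for it to reach zero, as afs_purge_servers() does in the description above, would wait forever.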
Fixes: f6cbb368bcb0 ("afs: Actively poll fileservers to maintain NAT or firewall openings")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/167164544917.2072364.3759519569649459359.stgit@warthog.procyon.org.uk/
Takashi Iwai [Thu, 22 Dec 2022 08:18:38 +0000 (09:18 +0100)]
Merge tag 'asoc-v6.2-3' of https://git./linux/kernel/git/broonie/sound into for-linus
ASoC: Updates for v6.2
Some more small fixes and board quirks that came in since my last
update, the main one being the fixes from Kai for issues around the
attempts to get kexec working well on SOF based systems.
Jaroslav Kysela [Thu, 15 Dec 2022 15:30:37 +0000 (16:30 +0100)]
ALSA: usb-audio: Add new quirk FIXED_RATE for JBL Quantum810 Wireless
It seems that the firmware is broken and does not accept
the UAC_EP_CS_ATTR_SAMPLE_RATE URB. There is only one rate (48000Hz)
available in the descriptors for the output endpoint.
Create a new quirk QUIRK_FLAG_FIXED_RATE to skip the rate setup
when only one rate is available (fixed).
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216798
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20221215153037.1163786-1-perex@perex.cz
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Jiapeng Chong [Tue, 13 Dec 2022 06:13:55 +0000 (14:13 +0800)]
ALSA: azt3328: Remove the unused function snd_azf3328_codec_outl()
The function snd_azf3328_codec_outl is defined in the azt3328.c file, but
not called elsewhere, so remove this unused function.
sound/pci/azt3328.c:367:1: warning: unused function 'snd_azf3328_codec_outl'.
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3432
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221213061355.62856-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Takashi Iwai [Thu, 22 Dec 2022 08:11:48 +0000 (09:11 +0100)]
Merge branch 'for-next' into for-linus
Linus Torvalds [Thu, 22 Dec 2022 03:03:42 +0000 (19:03 -0800)]
Merge tag 'trace-v6.2-1' of git://git./linux/kernel/git/trace/linux-trace
Pull tracing fix from Steven Rostedt:
"I missed this minor hardening of the kernel in the first pull.
- Make monitor structures read only"
* tag 'trace-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rv/monitors: Move monitor structure in rodata
Linus Torvalds [Thu, 22 Dec 2022 02:57:24 +0000 (18:57 -0800)]
Merge tag 'trace-probes-v6.2' of git://git./linux/kernel/git/trace/linux-trace
Pull trace probes updates from Steven Rostedt:
- New "symstr" type for dynamic events that writes the name of the
function+offset into the ring buffer and not just the address
- Prevent kernel symbol processing on addresses in user space probes
(uprobes).
- And minor fixes and clean ups
* tag 'trace-probes-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/probes: Reject symbol/symstr type for uprobe
tracing/probes: Add symstr type for dynamic events
kprobes: kretprobe events missing on 2-core KVM guest
kprobes: Fix check for probe enabled in kill_kprobe()
test_kprobes: Fix implicit declaration error of test_kprobes
tracing: Fix race where eprobes can be called before the event
Linus Torvalds [Thu, 22 Dec 2022 02:52:15 +0000 (18:52 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull RISC-V kvm updates from Paolo Bonzini:
- Allow unloading KVM module
- Allow KVM user-space to set mvendorid, marchid, and mimpid
- Several fixes and cleanups
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
RISC-V: KVM: Add ONE_REG interface for mvendorid, marchid, and mimpid
RISC-V: KVM: Save mvendorid, marchid, and mimpid when creating VCPU
RISC-V: Export sbi_get_mvendorid() and friends
RISC-V: KVM: Move sbi related struct and functions to kvm_vcpu_sbi.h
RISC-V: KVM: Use switch-case in kvm_riscv_vcpu_set/get_reg()
RISC-V: KVM: Remove redundant includes of asm/csr.h
RISC-V: KVM: Remove redundant includes of asm/kvm_vcpu_timer.h
RISC-V: KVM: Fix reg_val check in kvm_riscv_vcpu_set_reg_config()
RISC-V: KVM: Simplify kvm_arch_prepare_memory_region()
RISC-V: KVM: Exit run-loop immediately if xfer_to_guest fails
RISC-V: KVM: use vma_lookup() instead of find_vma_intersection()
RISC-V: KVM: Add exit logic to main.c
Dave Airlie [Thu, 22 Dec 2022 01:02:55 +0000 (11:02 +1000)]
Merge tag 'amd-drm-fixes-6.2-2022-12-21' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-fixes-6.2-2022-12-21:
amdgpu:
- Avoid large variable on the stack
- S0ix fixes
- SMU 13.x fixes
- VCN fix
- Add missing fence reference
amdkfd:
- Fix init vm error handling
- Fix double release of compute pasid
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221221205828.6093-1-alexander.deucher@amd.com
Linus Torvalds [Thu, 22 Dec 2022 00:35:26 +0000 (16:35 -0800)]
Merge tag 'block-6.2-2022-12-19' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- Various fixes for BFQ (Yu, Yuwei)
- Fix for loop command line parsing (Isaac)
- No need to specifically clear REQ_ALLOC_CACHE on IOPOLL downgrade
anymore (me)
- blk-iocost enum fix for newer gcc (Jiri)
- UAF fix for queue release (Ming)
- blk-iolatency error handling memory leak fix (Tejun)
* tag 'block-6.2-2022-12-19' of git://git.kernel.dk/linux:
block: don't clear REQ_ALLOC_CACHE for non-polled requests
block: fix use-after-free of q->q_usage_counter
block, bfq: only do counting of pending-request for BFQ_GROUP_IOSCHED
blk-iolatency: Fix memory leak on add_disk() failures
loop: Fix the max_loop commandline argument treatment when it is set to 0
block/blk-iocost (gcc13): keep large values in a new enum
block, bfq: replace 0/1 with false/true in bic apis
block, bfq: don't return bfqg from __bfq_bic_change_cgroup()
block, bfq: fix possible uaf for 'bfqq->bic'
Linus Torvalds [Thu, 22 Dec 2022 00:28:25 +0000 (16:28 -0800)]
Merge tag 'io_uring-6.2-2022-12-19' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Improve the locking for timeouts. This was originally queued up for
the initial pull, but I messed up and it got missed. (Pavel)
- Fix an issue with running task_work from the wait path, causing some
inefficiencies (me)
- Add a clear of ->free_iov upfront in the 32-bit compat data
importing, so we ensure that it's always sane at completion time (me)
- Use call_rcu_hurry() for the eventfd signaling (Dylan)
- Ordering fix for multishot recv completions (Pavel)
- Add the io_uring trace header to the MAINTAINERS entry (Ammar)
* tag 'io_uring-6.2-2022-12-19' of git://git.kernel.dk/linux:
MAINTAINERS: io_uring: Add include/trace/events/io_uring.h
io_uring/net: fix cleanup after recycle
io_uring/net: ensure compat import handlers clear free_iov
io_uring: include task_work run after scheduling in wait for events
io_uring: don't use TIF_NOTIFY_SIGNAL to test for availability of task_work
io_uring: use call_rcu_hurry if signaling an eventfd
io_uring: fix overflow handling regression
io_uring: ease timeout flush locking requirements
io_uring: revise completion_lock locking
io_uring: protect cq_timeouts with timeout_lock
Rickard x Andersson [Tue, 20 Dec 2022 10:23:18 +0000 (11:23 +0100)]
gcov: add support for checksum field
In GCC version 12.1 a checksum field was added.
This patch fixes a kernel crash occurring during boot when using
gcov-kernel with GCC version 12.2. The crash occurred on a system running
on i.MX6SX.
Link: https://lkml.kernel.org/r/20221220102318.3418501-1-rickaran@axis.com
Fixes: 977ef30a7d88 ("gcov: support GCC 12.1 and newer compilers")
Signed-off-by: Rickard x Andersson <rickaran@axis.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Tested-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Martin Liska <mliska@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam Howlett [Mon, 19 Dec 2022 16:20:15 +0000 (16:20 +0000)]
test_maple_tree: add test for mas_spanning_rebalance() on insufficient data
Add a test to the maple tree test suite so that the spanning rebalance
insufficient node issue does not go undetected again.
Link: https://lkml.kernel.org/r/20221219161922.2708732-3-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Liam Howlett [Mon, 19 Dec 2022 16:20:15 +0000 (16:20 +0000)]
maple_tree: fix mas_spanning_rebalance() on insufficient data
Mike Rapoport contacted me off-list with a regression in running criu.
Periodic tests fail with an RCU stall during execution. Although rare, it
is possible to hit this with other uses so this patch should be backported
to fix the regression.
This patchset adds the fix and a test case to the maple tree test
suite.
This patch (of 2):
An insufficient node was causing an out-of-bounds access on the node in
mas_leaf_max_gap(). The cause was the faulty detection of the new node
being a root node when overwriting many entries at the end of the tree.
Fix the detection of a new root and ensure there is sufficient data prior
to entering the spanning rebalance loop.
Link: https://lkml.kernel.org/r/20221219161922.2708732-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20221219161922.2708732-2-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Mike Rapoport <rppt@kernel.org>
Tested-by: Mike Rapoport <rppt@kernel.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mike Kravetz [Mon, 12 Dec 2022 23:50:41 +0000 (15:50 -0800)]
hugetlb: really allocate vma lock for all sharable vmas
Commit bbff39cc6cbc ("hugetlb: allocate vma lock for all sharable vmas")
removed the pmd sharable checks in the vma lock helper routines. However,
it left the functional version of helper routines behind #ifdef
CONFIG_ARCH_WANT_HUGE_PMD_SHARE. Therefore, the vma lock is not being
used for sharable vmas on architectures that do not support pmd sharing.
On these architectures, a potential fault/truncation race is exposed that
could leave pages in a hugetlb file past i_size until the file is removed.
Move the functional vma lock helpers outside the ifdef, and remove the
non-functional stubs. Since the vma lock is not just for pmd sharing,
rename the routine __vma_shareable_flags_pmd.
Link: https://lkml.kernel.org/r/20221212235042.178355-1-mike.kravetz@oracle.com
Fixes: bbff39cc6cbc ("hugetlb: allocate vma lock for all sharable vmas")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Arnd Bergmann [Thu, 15 Dec 2022 16:26:57 +0000 (17:26 +0100)]
kmsan: export kmsan_handle_urb
USB support can be in a loadable module, and this causes a link failure
with KMSAN:
ERROR: modpost: "kmsan_handle_urb" [drivers/usb/core/usbcore.ko] undefined!
Export the symbol so it can be used by this module.
Link: https://lkml.kernel.org/r/20221215162710.3802378-1-arnd@kernel.org
Fixes: 553a80188a5d ("kmsan: handle memory sent to/from USB")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Arnd Bergmann [Thu, 15 Dec 2022 16:30:17 +0000 (17:30 +0100)]
kmsan: include linux/vmalloc.h
This is needed for the vmap/vunmap declarations:
mm/kmsan/kmsan_test.c:316:9: error: implicit declaration of function 'vmap' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
vbuf = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
^
mm/kmsan/kmsan_test.c:316:29: error: use of undeclared identifier 'VM_MAP'
vbuf = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
^
mm/kmsan/kmsan_test.c:322:3: error: implicit declaration of function 'vunmap' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
vunmap(vbuf);
^
Link: https://lkml.kernel.org/r/20221215163046.4079767-1-arnd@kernel.org
Fixes: 8ed691b02ade ("kmsan: add tests for KMSAN")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mathieu Desnoyers [Thu, 15 Dec 2022 19:46:21 +0000 (14:46 -0500)]
mm/mempolicy: fix memory leak in set_mempolicy_home_node system call
When encountering any vma in the range with policy other than MPOL_BIND or
MPOL_PREFERRED_MANY, an error is returned without issuing a mpol_put on
the policy just allocated with mpol_dup().
This allows arbitrary users to leak kernel memory.
Link: https://lkml.kernel.org/r/20221215194621.202816-1-mathieu.desnoyers@efficios.com
Fixes: c6018b4b2549 ("mm/mempolicy: add set_mempolicy_home_node syscall")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <stable@vger.kernel.org> [5.17+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
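The leak described in the mempolicy commit above follows a common pattern: a copy is allocated, a validity check fails, and the early exit skips the release. The following is a toy model in Python, not the kernel code; `PolicyPool`, `set_home_node()` and the `release_on_error` switch are hypothetical names used only to contrast the two behaviours.

```python
import errno

# Mode names mirror the kernel constants; the values are arbitrary.
MPOL_DEFAULT, MPOL_BIND, MPOL_PREFERRED_MANY = "default", "bind", "preferred_many"


class PolicyPool:
    """Toy stand-in for mpol_dup()/mpol_put(): counts unreleased copies."""

    def __init__(self):
        self.outstanding = 0

    def dup(self, mode):
        self.outstanding += 1
        return {"mode": mode}

    def put(self, policy):
        self.outstanding -= 1


def set_home_node(pool, vma_modes, home_node, release_on_error):
    """Walk the policies of the VMAs in a range (hypothetical helper)."""
    for mode in vma_modes:
        new = pool.dup(mode)
        if new["mode"] not in (MPOL_BIND, MPOL_PREFERRED_MANY):
            if release_on_error:
                pool.put(new)  # the release the error path was missing
            return -errno.EOPNOTSUPP
        new["home_node"] = home_node
        pool.put(new)  # in this model the copy is dropped once applied
    return 0
```

With `release_on_error=False` every failing call leaves one copy behind, which is why an unprivileged caller repeating the call could grow kernel memory use without bound.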
Vlastimil Babka [Fri, 16 Dec 2022 16:32:27 +0000 (17:32 +0100)]
mm, mremap: fix mremap() expanding vma with addr inside vma
Since 6.1 we have noticed random rpm install failures that were tracked to
mremap() returning -ENOMEM and to commit ca3d76b0aa80 ("mm: add merging
after mremap resize").
The problem occurs when mremap() expands a VMA in place, but using a
starting address that's not vma->vm_start, but somewhere in the middle.
The extension_pgoff calculation introduced by the commit is wrong in that
case, so vma_merge() fails due to pgoffs not being compatible. Fix the
calculation.
By the way it seems that the situations, where rpm now expands a vma from
the middle, were made possible also due to that commit, thanks to the
improved vma merging. Yet it should work just fine, except for the buggy
calculation.
Link: https://lkml.kernel.org/r/20221216163227.24648-1-vbabka@suse.cz
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://bugzilla.suse.com/show_bug.cgi?id=1206359
Fixes: ca3d76b0aa80 ("mm: add merging after mremap resize")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jakub Matěna <matenajakub@gmail.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
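The pgoff mismatch in the mremap() commit above can be checked with plain arithmetic. This is a sketch under assumptions: 4 KiB pages, and an old calculation that adds only old_len to vm_pgoff (inferred from the description, not quoted from the patch). The function and parameter names are illustrative.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def extension_pgoff_old(vm_pgoff, old_len):
    # Assumed form of the earlier calculation: correct only if addr == vm_start.
    return vm_pgoff + (old_len >> PAGE_SHIFT)


def extension_pgoff_fixed(vm_start, vm_pgoff, addr, old_len):
    # Measure the offset from the start of the VMA, not from addr.
    extension_start = addr + old_len
    return vm_pgoff + ((extension_start - vm_start) >> PAGE_SHIFT)


# A 4-page VMA at 0x10000..0x14000, expanded via its last 2 pages.
vm_start, vm_pgoff = 0x10000, 0
addr, old_len = 0x12000, 0x2000

old = extension_pgoff_old(vm_pgoff, old_len)
fixed = extension_pgoff_fixed(vm_start, vm_pgoff, addr, old_len)
```

Here the extension begins 4 pages into the mapping, so a compatible pgoff is 4; the assumed old formula yields 2, which is the kind of mismatch that makes vma_merge() refuse the merge.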
Christian König [Mon, 19 Dec 2022 10:47:18 +0000 (11:47 +0100)]
drm/amdgpu: grab extra fence reference for drm_sched_job_add_dependency
That function consumes the reference.
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Reported-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: aab9cf7b6954 ("drm/amdgpu: use scheduler dependencies for VM updates")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Arnaldo Carvalho de Melo [Wed, 21 Dec 2022 20:30:38 +0000 (17:30 -0300)]
perf scripting python: Don't be strict at handling libtraceevent enumerations
The build was failing on archlinux because it has a newer libtraceevent
that added a new entry to the tep_print_arg_type enum:
19.72 archlinux:base : FAIL gcc version 12.2.0 (GCC)
util/scripting-engines/trace-event-python.c: In function ‘define_event_symbols’:
util/scripting-engines/trace-event-python.c:281:9: error: enumeration value ‘TEP_PRINT_CPUMASK’ not handled in switch [-Werror=switch-enum]
281 | switch (args->type) {
| ^~~~~~
cc1: all warnings being treated as errors
Since we build with distros that have different versions of
libtraceevent and there is no way to easily test if these enum entries
are available, just disable -Werror=switch-enum for that specific
object.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Saleemkhan Jamadar [Tue, 20 Dec 2022 07:51:44 +0000 (13:21 +0530)]
drm/amdgpu: enable VCN DPG for GC IP v11.0.4
Enable VCN Dynamic Power Gating control for GC IP v11.0.4.
Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Reviewed-by: Veerabadhran Gopalakrishnan <veerabadhran.gopalakrishnan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.0, 6.1
Hans-Peter Nilsson [Mon, 28 Dec 2020 02:41:59 +0000 (03:41 +0100)]
perf arm64: Simplify mksyscalltbl
This patch isn't intended to have any effect on the compiled code. It
just removes one level of indirection: calling the *host* compiler to
build and then run a program that just printf:s the numerical entries of
the syscall-table. In other words, the generated syscalls.c changes
from:
[46] = "ftruncate",
to:
[__NR3264_ftruncate] = "ftruncate",
The latter is as good as the former to the user of perf, and this can be
done directly by the shell-script. The syscalls defined as non-literal
values (like "#define __NR_ftruncate __NR3264_ftruncate") are trivially
resolved at compile-time without namespace-leaking and/or collision for
its sole user, perf/util/syscalltbl.c, that just #includes the generated
file. A future "-mabi=32" support would probably have to handle this
differently, but that is a pre-existing problem not affected by this
simplification.
Calling the *host* compiler only complicates things and accidentally can
get a completely wrong set of files and syscall numbers, see earlier
commits. Note that the script parameter hostcc is now unused.
At the time of this patch, powerpc (the origin, see comments), and also
e.g. x86, have moved on from filtering "gcc -dM -E" output to reading a
separate, specific text file: a table of syscall numbers. IMHO arm64
should consider adopting this.
Signed-off-by: Hans-Peter Nilsson <hp@axis.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20201228024159.2BB66203B5@pchp3.se.axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
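The simplification in the mksyscalltbl commit above amounts to emitting each table index as whatever text the macro expands to, leaving resolution to the compiler that builds perf. The real script is a shell script; the following is only a Python sketch of the same idea, with `table_entries()` and the sample input as illustrative assumptions.

```python
import re

# Matches "#define __NR_<name> <value>"; lines such as
# "#define __NR3264_ftruncate 46" do not start with "__NR_" and are skipped.
DEFINE_RE = re.compile(r"#define\s+__NR_(\w+)\s+(\S+)")


def table_entries(define_lines):
    """Turn preprocessor defines into designated-initializer lines."""
    entries = []
    for line in define_lines:
        match = DEFINE_RE.match(line)
        if match is None:
            continue
        name, value = match.groups()
        # The value is copied through unevaluated, literal or macro alike.
        entries.append(f'[{value}] = "{name}",')
    return entries


sample = [
    "#define __NR_ftruncate __NR3264_ftruncate",
    "#define __NR3264_ftruncate 46",
    "#define __NR_read 63",
]
generated = table_entries(sample)
```

Because nothing is compiled or run at generation time, the host compiler and its headers no longer influence the numbers that end up in the table.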
Linus Torvalds [Wed, 21 Dec 2022 18:40:08 +0000 (10:40 -0800)]
Merge tag '6.2-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"cifs/smb3 client fixes, mostly related to reconnect and/or DFS:
- two important reconnect fixes: cases where status of recently
connected IPCs and shares were not being updated leaving them in an
incorrect state
- fix for older Windows servers that would return
STATUS_OBJECT_NAME_INVALID to query info requests on DFS links in a
namespace that contained non-ASCII characters, reducing number of
wasted roundtrips.
- fix for leaked -ENOMEM to userspace when cifs.ko couldn't perform
I/O due to a disconnected server, expired or deleted session.
- removal of all unneeded DFS related mount option string parsing
(now using fs_context for automounts)
- improve clarity/readability, moving various DFS related functions
out of fs/cifs/connect.c (which was getting too big to be readable)
to new file.
- Fix problem with large number of DFS connections. Allow sharing of
DFS connections and fix how the referral paths are matched
- Referral caching fix: Instead of looking up ipc connections to
refresh cached referrals, store direct dfs root server's IPC
pointer in new sessions so it can simply be accessed to either
refresh or create a new referral that such connections belong to.
- Fix to allow dfs root server's connections to also failover
- Optimized reconnect of nested DFS links
- Set correct status of IPC connections marked for reconnect"
* tag '6.2-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module number
cifs: don't leak -ENOMEM in smb2_open_file()
cifs: use origin fullpath for automounts
cifs: set correct status of tcon ipc when reconnecting
cifs: optimize reconnect of nested links
cifs: fix source pathname comparison of dfs supers
cifs: fix confusing debug message
cifs: don't block in dfs_cache_noreq_update_tgthint()
cifs: refresh root referrals
cifs: fix refresh of cached referrals
cifs: don't refresh cached referrals from unactive mounts
cifs: share dfs connections and supers
cifs: split out ses and tcon retrieval from mount_get_conns()
cifs: set resolved ip in sockaddr
cifs: remove unused smb3_fs_context::mount_options
cifs: get rid of mount options string parsing
cifs: use fs_context for automounts
cifs: reduce roundtrips on create/qinfo requests
cifs: set correct ipc status after initial tree connect
cifs: set correct tcon status after initial tree connect
Linus Torvalds [Wed, 21 Dec 2022 18:18:17 +0000 (10:18 -0800)]
Merge tag 'ntfs3_for_6.2' of https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 updates from Konstantin Komarov:
- added mount options 'hidedotfiles', 'nocase' and 'windows_names'
- fixed xfstests (tested on x86_64): generic/083 generic/263
generic/307 generic/465
- fix some logic errors
- code refactoring and dead code removal
* tag 'ntfs3_for_6.2' of https://github.com/Paragon-Software-Group/linux-ntfs3: (61 commits)
fs/ntfs3: Make if more readable
fs/ntfs3: Improve checking of bad clusters
fs/ntfs3: Fix wrong if in hdr_first_de
fs/ntfs3: Use ALIGN kernel macro
fs/ntfs3: Fix incorrect if in ntfs_set_acl_ex
fs/ntfs3: Check fields while reading
fs/ntfs3: Correct ntfs_check_for_free_space
fs/ntfs3: Restore correct state after ENOSPC in attr_data_get_block
fs/ntfs3: Changing locking in ntfs_rename
fs/ntfs3: Fixing wrong logic in attr_set_size and ntfs_fallocate
fs/ntfs3: atomic_open implementation
fs/ntfs3: Fix wrong indentations
fs/ntfs3: Change new sparse cluster processing
fs/ntfs3: Fixing work with sparse clusters
fs/ntfs3: Simplify ntfs_update_mftmirr function
fs/ntfs3: Remove unused functions
fs/ntfs3: Fix sparse problems
fs/ntfs3: Add ntfs_bitmap_weight_le function and refactoring
fs/ntfs3: Use _le variants of bitops functions
fs/ntfs3: Add functions to modify LE bitmaps
...
Linus Torvalds [Wed, 21 Dec 2022 17:54:00 +0000 (09:54 -0800)]
Merge tag 'fs.mount.propagation.fix.v6.2-rc1' of git://git./linux/kernel/git/vfs/idmapping
Pull mount propagation fix from Christian Brauner:
"The propagate_mnt() function handles mount propagation when creating
mounts and propagates the source mount tree @source_mnt to all
applicable nodes of the destination propagation mount tree headed by
@dest_mnt.
Unfortunately it contains a bug where it fails to terminate at peers
of @source_mnt when looking up copies of the source mount that become
masters for copies of the source mount tree mounted on top of slaves
in the destination propagation tree causing a NULL dereference.
This fixes that bug (with a long commit message for a seven character
fix but hopefully it'll help us fix issues faster in the future rather
than having to go through the pain of having to relearn everything
once more)"
* tag 'fs.mount.propagation.fix.v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
pnode: terminate at peers of source
Arnaldo Carvalho de Melo [Wed, 21 Dec 2022 17:08:20 +0000 (14:08 -0300)]
perf build: Remove explicit reference to python 2.x devel files
If the libpython feature test (tools/build/feature/test-libpython.c)
fails, then python-devel is missing; it doesn't matter if it is for
python2 or 3, so remove that explicit 2.x reference.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sandipan Das [Wed, 14 Dec 2022 08:26:52 +0000 (13:56 +0530)]
perf vendor events amd: Add Zen 4 mapping
Add a regular expression in the map file so that appropriate JSON event
files are used for AMD Zen 4 processors. Restrict the regular expression
for AMD Zen 3 processors to known model ranges since they also belong to
Family 19h.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jirka Hladky <jhladky@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20221214082652.419965-5-sandipan.das@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
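Because Zen 3 and Zen 4 share Family 19h, the Zen 4 mapping commit above relies on the more specific pattern being tried before a catch-all. The sketch below shows that first-match-wins lookup in Python. It is not the contents of the map file: the "vendor-family-model" string layout, the model ranges and the directory names are assumptions for illustration (Family 19h is 25 in decimal).

```python
import re

# Hypothetical entries, most specific first. Model numbers are written in hex.
CPUID_MAP = [
    # Assumed Zen 3 model ranges: 0h-Fh, 20h-2Fh, 40h-5Fh.
    (r"AuthenticAMD-25-([245][0-9A-F]|[0-9A-F])", "amdzen3"),
    # Any other Family 19h model falls through to Zen 4.
    (r"AuthenticAMD-25-[0-9A-F]+", "amdzen4"),
]


def events_dir(cpuid):
    """Return the event directory for a CPUID string, or None if unmapped."""
    for pattern, directory in CPUID_MAP:
        # fullmatch: the pattern must cover the whole string, so "2" does not
        # accidentally match the first digit of "21".
        if re.fullmatch(pattern, cpuid):
            return directory
    return None
```

Swapping the two entries would send every Family 19h part to the Zen 4 events, which is why the Zen 3 expression has to be restricted to known models rather than left open.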
Sandipan Das [Wed, 14 Dec 2022 08:26:51 +0000 (13:56 +0530)]
perf vendor events amd: Add Zen 4 metrics
Add metrics taken from Section 2.1.15.2 "Performance Measurement" in
the Processor Programming Reference (PPR) for AMD Family 19h Model 11h
Revision B1 processors.
The recommended metrics are sourced from Table 27 "Guidance for Common
Performance Statistics with Complex Event Selects".
The pipeline utilization metrics are sourced from Table 28 "Guidance
for Pipeline Utilization Analysis Statistics". These are new to Zen 4
processors and useful for finding performance bottlenecks by analyzing
activity at different stages of the pipeline. Metric groups have been
added for Level 1 and Level 2 analysis.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jirka Hladky <jhladky@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20221214082652.419965-4-sandipan.das@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sandipan Das [Wed, 14 Dec 2022 08:26:50 +0000 (13:56 +0530)]
perf vendor events amd: Add Zen 4 uncore events
Add uncore events taken from Section 2.1.15.5 "L3 Cache Performance
Monitor Counters" and Section 7.1 "Fabric Performance Monitor Counter
(PMC) Events" in the Processor Programming Reference (PPR) for AMD
Family 19h Model 11h Revision B1 processors. This constitutes events
which capture L3 cache activity and data bandwidth for various links
and interfaces in the Data Fabric.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jirka Hladky <jhladky@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20221214082652.419965-3-sandipan.das@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sandipan Das [Wed, 14 Dec 2022 08:26:49 +0000 (13:56 +0530)]
perf vendor events amd: Add Zen 4 core events
Add core events taken from Section 2.1.15.4 "Core Performance Monitor
Counters" in the Processor Programming Reference (PPR) for AMD Family
19h Model 11h Revision B1 processors. This constitutes events which
capture op dispatch, execution and retirement, branch prediction, L1
and L2 cache activity, TLB activity, etc.
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jirka Hladky <jhladky@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20221214082652.419965-2-sandipan.das@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Thu, 15 Dec 2022 06:55:10 +0000 (22:55 -0800)]
perf vendor events intel: Refresh westmereex events
Update the westmereex events using the new tooling from:
https://github.com/intel/perfmon
The events are unchanged but unused json values are removed. This
increases consistency across the json files.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221215065510.1621979-24-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Thu, 15 Dec 2022 06:55:09 +0000 (22:55 -0800)]
perf vendor events intel: Refresh westmereep-sp events
Update the westmereep-sp events using the new tooling from:
https://github.com/intel/perfmon
The events are unchanged but unused json values are removed. This
increases consistency across the json files.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221215065510.1621979-23-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>