Marc Zyngier [Fri, 25 Feb 2022 13:49:48 +0000 (13:49 +0000)]
Merge branch kvm-arm64/psci-1.1 into kvmarm-master/next
* kvm-arm64/psci-1.1:
: .
: Limited PSCI-1.1 support from Will Deacon:
:
: This small series exposes the PSCI SYSTEM_RESET2 call to guests, which
: allows the propagation of a "reset_type" and a "cookie" back to the VMM.
: Although Linux guests only ever pass 0 for the type ("SYSTEM_WARM_RESET"),
: the vendor-defined range can be used by a bootloader to provide additional
: information about the reset, such as an error code.
: .
KVM: arm64: Remove unneeded semicolons
KVM: arm64: Indicate SYSTEM_RESET2 in kvm_run::system_event flags field
KVM: arm64: Expose PSCI SYSTEM_RESET2 call to the guest
KVM: arm64: Bump guest PSCI version to 1.1
Signed-off-by: Marc Zyngier <maz@kernel.org>
Changcheng Deng [Wed, 23 Feb 2022 09:27:50 +0000 (09:27 +0000)]
KVM: arm64: Remove unneeded semicolons
Fix the following coccicheck review:
./arch/arm64/kvm/psci.c: 379: 3-4: Unneeded semicolon
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn>
[maz: squashed another instance of the same issue in the patch]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220223092750.1934130-1-deng.changcheng@zte.com.cn
Link: https://lore.kernel.org/r/20220225122922.GA19390@willie-the-truck
Will Deacon [Mon, 21 Feb 2022 15:35:24 +0000 (15:35 +0000)]
KVM: arm64: Indicate SYSTEM_RESET2 in kvm_run::system_event flags field
When handling reset and power-off PSCI calls from the guest, we
initialise X0 to PSCI_RET_INTERNAL_FAILURE in case the VMM tries to
re-run the vCPU after issuing the call.
Unfortunately, this also means that the VMM cannot see which PSCI call
was issued and therefore cannot distinguish between PSCI SYSTEM_RESET
and SYSTEM_RESET2 calls, which is necessary in order to determine the
validity of the "reset_type" in X1.
Allocate bit 0 of the previously unused 'flags' field of the
system_event structure so that we can indicate the PSCI call used to
initiate the reset.
Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220221153524.15397-4-will@kernel.org
Will Deacon [Mon, 21 Feb 2022 15:35:23 +0000 (15:35 +0000)]
KVM: arm64: Expose PSCI SYSTEM_RESET2 call to the guest
PSCI v1.1 introduces the optional SYSTEM_RESET2 call, which allows the
caller to provide a vendor-specific "reset type" and "cookie" to request
a particular form of reset or shutdown.
Expose this call to the guest and handle it in the same way as PSCI
SYSTEM_RESET, along with some basic range checking on the type argument.
Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220221153524.15397-3-will@kernel.org
Will Deacon [Mon, 21 Feb 2022 15:35:22 +0000 (15:35 +0000)]
KVM: arm64: Bump guest PSCI version to 1.1
Expose PSCI version v1.1 to the guest by default. The only difference
for now is that an updated version number is reported by PSCI_VERSION.
Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220221153524.15397-2-will@kernel.org
Marc Zyngier [Tue, 8 Feb 2022 17:54:41 +0000 (17:54 +0000)]
Merge branch kvm-arm64/pmu-bl into kvmarm-master/next
* kvm-arm64/pmu-bl:
: .
: Improve PMU support on heterogeneous systems, courtesy of Alexandru Elisei
: .
KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU
KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute
KVM: arm64: Keep a list of probed PMUs
KVM: arm64: Keep a per-VM pointer to the default PMU
perf: Fix wrong name in comment for struct perf_cpu_context
KVM: arm64: Do not change the PMU event filter after a VCPU has run
Signed-off-by: Marc Zyngier <maz@kernel.org>
Alexandru Elisei [Thu, 27 Jan 2022 16:17:59 +0000 (16:17 +0000)]
KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU
Userspace can assign a PMU to a VCPU with the KVM_ARM_VCPU_PMU_V3_SET_PMU
device ioctl. If the VCPU is scheduled on a physical CPU which has a
different PMU, the perf events needed to emulate a guest PMU won't be
scheduled in and the guest performance counters will stop counting. Treat
it as an userspace error and refuse to run the VCPU in this situation.
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127161759.53553-7-alexandru.elisei@arm.com
Alexandru Elisei [Thu, 27 Jan 2022 16:17:58 +0000 (16:17 +0000)]
KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute
When KVM creates an event and there are more than one PMUs present on the
system, perf_init_event() will go through the list of available PMUs and
will choose the first one that can create the event. The order of the PMUs
in this list depends on the probe order, which can change under various
circumstances, for example if the order of the PMU nodes change in the DTB
or if asynchronous driver probing is enabled on the kernel command line
(with the driver_async_probe=armv8-pmu option).
Another consequence of this approach is that on heteregeneous systems all
virtual machines that KVM creates will use the same PMU. This might cause
unexpected behaviour for userspace: when a VCPU is executing on the
physical CPU that uses this default PMU, PMU events in the guest work
correctly; but when the same VCPU executes on another CPU, PMU events in
the guest will suddenly stop counting.
Fortunately, perf core allows user to specify on which PMU to create an
event by using the perf_event_attr->type field, which is used by
perf_init_event() as an index in the radix tree of available PMUs.
Add the KVM_ARM_VCPU_PMU_V3_CTRL(KVM_ARM_VCPU_PMU_V3_SET_PMU) VCPU
attribute to allow userspace to specify the arm_pmu that KVM will use when
creating events for that VCPU. KVM will make no attempt to run the VCPU on
the physical CPUs that share the PMU, leaving it up to userspace to manage
the VCPU threads' affinity accordingly.
To ensure that KVM doesn't expose an asymmetric system to the guest, the
PMU set for one VCPU will be used by all other VCPUs. Once a VCPU has run,
the PMU cannot be changed in order to avoid changing the list of available
events for a VCPU, or to change the semantics of existing events.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127161759.53553-6-alexandru.elisei@arm.com
Alexandru Elisei [Thu, 27 Jan 2022 16:17:57 +0000 (16:17 +0000)]
KVM: arm64: Keep a list of probed PMUs
The ARM PMU driver calls kvm_host_pmu_init() after probing to tell KVM that
a hardware PMU is available for guest emulation. Heterogeneous systems can
have more than one PMU present, and the callback gets called multiple
times, once for each of them. Keep track of all the PMUs available to KVM,
as they're going to be needed later.
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127161759.53553-5-alexandru.elisei@arm.com
Marc Zyngier [Thu, 27 Jan 2022 16:17:56 +0000 (16:17 +0000)]
KVM: arm64: Keep a per-VM pointer to the default PMU
As we are about to allow selection of the PMU exposed to a guest, start by
keeping track of the default one instead of only the PMU version.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20220127161759.53553-4-alexandru.elisei@arm.com
Alexandru Elisei [Thu, 27 Jan 2022 16:17:55 +0000 (16:17 +0000)]
perf: Fix wrong name in comment for struct perf_cpu_context
Commit
0793a61d4df8 ("performance counters: core code") added the perf
subsystem (then called Performance Counters) to Linux, creating the struct
perf_cpu_context. The comment for the struct referred to it as a "struct
perf_counter_cpu_context".
Commit
cdd6c482c9ff ("perf: Do the big rename: Performance Counters ->
Performance Events") changed the comment to refer to a "struct
perf_event_cpu_context", which was still the wrong name for the struct.
Change the comment to say "struct perf_cpu_context".
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127161759.53553-3-alexandru.elisei@arm.com
Marc Zyngier [Thu, 27 Jan 2022 16:17:54 +0000 (16:17 +0000)]
KVM: arm64: Do not change the PMU event filter after a VCPU has run
Userspace can specify which events a guest is allowed to use with the
KVM_ARM_VCPU_PMU_V3_FILTER attribute. The list of allowed events can be
identified by a guest from reading the PMCEID{0,1}_EL0 registers.
Changing the PMU event filter after a VCPU has run can cause reads of the
registers performed before the filter is changed to return different values
than reads performed with the new event filter in place. The architecture
defines the two registers as read-only, and this behaviour contradicts
that.
Keep track when the first VCPU has run and deny changes to the PMU event
filter to prevent this from happening.
Signed-off-by: Marc Zyngier <maz@kernel.org>
[ Alexandru E: Added commit message, updated ioctl documentation ]
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127161759.53553-2-alexandru.elisei@arm.com
Marc Zyngier [Tue, 8 Feb 2022 15:29:28 +0000 (15:29 +0000)]
Merge branch kvm-arm64/misc-5.18 into kvmarm-master/next
* kvm-arm64/misc-5.18:
: .
: Misc fixes for KVM/arm64 5.18:
:
: - Drop unused kvm parameter to kvm_psci_version()
:
: - Implement CONFIG_DEBUG_LIST at EL2
: .
KVM: arm64: pkvm: Implement CONFIG_DEBUG_LIST at EL2
KVM: arm64: Drop unused param from kvm_psci_version()
Signed-off-by: Marc Zyngier <maz@kernel.org>
Keir Fraser [Mon, 31 Jan 2022 12:40:53 +0000 (12:40 +0000)]
KVM: arm64: pkvm: Implement CONFIG_DEBUG_LIST at EL2
Currently the check functions are stubbed out at EL2. Implement
versions suitable for the constrained EL2 environment.
Signed-off-by: Keir Fraser <keirf@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220131124114.3103337-1-keirf@google.com
Oliver Upton [Tue, 8 Feb 2022 01:27:05 +0000 (01:27 +0000)]
KVM: arm64: Drop unused param from kvm_psci_version()
kvm_psci_version() consumes a pointer to struct kvm in addition to a
vcpu pointer. Drop the kvm pointer as it is unused. While the comment
suggests the explicit kvm pointer was useful for calling from hyp, there
exist no such callsite in hyp.
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220208012705.640444-1-oupton@google.com
Marc Zyngier [Tue, 8 Feb 2022 15:20:16 +0000 (15:20 +0000)]
Merge branch kvm-arm64/selftest/vgic-5.18 into kvmarm-master/next
* kvm-arm64/selftest/vgic-5.18:
: .
: A bunch of selftest fixes, courtesy of Ricardo Koller
: .
kvm: selftests: aarch64: use a tighter assert in vgic_poke_irq()
kvm: selftests: aarch64: fix some vgic related comments
kvm: selftests: aarch64: fix the failure check in kvm_set_gsi_routing_irqchip_check
kvm: selftests: aarch64: pass vgic_irq guest args as a pointer
kvm: selftests: aarch64: fix assert in gicv3_access_reg
Signed-off-by: Marc Zyngier <maz@kernel.org>
Ricardo Koller [Thu, 27 Jan 2022 03:08:58 +0000 (19:08 -0800)]
kvm: selftests: aarch64: use a tighter assert in vgic_poke_irq()
vgic_poke_irq() checks that the attr argument passed to the vgic device
ioctl is sane. Make this check tighter by moving it to after the last
attr update.
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reported-by: Reiji Watanabe <reijiw@google.com>
Cc: Andrew Jones <drjones@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127030858.3269036-6-ricarkol@google.com
Ricardo Koller [Thu, 27 Jan 2022 03:08:57 +0000 (19:08 -0800)]
kvm: selftests: aarch64: fix some vgic related comments
Fix the formatting of some comments and the wording of one of them (in
gicv3_access_reg).
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reported-by: Reiji Watanabe <reijiw@google.com>
Cc: Andrew Jones <drjones@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127030858.3269036-5-ricarkol@google.com
Ricardo Koller [Thu, 27 Jan 2022 03:08:56 +0000 (19:08 -0800)]
kvm: selftests: aarch64: fix the failure check in kvm_set_gsi_routing_irqchip_check
kvm_set_gsi_routing_irqchip_check(expect_failure=true) is used to check
the error code returned by the kernel when trying to setup an invalid
gsi routing table. The ioctl fails if "pin >= KVM_IRQCHIP_NUM_PINS", so
kvm_set_gsi_routing_irqchip_check() should test the error only when
"intid >= KVM_IRQCHIP_NUM_PINS+32". The issue is that the test check is
"intid >= KVM_IRQCHIP_NUM_PINS", so for a case like "intid =
KVM_IRQCHIP_NUM_PINS" the test wrongly assumes that the kernel will
return an error. Fix this by using the right check.
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reported-by: Reiji Watanabe <reijiw@google.com>
Cc: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127030858.3269036-4-ricarkol@google.com
Ricardo Koller [Thu, 27 Jan 2022 03:08:55 +0000 (19:08 -0800)]
kvm: selftests: aarch64: pass vgic_irq guest args as a pointer
The guest in vgic_irq gets its arguments in a struct. This struct used
to fit nicely in a single register so vcpu_args_set() was able to pass
it by value by setting x0 with it. Unfortunately, this args struct grew
after some commits and some guest args became random (specically
kvm_supports_irqfd).
Fix this by passing the guest args as a pointer (after allocating some
guest memory for it).
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reported-by: Reiji Watanabe <reijiw@google.com>
Cc: Andrew Jones <drjones@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127030858.3269036-3-ricarkol@google.com
Ricardo Koller [Thu, 27 Jan 2022 03:08:54 +0000 (19:08 -0800)]
kvm: selftests: aarch64: fix assert in gicv3_access_reg
The val argument in gicv3_access_reg can have any value when used for a
read, not necessarily 0. Fix the assert by checking val only for
writes.
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reported-by: Reiji Watanabe <reijiw@google.com>
Cc: Andrew Jones <drjones@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127030858.3269036-2-ricarkol@google.com
Marc Zyngier [Tue, 8 Feb 2022 14:58:38 +0000 (14:58 +0000)]
Merge branch kvm-arm64/vmid-allocator into kvmarm-master/next
* kvm-arm64/vmid-allocator:
: .
: VMID allocation rewrite from Shameerali Kolothum Thodi, paving the
: way for pinned VMIDs and SVA.
: .
KVM: arm64: Make active_vmids invalid on vCPU schedule out
KVM: arm64: Align the VMID allocation with the arm64 ASID
KVM: arm64: Make VMID bits accessible outside of allocator
KVM: arm64: Introduce a new VMID allocator for KVM
Signed-off-by: Marc Zyngier <maz@kernel.org>
Shameer Kolothum [Mon, 22 Nov 2021 12:18:44 +0000 (12:18 +0000)]
KVM: arm64: Make active_vmids invalid on vCPU schedule out
Like ASID allocator, we copy the active_vmids into the
reserved_vmids on a rollover. But it's unlikely that
every CPU will have a vCPU as current task and we may
end up unnecessarily reserving the VMID space.
Hence, set active_vmids to an invalid one when scheduling
out a vCPU.
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211122121844.867-5-shameerali.kolothum.thodi@huawei.com
Julien Grall [Mon, 22 Nov 2021 12:18:43 +0000 (12:18 +0000)]
KVM: arm64: Align the VMID allocation with the arm64 ASID
At the moment, the VMID algorithm will send an SGI to all the
CPUs to force an exit and then broadcast a full TLB flush and
I-Cache invalidation.
This patch uses the new VMID allocator. The benefits are:
- Aligns with arm64 ASID algorithm.
- CPUs are not forced to exit at roll-over. Instead,
the VMID will be marked reserved and context invalidation
is broadcasted. This will reduce the IPIs traffic.
- More flexible to add support for pinned KVM VMIDs in
the future.
With the new algo, the code is now adapted:
- The call to update_vmid() will be done with preemption
disabled as the new algo requires to store information
per-CPU.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211122121844.867-4-shameerali.kolothum.thodi@huawei.com
Shameer Kolothum [Mon, 22 Nov 2021 12:18:42 +0000 (12:18 +0000)]
KVM: arm64: Make VMID bits accessible outside of allocator
Since we already set the kvm_arm_vmid_bits in the VMID allocator
init function, make it accessible outside as well so that it can
be used in the subsequent patch.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211122121844.867-3-shameerali.kolothum.thodi@huawei.com
Shameer Kolothum [Mon, 22 Nov 2021 12:18:41 +0000 (12:18 +0000)]
KVM: arm64: Introduce a new VMID allocator for KVM
A new VMID allocator for arm64 KVM use. This is based on
arm64 ASID allocator algorithm.
One major deviation from the ASID allocator is the way we
flush the context. Unlike ASID allocator, we expect less
frequent rollover in the case of VMIDs. Hence, instead of
marking the CPU as flush_pending and issuing a local context
invalidation on the next context switch, we broadcast TLB
flush + I-cache invalidation over the inner shareable domain
on rollover.
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211122121844.867-2-shameerali.kolothum.thodi@huawei.com
Marc Zyngier [Tue, 8 Feb 2022 14:44:46 +0000 (14:44 +0000)]
Merge branch kvm-arm64/fpsimd-doc into kvmarm-master/next
* kvm-arm64/fpsimd-doc:
: .
: FPSIMD documentation update, courtesy of Mark Brown
: .
arm64/fpsimd: Clarify the purpose of using last in fpsimd_save()
KVM: arm64: Add some more comments in kvm_hyp_handle_fpsimd()
KVM: arm64: Add comments for context flush and sync callbacks
Signed-off-by: Marc Zyngier <maz@kernel.org>
Mark Brown [Mon, 24 Jan 2022 16:11:15 +0000 (16:11 +0000)]
arm64/fpsimd: Clarify the purpose of using last in fpsimd_save()
When saving the floating point context in fpsimd_save() we always reference
the state using last-> rather than using current->. Looking at the FP code
in isolation the reason for this is not entirely obvious, it's done because
when KVM is running it will bind the guest context and rely on the host
writing out the guest state on context switch away from the guest.
There's a slight trick here in that KVM still uses TIF_FOREIGN_FPSTATE and
TIF_SVE to communicate what needs to be saved, it maintains those flags
and restores them when it is done running the guest so that the normal
restore paths function when we return back to userspace.
Add a comment to explain this to help future readers work out what's going
on a bit faster.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220124161115.115200-1-broonie@kernel.org
Mark Brown [Mon, 24 Jan 2022 15:57:20 +0000 (15:57 +0000)]
KVM: arm64: Add some more comments in kvm_hyp_handle_fpsimd()
The handling for FPSIMD/SVE traps is multi stage and involves some trap
manipulation which isn't quite so immediately obvious as might be desired
so add a few more comments.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220124155720.3943374-3-broonie@kernel.org
Mark Brown [Mon, 24 Jan 2022 15:57:19 +0000 (15:57 +0000)]
KVM: arm64: Add comments for context flush and sync callbacks
Add a little bit of information on where _ctxflush_fp() and _ctxsync_fp()
are called to help people unfamiliar with the code get up to speed.
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220124155720.3943374-2-broonie@kernel.org
Marc Zyngier [Tue, 8 Feb 2022 14:29:29 +0000 (14:29 +0000)]
Merge branch kvm-arm64/mmu-rwlock into kvmarm-master/next
* kvm-arm64/mmu-rwlock:
: .
: MMU locking optimisations from Jing Zhang, allowing permission
: relaxations to occur in parallel.
: .
KVM: selftests: Add vgic initialization for dirty log perf test for ARM
KVM: arm64: Add fast path to handle permission relaxation during dirty logging
KVM: arm64: Use read/write spin lock for MMU protection
Signed-off-by: Marc Zyngier <maz@kernel.org>
Jing Zhang [Tue, 18 Jan 2022 01:57:03 +0000 (01:57 +0000)]
KVM: selftests: Add vgic initialization for dirty log perf test for ARM
For ARM64, if no vgic is setup before the dirty log perf test, the
userspace irqchip would be used, which would affect the dirty log perf
test result.
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220118015703.3630552-4-jingzhangos@google.com
Jing Zhang [Tue, 18 Jan 2022 01:57:02 +0000 (01:57 +0000)]
KVM: arm64: Add fast path to handle permission relaxation during dirty logging
To reduce MMU lock contention during dirty logging, all permission
relaxation operations would be performed under read lock.
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220118015703.3630552-3-jingzhangos@google.com
Jing Zhang [Tue, 18 Jan 2022 01:57:01 +0000 (01:57 +0000)]
KVM: arm64: Use read/write spin lock for MMU protection
Replace MMU spinlock with rwlock and update all instances of the lock
being acquired with a write lock acquisition.
Future commit will add a fast path for permission relaxation during
dirty logging under a read lock.
Signed-off-by: Jing Zhang <jingzhangos@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220118015703.3630552-2-jingzhangos@google.com
Marc Zyngier [Tue, 8 Feb 2022 14:26:30 +0000 (14:26 +0000)]
Merge branch kvm-arm64/oslock into kvmarm-master/next
* kvm-arm64/oslock:
: .
: Debug OS-Lock emulation courtesy of Oliver Upton. From the cover letter:
:
: "KVM does not implement the debug architecture to the letter of the
: specification. One such issue is the fact that KVM treats the OS Lock as
: RAZ/WI, rather than emulating its behavior on hardware. This series adds
: emulation support for the OS Lock to KVM. Emulation is warranted as the
: OS Lock affects debug exceptions taken from all ELs, and is not limited
: to only the context of the guest."
: .
selftests: KVM: Test OS lock behavior
selftests: KVM: Add OSLSR_EL1 to the list of blessed regs
KVM: arm64: Emulate the OS Lock
KVM: arm64: Allow guest to set the OSLK bit
KVM: arm64: Stash OSLSR_EL1 in the cpu context
KVM: arm64: Correctly treat writes to OSLSR_EL1 as undefined
Signed-off-by: Marc Zyngier <maz@kernel.org>
Oliver Upton [Thu, 3 Feb 2022 17:41:59 +0000 (17:41 +0000)]
selftests: KVM: Test OS lock behavior
KVM now correctly handles the OS Lock for its guests. When set, KVM
blocks all debug exceptions originating from the guest. Add test cases
to the debug-exceptions test to assert that software breakpoint,
hardware breakpoint, watchpoint, and single-step exceptions are in fact
blocked.
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220203174159.2887882-7-oupton@google.com
Oliver Upton [Thu, 3 Feb 2022 17:41:58 +0000 (17:41 +0000)]
selftests: KVM: Add OSLSR_EL1 to the list of blessed regs
OSLSR_EL1 is now part of the visible system register state. Add it to
the get-reg-list selftest to ensure we keep it that way.
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220203174159.2887882-6-oupton@google.com
Oliver Upton [Thu, 3 Feb 2022 17:41:57 +0000 (17:41 +0000)]
KVM: arm64: Emulate the OS Lock
The OS lock blocks all debug exceptions at every EL. To date, KVM has
not implemented the OS lock for its guests, despite the fact that it is
mandatory per the architecture. Simple context switching between the
guest and host is not appropriate, as its effects are not constrained to
the guest context.
Emulate the OS Lock by clearing MDE and SS in MDSCR_EL1, thereby
blocking all but software breakpoint instructions.
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220203174159.2887882-5-oupton@google.com
Oliver Upton [Thu, 3 Feb 2022 17:41:56 +0000 (17:41 +0000)]
KVM: arm64: Allow guest to set the OSLK bit
Allow writes to OSLAR and forward the OSLK bit to OSLSR. Do nothing with
the value for now.
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220203174159.2887882-4-oupton@google.com
Oliver Upton [Thu, 3 Feb 2022 17:41:55 +0000 (17:41 +0000)]
KVM: arm64: Stash OSLSR_EL1 in the cpu context
An upcoming change to KVM will emulate the OS Lock from the PoV of the
guest. Add OSLSR_EL1 to the cpu context and handle reads using the
stored value. Define some mnemonics for for handling the OSLM field and
use them to make the reset value of OSLSR_EL1 more readable.
Wire up a custom handler for writes from userspace and prevent any of
the invariant bits from changing. Note that the OSLK bit is not
invariant and will be made writable by the aforementioned change.
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220203174159.2887882-3-oupton@google.com
Oliver Upton [Thu, 3 Feb 2022 17:41:54 +0000 (17:41 +0000)]
KVM: arm64: Correctly treat writes to OSLSR_EL1 as undefined
Writes to OSLSR_EL1 are UNDEFINED and should never trap from EL1 to
EL2, but the kvm trap handler for OSLSR_EL1 handles writes via
ignore_write(). This is confusing to readers of code, but should have
no functional impact.
For clarity, use write_to_read_only() rather than ignore_write(). If a
trap is unexpectedly taken to EL2 in violation of the architecture, this
will WARN_ONCE() and inject an undef into the guest.
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
[adopted Mark's changelog suggestion, thanks!]
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220203174159.2887882-2-oupton@google.com
Linus Torvalds [Sun, 6 Feb 2022 20:20:50 +0000 (12:20 -0800)]
Linux 5.17-rc3
Linus Torvalds [Sun, 6 Feb 2022 18:34:45 +0000 (10:34 -0800)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Various bug fixes for ext4 fast commit and inline data handling.
Also fix regression introduced as part of moving to the new mount API"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
fs/ext4: fix comments mentioning i_mutex
ext4: fix incorrect type issue during replay_del_range
jbd2: fix kernel-doc descriptions for jbd2_journal_shrink_{scan,count}()
ext4: fix potential NULL pointer dereference in ext4_fill_super()
jbd2: refactor wait logic for transaction updates into a common function
jbd2: cleanup unused functions declarations from jbd2.h
ext4: fix error handling in ext4_fc_record_modified_inode()
ext4: remove redundant max inline_size check in ext4_da_write_inline_data_begin()
ext4: fix error handling in ext4_restore_inline_data()
ext4: fast commit may miss file actions
ext4: fast commit may not fallback for ineligible commit
ext4: modify the logic of ext4_mb_new_blocks_simple
ext4: prevent used blocks from being allocated during fast commit replay
Linus Torvalds [Sun, 6 Feb 2022 18:18:23 +0000 (10:18 -0800)]
Merge tag 'perf-tools-fixes-for-v5.17-2022-02-06' of git://git./linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix display of grouped aliased events in 'perf stat'.
- Add missing branch_sample_type to perf_event_attr__fprintf().
- Apply correct label to user/kernel symbols in branch mode.
- Fix 'perf ftrace' system_wide tracing, it has to be set before
creating the maps.
- Return error if procfs isn't mounted for PID namespaces when
synthesizing records for pre-existing processes.
- Set error stream of objdump process for 'perf annotate' TUI, to avoid
garbling the screen.
- Add missing arm64 support to perf_mmap__read_self(), the kernel part
got into 5.17.
- Check for NULL pointer before dereference writing debug info about a
sample.
- Update UAPI copies for asound, perf_event, prctl and kvm headers.
- Fix a typo in bpf_counter_cgroup.c.
* tag 'perf-tools-fixes-for-v5.17-2022-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf ftrace: system_wide collection is not effective by default
libperf: Add arm64 support to perf_mmap__read_self()
tools include UAPI: Sync sound/asound.h copy with the kernel sources
perf stat: Fix display of grouped aliased events
perf tools: Apply correct label to user/kernel symbols in branch mode
perf bpf: Fix a typo in bpf_counter_cgroup.c
perf synthetic-events: Return error if procfs isn't mounted for PID namespaces
perf session: Check for NULL pointer before dereference
perf annotate: Set error stream of objdump process for TUI
perf tools: Add missing branch_sample_type to perf_event_attr__fprintf()
tools headers UAPI: Sync linux/kvm.h with the kernel sources
tools headers UAPI: Sync linux/prctl.h with the kernel sources
perf beauty: Make the prctl arg regexp more strict to cope with PR_SET_VMA
tools headers cpufeatures: Sync with the kernel sources
tools headers UAPI: Sync linux/perf_event.h with the kernel sources
tools include UAPI: Sync sound/asound.h copy with the kernel sources
Linus Torvalds [Sun, 6 Feb 2022 18:11:14 +0000 (10:11 -0800)]
Merge tag 'perf_urgent_for_v5.17_rc3' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Intel/PT: filters could crash the kernel
- Intel: default disable the PMU for SMM, some new-ish EFI firmware has
started using CPL3 and the PMU CPL filters don't discriminate against
SMM, meaning that CPL3 (userspace only) events now also count EFI/SMM
cycles.
- Fixup for perf_event_attr::sig_data
* tag 'perf_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/pt: Fix crash with stop filters in single-range mode
perf: uapi: Document perf_event_attr::sig_data truncation on 32 bit architectures
selftests/perf_events: Test modification of perf_event_attr::sig_data
perf: Copy perf_event_attr::sig_data on modification
x86/perf: Default set FREEZE_ON_SMI for all
Linus Torvalds [Sun, 6 Feb 2022 18:04:43 +0000 (10:04 -0800)]
Merge tag 'objtool_urgent_for_v5.17_rc3' of git://git./linux/kernel/git/tip/tip
Pull objtool fix from Borislav Petkov:
"Fix a potential truncated string warning triggered by gcc12"
* tag 'objtool_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix truncated string warning
Linus Torvalds [Sun, 6 Feb 2022 18:00:40 +0000 (10:00 -0800)]
Merge tag 'irq_urgent_for_v5.17_rc3' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Borislav Petkov:
"Remove a bogus warning introduced by the recent PCI MSI irq affinity
overhaul"
* tag 'irq_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
PCI/MSI: Remove bogus warning in pci_irq_get_affinity()
Linus Torvalds [Sun, 6 Feb 2022 17:57:39 +0000 (09:57 -0800)]
Merge tag 'edac_urgent_for_v5.17_rc3' of git://git./linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
"Fix altera and xgene EDAC drivers to propagate the correct error code
from platform_get_irq() so that deferred probing still works"
* tag 'edac_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/xgene: Fix deferred probing
EDAC/altera: Fix deferred probing
Changbin Du [Thu, 27 Jan 2022 13:20:10 +0000 (21:20 +0800)]
perf ftrace: system_wide collection is not effective by default
The ftrace.target.system_wide must be set before invoking
evlist__create_maps(), otherwise it has no effect.
Fixes:
53be50282269b46c ("perf ftrace: Add 'latency' subcommand")
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Namhyung Kim <namhyung@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220127132010.4836-1-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rob Herring [Tue, 1 Feb 2022 21:40:56 +0000 (15:40 -0600)]
libperf: Add arm64 support to perf_mmap__read_self()
Add the arm64 variants for read_perf_counter() and read_timestamp().
Unfortunately the counter number is encoded into the instruction, so the
code is a bit verbose to enumerate all possible counters.
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/r/20220201214056.702854-1-robh@kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Arnaldo Carvalho de Melo [Wed, 12 Feb 2020 14:04:23 +0000 (11:04 -0300)]
tools include UAPI: Sync sound/asound.h copy with the kernel sources
Picking the changes from:
06feec6005c9d950 ("ASoC: hdmi-codec: Fix OOB memory accesses")
Which entails no changes in the tooling side as it doesn't introduce new
SNDRV_PCM_IOCTL_ ioctls.
To silence this perf tools build warning:
Warning: Kernel ABI header at 'tools/include/uapi/sound/asound.h' differs from latest version at 'include/uapi/sound/asound.h'
diff -u tools/include/uapi/sound/asound.h include/uapi/sound/asound.h
Cc: Dmitry Osipenko <digetx@gmail.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/lkml/Yf+6OT+2eMrYDEeX@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Sat, 5 Feb 2022 01:09:41 +0000 (17:09 -0800)]
perf stat: Fix display of grouped aliased events
An event may have a number of uncore aliases that when added to the
evlist are consecutive.
If there are multiple uncore events in a group then
parse_events__set_leader_for_uncore_aliase will reorder the evlist so
that events on the same PMU are adjacent.
The collect_all_aliases function assumes that aliases are in blocks so
that only the first counter is printed and all others are marked merged.
The reordering for groups breaks the assumption and so all counts are
printed.
This change removes the assumption from collect_all_aliases
that the events are in blocks and instead processes the entire evlist.
Before:
```
$ perf stat -e '{UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE,UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE},duration_time' -a -A -- sleep 1
Performance counter stats for 'system wide':
CPU0 256,866 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 494,413 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 967 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,738 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 285,161 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 429,920 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 955 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,443 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 310,753 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 416,657 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,231 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,573 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 416,067 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 405,966 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,481 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,447 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 312,911 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 408,154 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,086 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,380 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 333,994 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 370,349 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,287 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,335 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 188,107 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 302,423 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 701 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,070 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 307,221 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 383,642 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,036 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,158 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 318,479 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 821,545 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,028 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 2,550 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 227,618 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 372,272 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 903 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,456 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 376,783 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 419,827 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,406 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,453 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 286,583 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 429,956 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 999 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,436 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 313,867 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 370,159 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,114 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,291 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 342,083 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 409,111 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,399 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,684 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 365,828 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 376,037 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,378 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,411 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 382,456 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 621,743 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,232 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,955 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 342,316 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 385,067 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,176 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,268 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 373,588 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 386,163 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,394 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,464 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 381,206 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 546,891 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,266 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,712 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 221,176 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 392,069 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 831 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,456 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 355,401 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 705,595 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,235 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 2,216 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 371,436 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 428,103 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,306 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,442 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 384,352 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 504,200 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,468 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,860 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 228,856 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 287,976 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 832 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,060 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 215,121 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 334,162 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 681 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,026 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 296,179 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 436,083 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,084 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,525 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 262,296 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 416,573 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 986 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,533 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 285,852 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 359,842 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,073 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,326 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 303,379 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 367,222 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,008 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,156 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 273,487 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 425,449 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 932 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,367 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 297,596 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 414,793 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,140 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,601 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 342,365 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 360,422 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,291 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,342 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 327,196 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 580,858 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,122 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 2,014 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 296,564 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 452,817 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,087 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,694 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 375,002 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 389,393 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,478 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 1,540 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 365,213 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 594,685 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 1,401 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 2,222 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 1,000,749,060 ns duration_time
1.
000749060 seconds time elapsed
```
After:
```
Performance counter stats for 'system wide':
CPU0 20,547,434 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU36 45,202,862 UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD_REMOTE
CPU0 82,001 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU36 159,688 UNC_CHA_TOR_INSERTS.IA_MISS_DRD_REMOTE
CPU0 1,000,464,828 ns duration_time
1.
000464828 seconds time elapsed
```
Fixes:
3cdc5c2cb924acb4 ("perf parse-events: Handle uncore event aliases in small groups properly")
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Asaf Yaffe <asaf.yaffe@intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vineet Singh <vineet.singh@intel.com>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220205010941.1065469-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
German Gomez [Wed, 26 Jan 2022 10:59:26 +0000 (10:59 +0000)]
perf tools: Apply correct label to user/kernel symbols in branch mode
In branch mode, the branch symbols were being displayed with incorrect
cpumode labels. So fix this.
For example, before:
# perf record -b -a -- sleep 1
# perf report -b
Overhead Command Source Shared Object Source Symbol Target Symbol
0.08% swapper [kernel.kallsyms] [k] rcu_idle_enter [k] cpuidle_enter_state
==> 0.08% cmd0 [kernel.kallsyms] [.] psi_group_change [.] psi_group_change
0.08% cmd1 [kernel.kallsyms] [k] psi_group_change [k] psi_group_change
After:
# perf report -b
Overhead Command Source Shared Object Source Symbol Target Symbol
0.08% swapper [kernel.kallsyms] [k] rcu_idle_enter [k] cpuidle_enter_state
0.08% cmd0 [kernel.kallsyms] [k] psi_group_change [k] pei_group_change
0.08% cmd1 [kernel.kallsyms] [k] psi_group_change [k] psi_group_change
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: German Gomez <german.gomez@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220126105927.3411216-1-german.gomez@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masanari Iida [Sat, 25 Dec 2021 00:55:58 +0000 (09:55 +0900)]
perf bpf: Fix a typo in bpf_counter_cgroup.c
This patch fixes a spelling typo in error message.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211225005558.503935-1-standby24x7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Leo Yan [Fri, 24 Dec 2021 12:40:13 +0000 (20:40 +0800)]
perf synthetic-events: Return error if procfs isn't mounted for PID namespaces
For perf recording, it retrieves process info by iterating nodes in proc
fs. If we run perf in a non-root PID namespace with command:
# unshare --fork --pid perf record -e cycles -a -- test_program
... in this case, unshare command creates a child PID namespace and
launches perf tool in it, but the issue is the proc fs is not mounted
for the non-root PID namespace, this leads to the perf tool gathering
process info from its parent PID namespace.
We can use below command to observe the process nodes under proc fs:
# unshare --pid --fork ls /proc
1 137 1968 2128 3 342 48 62 78 crypto kcore net uptime
10 138 2 2142 30 35 49 63 8 devices keys pagetypeinfo version
11 139 20 2143 304 36 50 64 82 device-tree key-users partitions vmallocinfo
12 14 2011 22 305 37 51 65 83 diskstats kmsg self vmstat
128 140 2038 23 307 39 52 656 84 driver kpagecgroup slabinfo zoneinfo
129 15 2074 24 309 4 53 67 9 execdomains kpagecount softirqs
13 16 2094 241 31 40 54 68 asound fb kpageflags stat
130 164 2096 242 310 41 55 69 buddyinfo filesystems loadavg swaps
131 17 2098 25 317 42 56 70 bus fs locks sys
132 175 21 26 32 43 57 71 cgroups interrupts meminfo sysrq-trigger
133 179 2102 263 329 44 58 75 cmdline iomem misc sysvipc
134 1875 2103 27 330 45 59 76 config.gz ioports modules thread-self
135 19 2117 29 333 46 6 77 consoles irq mounts timer_list
136 1941 2121 298 34 47 60 773 cpuinfo kallsyms mtd tty
So it shows many existed tasks, since unshared command has not mounted
the proc fs for the new created PID namespace, it still accesses the
proc fs of the root PID namespace. This leads to two prominent issues:
- Firstly, PID values are mismatched between thread info and samples.
The gathered thread info are coming from the proc fs of the root PID
namespace, but samples record its PID from the child PID namespace.
- The second issue is profiled program 'test_program' returns its forked
PID number from the child PID namespace, perf tool wrongly uses this
PID number to retrieve the process info via the proc fs of the root
PID namespace.
To avoid issues, we need to mount proc fs for the child PID namespace
with the option '--mount-proc' when use unshare command:
# unshare --fork --pid --mount-proc perf record -e cycles -a -- test_program
Conversely, when the proc fs of the root PID namespace is used by child
namespace, perf tool can detect the multiple PID levels and
nsinfo__is_in_root_namespace() returns false, this patch reports error
for this case:
# unshare --fork --pid perf record -e cycles -a -- test_program
Couldn't synthesize bpf events.
Perf runs in non-root PID namespace but it tries to gather process info from its parent PID namespace.
Please mount the proc file system properly, e.g. add the option '--mount-proc' for unshare command.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20211224124014.2492751-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ameer Hamza [Tue, 25 Jan 2022 12:11:41 +0000 (17:11 +0500)]
perf session: Check for NULL pointer before dereference
Move NULL pointer check before dereferencing the variable.
Addresses-Coverity: 1497622 ("Derereference before null check")
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ameer Hamza <amhamza.mgc@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Link: https://lore.kernel.org/r/20220125121141.18347-1-amhamza.mgc@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Wed, 2 Feb 2022 07:08:25 +0000 (23:08 -0800)]
perf annotate: Set error stream of objdump process for TUI
The stderr should be set to a pipe when using TUI. Otherwise it'd
print to stdout and break TUI windows with an error message.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220202070828.143303-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Anshuman Khandual [Wed, 2 Feb 2022 10:57:23 +0000 (16:27 +0530)]
perf tools: Add missing branch_sample_type to perf_event_attr__fprintf()
This updates branch sample type with missing PERF_SAMPLE_BRANCH_TYPE_SAVE.
Suggested-by: James Clark <james.clark@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/1643799443-15109-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Sun, 9 May 2021 12:39:02 +0000 (09:39 -0300)]
tools headers UAPI: Sync linux/kvm.h with the kernel sources
To pick the changes in:
f6c6804c43fa18d3 ("kvm: Move KVM_GET_XSAVE2 IOCTL definition at the end of kvm.h")
That just rebuilds perf, as these patches don't add any new KVM ioctl to
be harvested for the the 'perf trace' ioctl syscall argument
beautifiers.
This is also by now used by tools/testing/selftests/kvm/, a simple test
build succeeded.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lore.kernel.org/lkml/Yf+4k5Fs5Q3HdSG9@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Sun, 6 Feb 2022 11:28:34 +0000 (08:28 -0300)]
Merge remote-tracking branch 'torvalds/master' into perf/urgent
To check if more kernel API sync is needed and also to see if the perf
build tests continue to pass.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Sat, 5 Feb 2022 18:40:17 +0000 (10:40 -0800)]
Merge tag 'for-linus-5.17a-rc3-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- documentation fixes related to Xen
- enable x2apic mode when available when running as hardware
virtualized guest under Xen
- cleanup and fix a corner case of vcpu enumeration when running a
paravirtualized Xen guest
* tag 'for-linus-5.17a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/Xen: streamline (and fix) PV CPU enumeration
xen: update missing ioctl magic numers documentation
Improve docs for IOCTL_GNTDEV_MAP_GRANT_REF
xen: xenbus_dev.h: delete incorrect file name
xen/x2apic: enable x2apic mode when supported for HVM
Linus Torvalds [Sat, 5 Feb 2022 17:55:59 +0000 (09:55 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- A couple of fixes when handling an exception while a SError has
been delivered
- Workaround for Cortex-A510's single-step erratum
RISC-V:
- Make CY, TM, and IR counters accessible in VU mode
- Fix SBI implementation version
x86:
- Report deprecation of x87 features in supported CPUID
- Preparation for fixing an interrupt delivery race on AMD hardware
- Sparse fix
All except POWER and s390:
- Rework guest entry code to correctly mark noinstr areas and fix
vtime' accounting (for x86, this was already mostly correct but not
entirely; for ARM, MIPS and RISC-V it wasn't)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Use ERR_PTR_USR() to return -EFAULT as a __user pointer
KVM: x86: Report deprecated x87 features in supported CPUID
KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata
KVM: arm64: Stop handle_exit() from handling HVC twice when an SError occurs
KVM: arm64: Avoid consuming a stale esr value when SError occur
RISC-V: KVM: Fix SBI implementation version
RISC-V: KVM: make CY, TM, and IR counters accessible in VU mode
kvm/riscv: rework guest entry logic
kvm/arm64: rework guest entry logic
kvm/x86: rework guest entry logic
kvm/mips: rework guest entry logic
kvm: add guest_state_{enter,exit}_irqoff()
KVM: x86: Move delivery of non-APICv interrupt into vendor code
kvm: Move KVM_GET_XSAVE2 IOCTL definition at the end of kvm.h
Linus Torvalds [Sat, 5 Feb 2022 17:21:55 +0000 (09:21 -0800)]
Merge tag 'xfs-5.17-fixes-1' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"I was auditing operations in XFS that clear file privileges, and
realized that XFS' fallocate implementation drops suid/sgid but
doesn't clear file capabilities the same way that file writes and
reflink do.
There are VFS helpers that do it correctly, so refactor XFS to use
them. I also noticed that we weren't flushing the log at the correct
point in the fallocate operation, so that's fixed too.
Summary:
- Fix fallocate so that it drops all file privileges when files are
modified instead of open-coding that incompletely.
- Fix fallocate to flush the log if the caller wanted synchronous
file updates"
* tag 'xfs-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: ensure log flush at the end of a synchronous fallocate call
xfs: move xfs_update_prealloc_flags() to xfs_pnfs.c
xfs: set prealloc flag in xfs_alloc_file_space()
xfs: fallocate() should call file_modified()
xfs: remove XFS_PREALLOC_SYNC
xfs: reject crazy array sizes being fed to XFS_IOC_GETBMAP*
Linus Torvalds [Sat, 5 Feb 2022 17:13:51 +0000 (09:13 -0800)]
Merge tag 'vfs-5.17-fixes-2' of git://git./fs/xfs/xfs-linux
Pull vfs fixes from Darrick Wong:
"I was auditing the sync_fs code paths recently and noticed that most
callers of ->sync_fs ignore its return value (and many implementations
never return nonzero even if the fs is broken!), which means that
internal fs errors and corruption are not passed up to userspace
callers of syncfs(2) or FIFREEZE. Hence fixing the common code and
XFS, and I'll start working on the ext4/btrfs folks if this is merged.
Summary:
- Fix a bug where callers of ->sync_fs (e.g. sync_filesystem and
syncfs(2)) ignore the return value.
- Fix a bug where callers of sync_filesystem (e.g. fs freeze) ignore
the return value.
- Fix a bug in XFS where xfs_fs_sync_fs never passed back error
returns"
* tag 'vfs-5.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: return errors in xfs_fs_sync_fs
quota: make dquot_quota_sync return errors from ->sync_fs
vfs: make sync_filesystem return errors from ->sync_fs
vfs: make freeze_super abort when sync_filesystem returns error
Linus Torvalds [Sat, 5 Feb 2022 17:04:43 +0000 (09:04 -0800)]
Merge tag 'iomap-5.17-fixes-1' of git://git./fs/xfs/xfs-linux
Pull iomap fix from Darrick Wong:
"A single bugfix for iomap.
The fix should eliminate occasional complaints about stall warnings
when a lot of writeback IO completes all at once and we have to then
go clearing status on a large number of folios.
Summary:
- Limit the length of ioend chains in writeback so that we don't trip
the softlockup watchdog and to limit long tail latency on clearing
PageWriteback"
* tag 'iomap-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs, iomap: limit individual ioend chain lengths in writeback
Paolo Bonzini [Sat, 5 Feb 2022 05:58:25 +0000 (00:58 -0500)]
Merge tag 'kvmarm-fixes-5.17-2' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.17, take #2
- A couple of fixes when handling an exception while a SError has been
delivered
- Workaround for Cortex-A510's single-step[ erratum
Linus Torvalds [Sat, 5 Feb 2022 00:28:11 +0000 (16:28 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Some medium sized bugs in the various drivers. A couple are more
recent regressions:
- Fix two panics in hfi1 and two allocation problems
- Send the IGMP to the correct address in cma
- Squash a syzkaller bug related to races reading the multicast list
- Memory leak in siw and cm
- Fix a corner case spec compliance for HFI/QIB
- Correct the implementation of fences in siw
- Error unwind bug in mlx4"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/mlx4: Don't continue event handler after memory allocation failure
RDMA/siw: Fix broken RDMA Read Fence/Resume logic.
IB/rdmavt: Validate remote_addr during loopback atomic tests
IB/cm: Release previously acquired reference counter in the cm_id_priv
RDMA/siw: Fix refcounting leak in siw_create_qp()
RDMA/ucma: Protect mc during concurrent multicast leaves
RDMA/cma: Use correct address when leaving multicast group
IB/hfi1: Fix tstats alloc and dealloc
IB/hfi1: Fix AIP early init panic
IB/hfi1: Fix alloc failure with larger txqueuelen
IB/hfi1: Fix panic with larger ipoib send_queue_size
Linus Torvalds [Fri, 4 Feb 2022 23:27:45 +0000 (15:27 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Seven fixes, six of which are fairly obvious driver fixes.
The one core change to the device budget depth is to try to ensure
that if the default depth is large (which can produce quite a sizeable
bitmap allocation per device), we give back the memory we don't need
if there's a queue size reduction in slave_configure (which happens to
a lot of devices)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: hisi_sas: Fix setting of hisi_sas_slot.is_internal
scsi: pm8001: Fix use-after-free for aborted SSP/STP sas_task
scsi: pm8001: Fix use-after-free for aborted TMF sas_task
scsi: pm8001: Fix warning for undescribed param in process_one_iomb()
scsi: core: Reallocate device's budget map on queue depth change
scsi: bnx2fc: Make bnx2fc_recv_frame() mp safe
scsi: pm80xx: Fix double completion for SATA devices
Linus Torvalds [Fri, 4 Feb 2022 23:22:35 +0000 (15:22 -0800)]
Merge tag 'pci-v5.17-fixes-3' of git://git./linux/kernel/git/helgaas/pci
Pull pci fixes from Bjorn Helgaas:
- Restructure j721e_pcie_probe() so we don't dereference a NULL pointer
(Bjorn Helgaas)
- Add a kirin_pcie_data struct to identify different Kirin variants to
fix probe failure for controllers with an internal PHY (Bjorn
Helgaas)
* tag 'pci-v5.17-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: kirin: Add dev struct for of_device_get_match_data()
PCI: j721e: Initialize pcie->cdns_pcie before using it
Bjorn Helgaas [Wed, 2 Feb 2022 15:52:41 +0000 (09:52 -0600)]
PCI: kirin: Add dev struct for of_device_get_match_data()
Bean reported that
a622435fbe1a ("PCI: kirin: Prefer
of_device_get_match_data()") broke kirin_pcie_probe() because it assumed
match data of 0 was a failure when in fact, it meant the match data was
"(void *)PCIE_KIRIN_INTERNAL_PHY".
Therefore, probing of "hisilicon,kirin960-pcie" devices failed with -EINVAL
and an "OF data missing" message.
Add a struct kirin_pcie_data to encode the PHY type. Then the result of
of_device_get_match_data() should always be a non-NULL pointer to a struct
kirin_pcie_data that contains the PHY type.
Fixes:
a622435fbe1a ("PCI: kirin: Prefer of_device_get_match_data()")
Link: https://lore.kernel.org/r/20220202162659.GA12603@bhelgaas
Link: https://lore.kernel.org/r/20220201215941.1203155-1-huobean@gmail.com
Reported-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Linus Torvalds [Fri, 4 Feb 2022 20:14:58 +0000 (12:14 -0800)]
Merge tag 'for-5.17-rc2-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few fixes and error handling improvements:
- fix deadlock between quota disable and qgroup rescan worker
- fix use-after-free after failure to create a snapshot
- skip warning on unmount after log cleanup failure
- don't start transaction for scrub if the fs is mounted read-only
- tree checker verifies item sizes"
* tag 'for-5.17-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: skip reserved bytes warning on unmount after log cleanup failure
btrfs: fix use of uninitialized variable at rm device ioctl
btrfs: fix use-after-free after failure to create a snapshot
btrfs: tree-checker: check item_size for dev_item
btrfs: tree-checker: check item_size for inode_item
btrfs: fix deadlock between quota disable and qgroup rescan worker
btrfs: don't start transaction for scrub if the fs is mounted read-only
Linus Torvalds [Fri, 4 Feb 2022 20:08:49 +0000 (12:08 -0800)]
Merge tag 'erofs-for-5.17-rc3-fixes' of git://git./linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"Two fixes related to fsdax cleanup in this cycle and ztailpacking to
fix small compressed data inlining. There is also a trivial cleanup to
rearrange code for better reading.
Summary:
- fix fsdax partition offset misbehavior
- clean up z_erofs_decompressqueue_work() declaration
- fix up EOF lcluster inlining, especially for small compressed data"
* tag 'erofs-for-5.17-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix small compressed files inlining
erofs: avoid unnecessary z_erofs_decompressqueue_work() declaration
erofs: fix fsdax partition offset handling
Linus Torvalds [Fri, 4 Feb 2022 20:01:57 +0000 (12:01 -0800)]
Merge tag 'block-5.17-2022-02-04' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request
- fix use-after-free in rdma and tcp controller reset (Sagi Grimberg)
- fix the state check in nvmf_ctlr_matches_baseopts (Uday Shankar)
- MD nowait null pointer fix (Song)
- blk-integrity seed advance fix (Martin)
- Fix a dio regression in this merge window (Ilya)
* tag 'block-5.17-2022-02-04' of git://git.kernel.dk/linux-block:
block: bio-integrity: Advance seed correctly for larger interval sizes
nvme-fabrics: fix state check in nvmf_ctlr_matches_baseopts()
md: fix NULL pointer deref with nowait but no mddev->queue
block: fix DIO handling regressions in blkdev_read_iter()
nvme-rdma: fix possible use-after-free in transport error_recovery work
nvme-tcp: fix possible use-after-free in transport error_recovery work
nvme: fix a possible use-after-free in controller reset during load
Linus Torvalds [Fri, 4 Feb 2022 19:52:37 +0000 (11:52 -0800)]
Merge tag 'ata-5.17-rc3' of git://git./linux/kernel/git/dlemoal/libata
Pull ATA fixes from Damien Le Moal:
- Sergey volunteered to be a reviewer for the Renesas R-Car SATA driver
and PATA drivers. Update the MAINTAINERS file accordingly.
- Regression fix: add a horkage flag to prevent accessing the log
directory log page with SATADOM-ML 3ME SATA devices as they react
badly to reading that log page (from Anton).
* tag 'ata-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: libata-core: Introduce ATA_HORKAGE_NO_LOG_DIR horkage
MAINTAINERS: add myself as Renesas R-Car SATA driver reviewer
MAINTAINERS: add myself as PATA drivers reviewer
Linus Torvalds [Fri, 4 Feb 2022 19:45:16 +0000 (11:45 -0800)]
Merge tag 'iommu-fixes-v5.17-rc2' of git://git./linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Warning fixes and a fix for a potential use-after-free in IOMMU core
code
- Another potential memory leak fix for the Intel VT-d driver
- Fix for an IO polling loop timeout issue in the AMD IOMMU driver
* tag 'iommu-fixes-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix loop timeout issue in iommu_ga_log_enable()
iommu/vt-d: Fix potential memory leak in intel_setup_irq_remapping()
iommu: Fix some W=1 warnings
iommu: Fix potential use-after-free during probe
Linus Torvalds [Fri, 4 Feb 2022 19:38:01 +0000 (11:38 -0800)]
Merge tag 'random-5.17-rc3-for-linus' of git://git./linux/kernel/git/crng/random
Pull random number generator fixes from Jason Donenfeld:
"For this week, we have:
- A fix to make more frequent use of hwgenerator randomness, from
Dominik.
- More cleanups to the boot initialization sequence, from Dominik.
- A fix for an old shortcoming with the ZAP ioctl, from me.
- A workaround for a still unfixed Clang CFI/FullLTO compiler bug,
from me. On one hand, it's a bummer to commit workarounds for
experimental compiler features that have bugs. But on the other, I
think this actually improves the code somewhat, independent of the
bug. So a win-win"
* tag 'random-5.17-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: only call crng_finalize_init() for primary_crng
random: access primary_pool directly rather than through pointer
random: wake up /dev/random writers after zap
random: continually use hwgenerator randomness
lib/crypto: blake2s: avoid indirect calls to compression function for Clang CFI
Linus Torvalds [Fri, 4 Feb 2022 19:32:46 +0000 (11:32 -0800)]
Merge tag 'acpi-5.17-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix compilation in the case when ACPI is selected and CRC32, depended
on by ACPI after recent changes, is not (Randy Dunlap)"
* tag 'acpi-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: require CRC32 to build
Linus Torvalds [Fri, 4 Feb 2022 19:24:28 +0000 (11:24 -0800)]
Merge tag 'sound-5.17-rc3' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes.
The major changes are ASoC core fixes, addressing the DPCM locking
issue after the recent code changes and the potentially invalid
register accesses via control API. Also, HD-audio got a core fix for
Oops at dynamic unbinding.
The rest are device-specific small fixes, including the usual stuff
like HD-audio and USB-audio quirks"
* tag 'sound-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (31 commits)
ALSA: hda: Skip codec shutdown in case the codec is not registered
ALSA: usb-audio: Correct quirk for VF0770
ALSA: Replace acpi_bus_get_device()
Input: wm97xx: Simplify resource management
ALSA: hda/realtek: Add quirk for ASUS GU603
ALSA: hda/realtek: Fix silent output on Gigabyte X570 Aorus Xtreme after reboot from Windows
ALSA: hda/realtek: Fix silent output on Gigabyte X570S Aorus Master (newer chipset)
ALSA: hda/realtek: Add missing fixup-model entry for Gigabyte X570 ALC1220 quirks
ALSA: hda: realtek: Fix race at concurrent COEF updates
ASoC: ops: Check for negative values before reading them
ASoC: rt5682: Fix deadlock on resume
ASoC: hdmi-codec: Fix OOB memory accesses
ASoC: soc-pcm: Move debugfs removal out of spinlock
ASoC: soc-pcm: Fix DPCM lockdep warning due to nested stream locks
ASoC: fsl: Add missing error handling in pcm030_fabric_probe
ALSA: hda: Fix signedness of sscanf() arguments
ALSA: usb-audio: initialize variables that could ignore errors
ALSA: hda: Fix UAF of leds class devs at unbinding
ASoC: qdsp6: q6apm-dai: only stop graphs that are started
ASoC: codecs: wcd938x: fix return value of mixer put function
...
Linus Torvalds [Fri, 4 Feb 2022 19:13:54 +0000 (11:13 -0800)]
Merge tag 'drm-fixes-2022-02-04' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Regular fixes for the week. Daniel has agreed to bring back the fbcon
hw acceleration under a CONFIG option for the non-drm fbdev users, we
don't advise turning this on unless you are in the niche that is old
fbdev drivers, Since it's essentially a revert and shouldn't be high
impact seemed like a good time to do it now.
Otherwise, i915 and amdgpu fixes are most of it, along with some minor
fixes elsewhere.
fbdev:
- readd fbcon acceleration
i915:
- fix DP monitor via type-c dock
- fix for engine busyness and read timeout with GuC
- use ALLOW_FAIL for error capture buffer allocs
- don't use interruptible lock on error paths
- smatch fix to reject zero sized overlays.
amdgpu:
- mGPU fan boost fix for beige goby
- S0ix fixes
- Cyan skillfish hang fix
- DCN fixes for DCN 3.1
- DCN fixes for DCN 3.01
- Apple retina panel fix
- ttm logic inversion fix
dma-buf:
- heaps: fix potential spectre v1 gadget
kmb:
- fix potential oob access
mxsfb:
- fix NULL ptr deref
nouveau:
- fix potential oob access during BIOS decode"
* tag 'drm-fixes-2022-02-04' of git://anongit.freedesktop.org/drm/drm: (24 commits)
drm: mxsfb: Fix NULL pointer dereference
drm/amdgpu: fix logic inversion in check
drm/amd: avoid suspend on dGPUs w/ s2idle support when runtime PM enabled
drm/amd/display: Force link_rate as LINK_RATE_RBR2 for 2018 15" Apple Retina panels
drm/amd/display: revert "Reset fifo after enable otg"
drm/amd/display: watermark latencies is not enough on DCN31
drm/amd/display: Update watermark values for DCN301
drm/amdgpu: fix a potential GPU hang on cyan skillfish
drm/amd: Only run s3 or s0ix if system is configured properly
drm/amd: add support to check whether the system is set to s3
fbcon: Add option to enable legacy hardware acceleration
Revert "fbcon: Disable accelerated scrolling"
Revert "fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)"
drm/i915/pmu: Fix KMD and GuC race on accessing busyness
dma-buf: heaps: Fix potential spectre v1 gadget
drm/amd: Warn users about potential s0ix problems
drm/amd/pm: correct the MGpuFanBoost support for Beige Goby
drm/nouveau: fix off by one in BIOS boundary checking
drm/i915/adlp: Fix TypeC PHY-ready status readout
drm/i915/pmu: Use PM timestamp instead of RING TIMESTAMP for reference
...
Linus Torvalds [Fri, 4 Feb 2022 18:34:19 +0000 (10:34 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"10 patches.
Subsystems affected by this patch series: ipc, MAINTAINERS, and mm
(vmscan, debug, pagemap, kmemleak, and selftests)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kselftest/vm: revert "tools/testing/selftests/vm/userfaultfd.c: use swap() to make code cleaner"
MAINTAINERS: update rppt's email
mm/kmemleak: avoid scanning potential huge holes
ipc/sem: do not sleep with a spin lock held
mm/pgtable: define pte_index so that preprocessor could recognize it
mm/page_table_check: check entries at pmd levels
mm/khugepaged: unify collapse pmd clear, flush and free
mm/page_table_check: use unsigned long for page counters and cleanup
mm/debug_vm_pgtable: remove pte entry from the page table
Revert "mm/page_isolation: unset migratetype directly for non Buddy page"
Dominik Brodowski [Sun, 30 Jan 2022 21:03:20 +0000 (22:03 +0100)]
random: only call crng_finalize_init() for primary_crng
crng_finalize_init() returns instantly if it is called for another pool
than primary_crng. The test whether crng_finalize_init() is still required
can be moved to the relevant caller in crng_reseed(), and
crng_need_final_init can be reset to false if crng_finalize_init() is
called with workqueues ready. Then, no previous callsite will call
crng_finalize_init() unless it is needed, and we can get rid of the
superfluous function parameter.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Dominik Brodowski [Sun, 30 Jan 2022 21:03:19 +0000 (22:03 +0100)]
random: access primary_pool directly rather than through pointer
Both crng_initialize_primary() and crng_init_try_arch_early() are
only called for the primary_pool. Accessing it directly instead of
through a function parameter simplifies the code.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Jason A. Donenfeld [Fri, 28 Jan 2022 22:44:03 +0000 (23:44 +0100)]
random: wake up /dev/random writers after zap
When account() is called, and the amount of entropy dips below
random_write_wakeup_bits, we wake up the random writers, so that they
can write some more in. However, the RNDZAPENTCNT/RNDCLEARPOOL ioctl
sets the entropy count to zero -- a potential reduction just like
account() -- but does not unblock writers. This commit adds the missing
logic to that ioctl to unblock waiting writers.
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Dominik Brodowski [Tue, 25 Jan 2022 20:14:57 +0000 (21:14 +0100)]
random: continually use hwgenerator randomness
The rngd kernel thread may sleep indefinitely if the entropy count is
kept above random_write_wakeup_bits by other entropy sources. To make
best use of multiple sources of randomness, mix entropy from hardware
RNGs into the pool at least once within CRNG_RESEED_INTERVAL.
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Jason A. Donenfeld [Wed, 19 Jan 2022 13:35:06 +0000 (14:35 +0100)]
lib/crypto: blake2s: avoid indirect calls to compression function for Clang CFI
blake2s_compress_generic is weakly aliased by blake2s_compress. The
current harness for function selection uses a function pointer, which is
ordinarily inlined and resolved at compile time. But when Clang's CFI is
enabled, CFI still triggers when making an indirect call via a weak
symbol. This seems like a bug in Clang's CFI, as though it's bucketing
weak symbols and strong symbols differently. It also only seems to
trigger when "full LTO" mode is used, rather than "thin LTO".
[ 0.000000][ T0] Kernel panic - not syncing: CFI failure (target: blake2s_compress_generic+0x0/0x1444)
[ 0.000000][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-mainline-06981-g076c855b846e #1
[ 0.000000][ T0] Hardware name: MT6873 (DT)
[ 0.000000][ T0] Call trace:
[ 0.000000][ T0] dump_backtrace+0xfc/0x1dc
[ 0.000000][ T0] dump_stack_lvl+0xa8/0x11c
[ 0.000000][ T0] panic+0x194/0x464
[ 0.000000][ T0] __cfi_check_fail+0x54/0x58
[ 0.000000][ T0] __cfi_slowpath_diag+0x354/0x4b0
[ 0.000000][ T0] blake2s_update+0x14c/0x178
[ 0.000000][ T0] _extract_entropy+0xf4/0x29c
[ 0.000000][ T0] crng_initialize_primary+0x24/0x94
[ 0.000000][ T0] rand_initialize+0x2c/0x6c
[ 0.000000][ T0] start_kernel+0x2f8/0x65c
[ 0.000000][ T0] __primary_switched+0xc4/0x7be4
[ 0.000000][ T0] Rebooting in 5 seconds..
Nonetheless, the function pointer method isn't so terrific anyway, so
this patch replaces it with a simple boolean, which also gets inlined
away. This successfully works around the Clang bug.
In general, I'm not too keen on all of the indirection involved here; it
clearly does more harm than good. Hopefully the whole thing can get
cleaned up down the road when lib/crypto is overhauled more
comprehensively. But for now, we go with a simple bandaid.
Fixes:
6048fdcc5f26 ("lib/crypto: blake2s: include as built-in")
Link: https://github.com/ClangBuiltLinux/linux/issues/1567
Reported-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Linus Torvalds [Fri, 4 Feb 2022 17:54:02 +0000 (09:54 -0800)]
Merge tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A patch to make it possible to disable zero copy path in the messenger
to avoid checksum or authentication tag mismatches and ensuing session
resets in case the destination buffer isn't guaranteed to be stable"
* tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-client:
libceph: optionally use bounce buffer on recv path in crc mode
libceph: make recv path in secure mode work the same as send path
Linus Torvalds [Fri, 4 Feb 2022 17:44:42 +0000 (09:44 -0800)]
Merge tag '9p-for-5.17-rc3' of git://github.com/martinetd/linux
Pull 9p fix from Dominique Martinet:
"Fix 'cannot walk open fid' rule
The 9p 'walk' operation requires fid arguments to not originate from
an open or create call and we've missed that for a while as the
servers regularly running tests with don't enforce the check and no
active reviewer knew about the rule.
Both reporters confirmed reverting this patch fixes things for them
and looking at it further wasn't actually required... Will take more
time for follow up and enforcing the rule more thoroughly later"
* tag '9p-for-5.17-rc3' of git://github.com/martinetd/linux:
Revert "fs/9p: search open fids first"
Linus Torvalds [Fri, 4 Feb 2022 17:34:37 +0000 (09:34 -0800)]
Merge tag '5.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"SMB3 client fixes including:
- multiple fscache related fixes, reenabling ability to read/write to
cached files for cifs.ko (that was temporarily disabled for cifs.ko
a few weeks ago due to the recent fscache changes)
- also includes a new fscache helper function ("query_occupancy")
used by above
- fix for multiuser mounts and NTLMSSP auth (workstation name) for
stable
- fix locking ordering problem in multichannel code
- trivial malformed comment fix"
* tag '5.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix workstation_name for multiuser mounts
Invalidate fscache cookie only when inode attributes are changed.
cifs: Fix the readahead conversion to manage the batch when reading from cache
cifs: Implement cache I/O by accessing the cache directly
netfs, cachefiles: Add a method to query presence of data in the cache
cifs: Transition from ->readpages() to ->readahead()
cifs: unlock chan_lock before calling cifs_put_tcp_session
Fix a warning about a malformed kernel doc comment in cifs
Shuah Khan [Fri, 4 Feb 2022 04:49:45 +0000 (20:49 -0800)]
kselftest/vm: revert "tools/testing/selftests/vm/userfaultfd.c: use swap() to make code cleaner"
With this change, userfaultfd fails to build with undefined reference
swap() error:
userfaultfd.c: In function `userfaultfd_stress':
userfaultfd.c:1530:17: warning: implicit declaration of function `swap'; did you mean `swab'? [-Wimplicit-function-declaration]
1530 | swap(area_src, area_dst);
| ^~~~
| swab
/usr/bin/ld: /tmp/ccDGOAdV.o: in function `userfaultfd_stress':
userfaultfd.c:(.text+0x549e): undefined reference to `swap'
/usr/bin/ld: userfaultfd.c:(.text+0x54bc): undefined reference to `swap'
collect2: error: ld returned 1 exit status
Revert the commit to fix the problem.
Link: https://lkml.kernel.org/r/20220202003340.87195-1-skhan@linuxfoundation.org
Fixes:
2c769ed7137a ("tools/testing/selftests/vm/userfaultfd.c: use swap() to make code cleaner")
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Fri, 4 Feb 2022 04:49:41 +0000 (20:49 -0800)]
MAINTAINERS: update rppt's email
Use my @kernel.org address
Link: https://lkml.kernel.org/r/20220203090324.3701774-1-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lang Yu [Fri, 4 Feb 2022 04:49:37 +0000 (20:49 -0800)]
mm/kmemleak: avoid scanning potential huge holes
When using devm_request_free_mem_region() and devm_memremap_pages() to
add ZONE_DEVICE memory, if requested free mem region's end pfn were
huge(e.g., 0x400000000), the node_end_pfn() will be also huge (see
move_pfn_range_to_zone()). Thus it creates a huge hole between
node_start_pfn() and node_end_pfn().
We found on some AMD APUs, amdkfd requested such a free mem region and
created a huge hole. In such a case, following code snippet was just
doing busy test_bit() looping on the huge hole.
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
struct page *page = pfn_to_online_page(pfn);
if (!page)
continue;
...
}
So we got a soft lockup:
watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [bash:1221]
CPU: 6 PID: 1221 Comm: bash Not tainted 5.15.0-custom #1
RIP: 0010:pfn_to_online_page+0x5/0xd0
Call Trace:
? kmemleak_scan+0x16a/0x440
kmemleak_write+0x306/0x3a0
? common_file_perm+0x72/0x170
full_proxy_write+0x5c/0x90
vfs_write+0xb9/0x260
ksys_write+0x67/0xe0
__x64_sys_write+0x1a/0x20
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
I did some tests with the patch.
(1) amdgpu module unloaded
before the patch:
real 0m0.976s
user 0m0.000s
sys 0m0.968s
after the patch:
real 0m0.981s
user 0m0.000s
sys 0m0.973s
(2) amdgpu module loaded
before the patch:
real 0m35.365s
user 0m0.000s
sys 0m35.354s
after the patch:
real 0m1.049s
user 0m0.000s
sys 0m1.042s
Link: https://lkml.kernel.org/r/20211108140029.721144-1-lang.yu@amd.com
Signed-off-by: Lang Yu <lang.yu@amd.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minghao Chi [Fri, 4 Feb 2022 04:49:33 +0000 (20:49 -0800)]
ipc/sem: do not sleep with a spin lock held
We can't call kvfree() with a spin lock held, so defer it.
Link: https://lkml.kernel.org/r/20211223031207.556189-1-chi.minghao@zte.com.cn
Fixes:
fc37a3b8b438 ("[PATCH] ipc sem: use kvmalloc for sem_undo allocation")
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Yang Guang <cgel.zte@gmail.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Fri, 4 Feb 2022 04:49:29 +0000 (20:49 -0800)]
mm/pgtable: define pte_index so that preprocessor could recognize it
Since commit
974b9b2c68f3 ("mm: consolidate pte_index() and
pte_offset_*() definitions") pte_index is a static inline and there is
no define for it that can be recognized by the preprocessor. As a
result, vm_insert_pages() uses slower loop over vm_insert_page() instead
of insert_pages() that amortizes the cost of spinlock operations when
inserting multiple pages.
Link: https://lkml.kernel.org/r/20220111145457.20748-1-rppt@kernel.org
Fixes:
974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*() definitions")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: Christian Dietrich <stettberger@dokucode.de>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pasha Tatashin [Fri, 4 Feb 2022 04:49:24 +0000 (20:49 -0800)]
mm/page_table_check: check entries at pmd levels
syzbot detected a case where the page table counters were not properly
updated.
syzkaller login: ------------[ cut here ]------------
kernel BUG at mm/page_table_check.c:162!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 3099 Comm: pasha Not tainted 5.16.0+ #48
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIO4
RIP: 0010:__page_table_check_zero+0x159/0x1a0
Call Trace:
free_pcp_prepare+0x3be/0xaa0
free_unref_page+0x1c/0x650
free_compound_page+0xec/0x130
free_transhuge_page+0x1be/0x260
__put_compound_page+0x90/0xd0
release_pages+0x54c/0x1060
__pagevec_release+0x7c/0x110
shmem_undo_range+0x85e/0x1250
...
The repro involved having a huge page that is split due to uprobe event
temporarily replacing one of the pages in the huge page. Later the huge
page was combined again, but the counters were off, as the PTE level was
not properly updated.
Make sure that when PMD is cleared and prior to freeing the level the
PTEs are updated.
Link: https://lkml.kernel.org/r/20220131203249.2832273-5-pasha.tatashin@soleen.com
Fixes:
df4e817b7108 ("mm: page table check")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Paul Turner <pjt@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pasha Tatashin [Fri, 4 Feb 2022 04:49:20 +0000 (20:49 -0800)]
mm/khugepaged: unify collapse pmd clear, flush and free
Unify the code that flushes, clears pmd entry, and frees the PTE table
level into a new function collapse_and_free_pmd().
This cleanup is useful as in the next patch we will add another call to
this function to iterate through PTE prior to freeing the level for page
table check.
Link: https://lkml.kernel.org/r/20220131203249.2832273-4-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Paul Turner <pjt@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pasha Tatashin [Fri, 4 Feb 2022 04:49:15 +0000 (20:49 -0800)]
mm/page_table_check: use unsigned long for page counters and cleanup
For consistency, use "unsigned long" for all page counters.
Also, reduce code duplication by calling __page_table_check_*_clear()
from __page_table_check_*_set() functions.
Link: https://lkml.kernel.org/r/20220131203249.2832273-3-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Wei Xu <weixugc@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Paul Turner <pjt@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pasha Tatashin [Fri, 4 Feb 2022 04:49:10 +0000 (20:49 -0800)]
mm/debug_vm_pgtable: remove pte entry from the page table
Patch series "page table check fixes and cleanups", v5.
This patch (of 4):
The pte entry that is used in pte_advanced_tests() is never removed from
the page table at the end of the test.
The issue is detected by page_table_check, to repro compile kernel with
the following configs:
CONFIG_DEBUG_VM_PGTABLE=y
CONFIG_PAGE_TABLE_CHECK=y
CONFIG_PAGE_TABLE_CHECK_ENFORCED=y
During the boot the following BUG is printed:
debug_vm_pgtable: [debug_vm_pgtable ]: Validating architecture page table helpers
------------[ cut here ]------------
kernel BUG at mm/page_table_check.c:162!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-11413-g2c271fe77d52 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
...
The entry should be properly removed from the page table before the page
is released to the free list.
Link: https://lkml.kernel.org/r/20220131203249.2832273-1-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20220131203249.2832273-2-pasha.tatashin@soleen.com
Fixes:
a5c3b9ffb0f4 ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Tested-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org> [5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chen Wandun [Fri, 4 Feb 2022 04:49:06 +0000 (20:49 -0800)]
Revert "mm/page_isolation: unset migratetype directly for non Buddy page"
This reverts commit
721fb891ad0b3956d5c168b2931e3e5e4fb7ca40.
Commit
721fb891ad0b ("mm/page_isolation: unset migratetype directly for
non Buddy page") will result memory that should in buddy disappear by
mistake. move_freepages_block moves all pages in pageblock instead of
pages indicated by input parameter, so if input pages is not in buddy
but other pages in pageblock is in buddy, it will result in page out of
control.
Link: https://lkml.kernel.org/r/20220126024436.13921-1-chenwandun@huawei.com
Fixes:
721fb891ad0b ("mm/page_isolation: unset migratetype directly for non Buddy page")
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Dong Aisheng <aisheng.dong@nxp.com>
Tested-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joerg Roedel [Fri, 4 Feb 2022 11:55:37 +0000 (12:55 +0100)]
iommu/amd: Fix loop timeout issue in iommu_ga_log_enable()
The polling loop for the register change in iommu_ga_log_enable() needs
to have a udelay() in it. Otherwise the CPU might be faster than the
IOMMU hardware and wrongly trigger the WARN_ON() further down the code
stream. Use a 10us for udelay(), has there is some hardware where
activation of the GA log can take more than a 100ms.
A future optimization should move the activation check of the GA log
to the point where it gets used for the first time. But that is a
bigger change and not suitable for a fix.
Fixes:
8bda0cfbdc1a ("iommu/amd: Detect and initialize guest vAPIC log")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20220204115537.3894-1-joro@8bytes.org
Thomas Gleixner [Mon, 31 Jan 2022 21:02:46 +0000 (22:02 +0100)]
PCI/MSI: Remove bogus warning in pci_irq_get_affinity()
The recent overhaul of pci_irq_get_affinity() introduced a regression when
pci_irq_get_affinity() is called for an MSI-X interrupt which was not
allocated with affinity descriptor information.
The original code just returned a NULL pointer in that case, but the rework
added a WARN_ON() under the assumption that the corresponding WARN_ON() in
the MSI case can be applied to MSI-X as well.
In fact the MSI warning in the original code does not make sense either
because it's legitimate to invoke pci_irq_get_affinity() for a MSI
interrupt which was not allocated with affinity descriptor information.
Remove it and just return NULL as the original code did.
Fixes:
f48235900182 ("PCI/MSI: Simplify pci_irq_get_affinity()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/87ee4n38sm.ffs@tglx