Marc Zyngier [Mon, 16 May 2022 16:48:20 +0000 (17:48 +0100)]
Merge branch kvm-arm64/psci-suspend into kvmarm-master/next
* kvm-arm64/psci-suspend:
: .
: Add support for PSCI SYSTEM_SUSPEND and allow userspace to
: filter the wake-up events.
:
: Patches courtesy of Oliver.
: .
Documentation: KVM: Fix title level for PSCI_SUSPEND
selftests: KVM: Test SYSTEM_SUSPEND PSCI call
selftests: KVM: Refactor psci_test to make it amenable to new tests
selftests: KVM: Use KVM_SET_MP_STATE to power off vCPU in psci_test
selftests: KVM: Create helper for making SMCCC calls
selftests: KVM: Rename psci_cpu_on_test to psci_test
KVM: arm64: Implement PSCI SYSTEM_SUSPEND
KVM: arm64: Add support for userspace to suspend a vCPU
KVM: arm64: Return a value from check_vcpu_requests()
KVM: arm64: Rename the KVM_REQ_SLEEP handler
KVM: arm64: Track vCPU power state using MP state values
KVM: arm64: Dedupe vCPU power off helpers
KVM: arm64: Don't depend on fallthrough to hide SYSTEM_RESET2
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Mon, 16 May 2022 16:47:03 +0000 (17:47 +0100)]
Merge branch kvm-arm64/hcall-selection into kvmarm-master/next
* kvm-arm64/hcall-selection:
: .
: Introduce a new set of virtual sysregs for userspace to
: select the hypercalls it wants to see exposed to the guest.
:
: Patches courtesy of Raghavendra and Oliver.
: .
KVM: arm64: Fix hypercall bitmap writeback when vcpus have already run
KVM: arm64: Hide KVM_REG_ARM_*_BMAP_BIT_COUNT from userspace
Documentation: Fix index.rst after psci.rst renaming
selftests: KVM: aarch64: Add the bitmap firmware registers to get-reg-list
selftests: KVM: aarch64: Introduce hypercall ABI test
selftests: KVM: Create helper for making SMCCC calls
selftests: KVM: Rename psci_cpu_on_test to psci_test
tools: Import ARM SMCCC definitions
Docs: KVM: Add doc for the bitmap firmware registers
Docs: KVM: Rename psci.rst to hypercalls.rst
KVM: arm64: Add vendor hypervisor firmware register
KVM: arm64: Add standard hypervisor firmware register
KVM: arm64: Setup a framework for hypercall bitmap firmware registers
KVM: arm64: Factor out firmware register handling from psci.c
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Mon, 16 May 2022 16:32:54 +0000 (17:32 +0100)]
KVM: arm64: Fix hypercall bitmap writeback when vcpus have already run
We generally want to disallow hypercall bitmaps being changed
once vcpus have already run. But we must allow the write if
the written value is unchanged so that userspace can rewrite
the register file on reboot, for example.
Without this, a QEMU-based VM will fail to reboot correctly.
The original code was correct, and it is me that introduced
the regression.
Fixes:
05714cab7d63 ("KVM: arm64: Setup a framework for hypercall bitmap firmware registers")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Sun, 15 May 2022 10:36:24 +0000 (11:36 +0100)]
KVM: arm64: Hide KVM_REG_ARM_*_BMAP_BIT_COUNT from userspace
These constants will change over time, and userspace has no
business knowing about them. Hide them behind __KERNEL__.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Stephen Rothwell [Thu, 5 May 2022 10:06:34 +0000 (20:06 +1000)]
Documentation: KVM: Fix title level for PSCI_SUSPEND
The htmldoc build breaks in a funny way with:
<quote>
Sphinx parallel build error:
docutils.utils.SystemMessage: /home/sfr/next/next/Documentation/virt/kvm/api.rst:6175: (SEVERE/4) Title level inconsistent:
For arm/arm64:
^^^^^^^^^^^^^^
</quote>
Swap the ^^s for a bunch of --s...
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[maz: commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 4 May 2022 12:13:55 +0000 (13:13 +0100)]
Documentation: Fix index.rst after psci.rst renaming
Fix the TOC in index.rst after psci.rst has been renamed to
hypercalls.rst.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/r/20220504205627.18f46380@canb.auug.org.au
Marc Zyngier [Wed, 4 May 2022 08:42:45 +0000 (09:42 +0100)]
Merge branch kvm-arm64/aarch32-idreg-trap into kvmarm-master/next
* kvm-arm64/aarch32-idreg-trap:
: .
: Add trapping/sanitising infrastructure for AArch32 systen registers,
: allowing more control over what we actually expose (such as the PMU).
:
: Patches courtesy of Oliver and Alexandru.
: .
KVM: arm64: Fix new instances of 32bit ESRs
KVM: arm64: Hide AArch32 PMU registers when not available
KVM: arm64: Start trapping ID registers for 32 bit guests
KVM: arm64: Plumb cp10 ID traps through the AArch64 sysreg handler
KVM: arm64: Wire up CP15 feature registers to their AArch64 equivalents
KVM: arm64: Don't write to Rt unless sys_reg emulation succeeds
KVM: arm64: Return a bool from emulate_cp()
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 4 May 2022 08:42:37 +0000 (09:42 +0100)]
Merge branch kvm-arm64/hyp-stack-guard into kvmarm-master/next
* kvm-arm64/hyp-stack-guard:
: .
: Harden the EL2 stack by providing stack guards, courtesy of
: Kalesh Singh.
: .
KVM: arm64: Symbolize the nVHE HYP addresses
KVM: arm64: Detect and handle hypervisor stack overflows
KVM: arm64: Add guard pages for pKVM (protected nVHE) hypervisor stack
KVM: arm64: Add guard pages for KVM nVHE hypervisor stack
KVM: arm64: Introduce pkvm_alloc_private_va_range()
KVM: arm64: Introduce hyp_alloc_private_va_range()
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 4 May 2022 08:42:16 +0000 (09:42 +0100)]
Merge branch kvm-arm64/wfxt into kvmarm-master/next
* kvm-arm64/wfxt:
: .
: Add support for the WFET/WFIT instructions that provide the same
: service as WFE/WFI, only with a timeout.
: .
KVM: arm64: Expose the WFXT feature to guests
KVM: arm64: Offer early resume for non-blocking WFxT instructions
KVM: arm64: Handle blocking WFIT instruction
KVM: arm64: Introduce kvm_counter_compute_delta() helper
KVM: arm64: Simplify kvm_cpu_has_pending_timer()
arm64: Use WFxT for __delay() when possible
arm64: Add wfet()/wfit() helpers
arm64: Add HWCAP advertising FEAT_WFXT
arm64: Add RV and RN fields for ESR_ELx_WFx_ISS
arm64: Expand ESR_ELx_WFx_ISS_TI to match its ARMv8.7 definition
Signed-off-by: Marc Zyngier <maz@kernel.org>
Marc Zyngier [Wed, 4 May 2022 08:38:32 +0000 (09:38 +0100)]
Merge remote-tracking branch 'arm64/for-next/sme' into kvmarm-master/next
Merge arm64's SME branch to resolve conflicts with the WFxT branch.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Oliver Upton [Wed, 4 May 2022 03:24:46 +0000 (03:24 +0000)]
selftests: KVM: Test SYSTEM_SUSPEND PSCI call
Assert that the vCPU exits to userspace with KVM_SYSTEM_EVENT_SUSPEND if
the guest calls PSCI SYSTEM_SUSPEND. Additionally, guarantee that the
SMC32 and SMC64 flavors of this call are discoverable with the
PSCI_FEATURES call.
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-13-oupton@google.com
Oliver Upton [Wed, 4 May 2022 03:24:45 +0000 (03:24 +0000)]
selftests: KVM: Refactor psci_test to make it amenable to new tests
Split up the current test into several helpers that will be useful to
subsequent test cases added to the PSCI test suite.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-12-oupton@google.com
Oliver Upton [Wed, 4 May 2022 03:24:44 +0000 (03:24 +0000)]
selftests: KVM: Use KVM_SET_MP_STATE to power off vCPU in psci_test
Setting a vCPU's MP state to KVM_MP_STATE_STOPPED has the effect of
powering off the vCPU. Rather than using the vCPU init feature flag, use
the KVM_SET_MP_STATE ioctl to power off the target vCPU.
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-11-oupton@google.com
Oliver Upton [Wed, 4 May 2022 03:24:43 +0000 (03:24 +0000)]
selftests: KVM: Create helper for making SMCCC calls
The PSCI and PV stolen time tests both need to make SMCCC calls within
the guest. Create a helper for making SMCCC calls and rework the
existing tests to use the library function.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-10-oupton@google.com
Oliver Upton [Wed, 4 May 2022 03:24:42 +0000 (03:24 +0000)]
selftests: KVM: Rename psci_cpu_on_test to psci_test
There are other interactions with PSCI worth testing; rename the PSCI
test to make it more generic.
No functional change intended.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-9-oupton@google.com
Oliver Upton [Wed, 4 May 2022 03:24:41 +0000 (03:24 +0000)]
KVM: arm64: Implement PSCI SYSTEM_SUSPEND
ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows
software to request that a system be placed in the deepest possible
low-power state. Effectively, software can use this to suspend itself to
RAM.
Unfortunately, there really is no good way to implement a system-wide
PSCI call in KVM. Any precondition checks done in the kernel will need
to be repeated by userspace since there is no good way to protect a
critical section that spans an exit to userspace. SYSTEM_RESET and
SYSTEM_OFF are equally plagued by this issue, although no users have
seemingly cared for the relatively long time these calls have been
supported.
The solution is to just make the whole implementation userspace's
problem. Introduce a new system event, KVM_SYSTEM_EVENT_SUSPEND, that
indicates to userspace a calling vCPU has invoked PSCI SYSTEM_SUSPEND.
Additionally, add a CAP to get buy-in from userspace for this new exit
type.
Only advertise the SYSTEM_SUSPEND PSCI call if userspace has opted in.
If a vCPU calls SYSTEM_SUSPEND, punt straight to userspace. Provide
explicit documentation of userspace's responsibilites for the exit and
point to the PSCI specification to describe the actual PSCI call.
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-8-oupton@google.com
Oliver Upton [Wed, 4 May 2022 03:24:40 +0000 (03:24 +0000)]
KVM: arm64: Add support for userspace to suspend a vCPU
Introduce a new MP state, KVM_MP_STATE_SUSPENDED, which indicates a vCPU
is in a suspended state. In the suspended state the vCPU will block
until a wakeup event (pending interrupt) is recognized.
Add a new system event type, KVM_SYSTEM_EVENT_WAKEUP, to indicate to
userspace that KVM has recognized one such wakeup event. It is the
responsibility of userspace to then make the vCPU runnable, or leave it
suspended until the next wakeup event.
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-7-oupton@google.com
Oliver Upton [Wed, 4 May 2022 03:24:39 +0000 (03:24 +0000)]
KVM: arm64: Return a value from check_vcpu_requests()
A subsequent change to KVM will introduce a vCPU request that could
result in an exit to userspace. Change check_vcpu_requests() to return a
value and document the function. Unconditionally return 1 for now.
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-6-oupton@google.com
Oliver Upton [Wed, 4 May 2022 03:24:38 +0000 (03:24 +0000)]
KVM: arm64: Rename the KVM_REQ_SLEEP handler
The naming of the kvm_req_sleep function is confusing: the function
itself sleeps the vCPU, it does not request such an event. Rename the
function to make its purpose more clear.
No functional change intended.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-5-oupton@google.com
Oliver Upton [Wed, 4 May 2022 03:24:37 +0000 (03:24 +0000)]
KVM: arm64: Track vCPU power state using MP state values
A subsequent change to KVM will add support for additional power states.
Store the MP state by value rather than keeping track of it as a
boolean.
No functional change intended.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-4-oupton@google.com
Oliver Upton [Wed, 4 May 2022 03:24:36 +0000 (03:24 +0000)]
KVM: arm64: Dedupe vCPU power off helpers
vcpu_power_off() and kvm_psci_vcpu_off() are equivalent; rename the
former and replace all callsites to the latter.
No functional change intended.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-3-oupton@google.com
Oliver Upton [Wed, 4 May 2022 03:24:35 +0000 (03:24 +0000)]
KVM: arm64: Don't depend on fallthrough to hide SYSTEM_RESET2
Depending on a fallthrough to the default case for hiding SYSTEM_RESET2
requires that any new case statements clean up the failure path for this
PSCI call.
Unhitch SYSTEM_RESET2 from the default case by setting val to
PSCI_RET_NOT_SUPPORTED outside of the switch statement. Apply the
cleanup to both the PSCI_1_1_FN_SYSTEM_RESET2 and
PSCI_1_0_FN_PSCI_FEATURES handlers.
No functional change intended.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-2-oupton@google.com
Marc Zyngier [Wed, 4 May 2022 07:01:05 +0000 (08:01 +0100)]
KVM: arm64: Fix new instances of 32bit ESRs
Fix the new instances of ESR being described as a u32, now that
we consistently are using a u64 for this register.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Raghavendra Rao Ananta [Mon, 2 May 2022 23:38:53 +0000 (23:38 +0000)]
selftests: KVM: aarch64: Add the bitmap firmware registers to get-reg-list
Add the psuedo-firmware registers KVM_REG_ARM_STD_BMAP,
KVM_REG_ARM_STD_HYP_BMAP, and KVM_REG_ARM_VENDOR_HYP_BMAP to
the base_regs[] list.
Also, add the COPROC support for KVM_REG_ARM_FW_FEAT_BMAP.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-10-rananta@google.com
Raghavendra Rao Ananta [Mon, 2 May 2022 23:38:52 +0000 (23:38 +0000)]
selftests: KVM: aarch64: Introduce hypercall ABI test
Introduce a KVM selftest to check the hypercall interface
for arm64 platforms. The test validates the user-space'
[GET|SET]_ONE_REG interface to read/write the psuedo-firmware
registers as well as its effects on the guest upon certain
configurations.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-9-rananta@google.com
Oliver Upton [Sat, 9 Apr 2022 18:45:46 +0000 (18:45 +0000)]
selftests: KVM: Create helper for making SMCCC calls
The PSCI and PV stolen time tests both need to make SMCCC calls within
the guest. Create a helper for making SMCCC calls and rework the
existing tests to use the library function.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220409184549.1681189-11-oupton@google.com
Oliver Upton [Sat, 9 Apr 2022 18:45:45 +0000 (18:45 +0000)]
selftests: KVM: Rename psci_cpu_on_test to psci_test
There are other interactions with PSCI worth testing; rename the PSCI
test to make it more generic.
No functional change intended.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220409184549.1681189-10-oupton@google.com
Raghavendra Rao Ananta [Mon, 2 May 2022 23:38:51 +0000 (23:38 +0000)]
tools: Import ARM SMCCC definitions
Import the standard SMCCC definitions from include/linux/arm-smccc.h.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-8-rananta@google.com
Raghavendra Rao Ananta [Mon, 2 May 2022 23:38:50 +0000 (23:38 +0000)]
Docs: KVM: Add doc for the bitmap firmware registers
Add the documentation for the bitmap firmware registers in
hypercalls.rst and api.rst. This includes the details for
KVM_REG_ARM_STD_BMAP, KVM_REG_ARM_STD_HYP_BMAP, and
KVM_REG_ARM_VENDOR_HYP_BMAP registers.
Since the document is growing to carry other hypercall related
information, make necessary adjustments to present the document
in a generic sense, rather than being PSCI focused.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: small scale reformat, move things about, random typo fixes]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-7-rananta@google.com
Raghavendra Rao Ananta [Mon, 2 May 2022 23:38:49 +0000 (23:38 +0000)]
Docs: KVM: Rename psci.rst to hypercalls.rst
Since the doc also covers general hypercalls' details,
rather than just PSCI, and the fact that the bitmap firmware
registers' details will be added to this doc, rename the file
to a more appropriate name- hypercalls.rst.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-6-rananta@google.com
Raghavendra Rao Ananta [Mon, 2 May 2022 23:38:48 +0000 (23:38 +0000)]
KVM: arm64: Add vendor hypervisor firmware register
Introduce the firmware register to hold the vendor specific
hypervisor service calls (owner value 6) as a bitmap. The
bitmap represents the features that'll be enabled for the
guest, as configured by the user-space. Currently, this
includes support for KVM-vendor features along with
reading the UID, represented by bit-0, and Precision Time
Protocol (PTP), represented by bit-1.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: tidy-up bitmap values]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-5-rananta@google.com
Raghavendra Rao Ananta [Mon, 2 May 2022 23:38:47 +0000 (23:38 +0000)]
KVM: arm64: Add standard hypervisor firmware register
Introduce the firmware register to hold the standard hypervisor
service calls (owner value 5) as a bitmap. The bitmap represents
the features that'll be enabled for the guest, as configured by
the user-space. Currently, this includes support only for
Paravirtualized time, represented by bit-0.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: tidy-up bitmap values]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-4-rananta@google.com
Raghavendra Rao Ananta [Mon, 2 May 2022 23:38:46 +0000 (23:38 +0000)]
KVM: arm64: Setup a framework for hypercall bitmap firmware registers
KVM regularly introduces new hypercall services to the guests without
any consent from the userspace. This means, the guests can observe
hypercall services in and out as they migrate across various host
kernel versions. This could be a major problem if the guest
discovered a hypercall, started using it, and after getting migrated
to an older kernel realizes that it's no longer available. Depending
on how the guest handles the change, there's a potential chance that
the guest would just panic.
As a result, there's a need for the userspace to elect the services
that it wishes the guest to discover. It can elect these services
based on the kernels spread across its (migration) fleet. To remedy
this, extend the existing firmware pseudo-registers, such as
KVM_REG_ARM_PSCI_VERSION, but by creating a new COPROC register space
for all the hypercall services available.
These firmware registers are categorized based on the service call
owners, but unlike the existing firmware pseudo-registers, they hold
the features supported in the form of a bitmap.
During the VM initialization, the registers are set to upper-limit of
the features supported by the corresponding registers. It's expected
that the VMMs discover the features provided by each register via
GET_ONE_REG, and write back the desired values using SET_ONE_REG.
KVM allows this modification only until the VM has started.
Some of the standard features are not mapped to any bits of the
registers. But since they can recreate the original problem of
making it available without userspace's consent, they need to
be explicitly added to the case-list in
kvm_hvc_call_default_allowed(). Any function-id that's not enabled
via the bitmap, or not listed in kvm_hvc_call_default_allowed, will
be returned as SMCCC_RET_NOT_SUPPORTED to the guest.
Older userspace code can simply ignore the feature and the
hypercall services will be exposed unconditionally to the guests,
thus ensuring backward compatibility.
In this patch, the framework adds the register only for ARM's standard
secure services (owner value 4). Currently, this includes support only
for ARM True Random Number Generator (TRNG) service, with bit-0 of the
register representing mandatory features of v1.0. Other services are
momentarily added in the upcoming patches.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: reduced the scope of some helpers, tidy-up bitmap max values,
dropped error-only fast path]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-3-rananta@google.com
Raghavendra Rao Ananta [Mon, 2 May 2022 23:38:45 +0000 (23:38 +0000)]
KVM: arm64: Factor out firmware register handling from psci.c
Common hypercall firmware register handing is currently employed
by psci.c. Since the upcoming patches add more of these registers,
it's better to move the generic handling to hypercall.c for a
cleaner presentation.
While we are at it, collect all the firmware registers under
fw_reg_ids[] to help implement kvm_arm_get_fw_num_regs() and
kvm_arm_copy_fw_reg_indices() in a generic way. Also, define
KVM_REG_FEATURE_LEVEL_MASK using a GENMASK instead.
No functional change intended.
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
[maz: fixed KVM_REG_FEATURE_LEVEL_MASK]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220502233853.1233742-2-rananta@google.com
Alexandru Elisei [Tue, 3 May 2022 06:02:04 +0000 (06:02 +0000)]
KVM: arm64: Hide AArch32 PMU registers when not available
commit
11663111cd49 ("KVM: arm64: Hide PMU registers from userspace when
not available") hid the AArch64 PMU registers from userspace and guest
when the PMU VCPU feature was not set. Do the same when the PMU
registers are accessed by an AArch32 guest. While we're at it, rename
the previously unused AA32_ZEROHIGH to AA32_DIRECT to match the behavior
of get_access_mask().
Now that KVM emulates ID_DFR0 and hides the PMU from the guest when the
feature is not set, it is safe to inject to inject an undefined exception
when the PMU is not present, as that corresponds to the architected
behaviour.
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
[Oliver - Add AA32_DIRECT to match the zero value of the enum]
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220503060205.2823727-7-oupton@google.com
Oliver Upton [Tue, 3 May 2022 06:02:03 +0000 (06:02 +0000)]
KVM: arm64: Start trapping ID registers for 32 bit guests
To date KVM has not trapped ID register accesses from AArch32, meaning
that guests get an unconstrained view of what hardware supports. This
can be a serious problem because we try to base the guest's feature
registers on values that are safe system-wide. Furthermore, KVM does not
implement the latest ISA in the PMU and Debug architecture, so we
constrain these fields to supported values.
Since KVM now correctly handles CP15 and CP10 register traps, we no
longer need to clear HCR_EL2.TID3 for 32 bit guests and will instead
emulate reads with their safe values.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220503060205.2823727-6-oupton@google.com
Oliver Upton [Tue, 3 May 2022 06:02:02 +0000 (06:02 +0000)]
KVM: arm64: Plumb cp10 ID traps through the AArch64 sysreg handler
In order to enable HCR_EL2.TID3 for AArch32 guests KVM needs to handle
traps where ESR_EL2.EC=0x8, which corresponds to an attempted VMRS
access from an ID group register. Specifically, the MVFR{0-2} registers
are accessed this way from AArch32. Conveniently, these registers are
architecturally mapped to MVFR{0-2}_EL1 in AArch64. Furthermore, KVM
already handles reads to these aliases in AArch64.
Plumb VMRS read traps through to the general AArch64 system register
handler.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220503060205.2823727-5-oupton@google.com
Oliver Upton [Tue, 3 May 2022 06:02:01 +0000 (06:02 +0000)]
KVM: arm64: Wire up CP15 feature registers to their AArch64 equivalents
KVM currently does not trap ID register accesses from an AArch32 EL1.
This is painful for a couple of reasons. Certain unimplemented features
are visible to AArch32 EL1, as we limit PMU to version 3 and the debug
architecture to v8.0. Additionally, we attempt to paper over
heterogeneous systems by using register values that are safe
system-wide. All this hard work is completely sidestepped because KVM
does not set TID3 for AArch32 guests.
Fix up handling of CP15 feature registers by simply rerouting to their
AArch64 aliases. Punt setting HCR_EL2.TID3 to a later change, as we need
to fix up the oddball CP10 feature registers still.
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220503060205.2823727-4-oupton@google.com
Oliver Upton [Tue, 3 May 2022 06:02:00 +0000 (06:02 +0000)]
KVM: arm64: Don't write to Rt unless sys_reg emulation succeeds
emulate_sys_reg() returns 1 unconditionally, even though a a system
register access can fail. Furthermore, kvm_handle_sys_reg() writes to Rt
for every register read, regardless of if it actually succeeded.
Though this pattern is safe (as params.regval is initialized with the
current value of Rt) it is a bit ugly. Indicate failure if the register
access could not be emulated and only write to Rt on success.
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220503060205.2823727-3-oupton@google.com
Oliver Upton [Tue, 3 May 2022 06:01:59 +0000 (06:01 +0000)]
KVM: arm64: Return a bool from emulate_cp()
KVM indicates success/failure in several ways, but generally an integer
is used when conditionally bouncing to userspace is involved. That is
not the case from emulate_cp(); just use a bool instead.
No functional change intended.
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220503060205.2823727-2-oupton@google.com
Linus Torvalds [Sun, 1 May 2022 20:57:58 +0000 (13:57 -0700)]
Linux 5.18-rc5
Linus Torvalds [Sun, 1 May 2022 18:49:32 +0000 (11:49 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Take care of faults occuring between the PARange and IPA range by
injecting an exception
- Fix S2 faults taken from a host EL0 in protected mode
- Work around Oops caused by a PMU access from a 32bit guest when PMU
has been created. This is a temporary bodge until we fix it for
good.
x86:
- Fix potential races when walking host page table
- Fix shadow page table leak when KVM runs nested
- Work around bug in userspace when KVM synthesizes leaf 0x80000021
on older (pre-EPYC) or Intel processors
Generic (but affects only RISC-V):
- Fix bad user ABI for KVM_EXIT_SYSTEM_EVENT"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: work around QEMU issue with synthetic CPUID leaves
Revert "x86/mm: Introduce lookup_address_in_mm()"
KVM: x86/mmu: fix potential races when walking host page table
KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT
KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDR
KVM: arm64: Inject exception on out-of-IPA-range translation fault
KVM/arm64: Don't emulate a PMU for 32-bit guests if feature not set
KVM: arm64: Handle host stage-2 faults from 32-bit EL0
Linus Torvalds [Sun, 1 May 2022 17:03:36 +0000 (10:03 -0700)]
Merge tag 'x86_urgent_for_v5.18_rc5' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- A fix to disable PCI/MSI[-X] masking for XEN_HVM guests as that is
solely controlled by the hypervisor
- A build fix to make the function prototype (__warn()) as visible as
the definition itself
- A bunch of objtool annotation fixes which have accumulated over time
- An ORC unwinder fix to handle bad input gracefully
- Well, we thought the microcode gets loaded in time in order to
restore the microcode-emulated MSRs but we thought wrong. So there's
a fix for that to have the ordering done properly
- Add new Intel model numbers
- A spelling fix
* tag 'x86_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests
bug: Have __warn() prototype defined unconditionally
x86/Kconfig: fix the spelling of 'becoming' in X86_KERNEL_IBT config
objtool: Use offstr() to print address of missing ENDBR
objtool: Print data address for "!ENDBR" data warnings
x86/xen: Add ANNOTATE_NOENDBR to startup_xen()
x86/uaccess: Add ENDBR to __put_user_nocheck*()
x86/retpoline: Add ANNOTATE_NOENDBR for retpolines
x86/static_call: Add ANNOTATE_NOENDBR to static call trampoline
objtool: Enable unreachable warnings for CLANG LTO
x86,objtool: Explicitly mark idtentry_body()s tail REACHABLE
x86,objtool: Mark cpu_startup_entry() __noreturn
x86,xen,objtool: Add UNWIND hint
lib/strn*,objtool: Enforce user_access_begin() rules
MAINTAINERS: Add x86 unwinding entry
x86/unwind/orc: Recheck address range after stack info was updated
x86/cpu: Load microcode during restore_processor_state()
x86/cpu: Add new Alderlake and Raptorlake CPU model numbers
Linus Torvalds [Sun, 1 May 2022 16:34:54 +0000 (09:34 -0700)]
Merge tag 'objtool_urgent_for_v5.18_rc5' of git://git./linux/kernel/git/tip/tip
Pull objtool fixes from Borislav Petkov:
"A bunch of objtool fixes to improve unwinding, sibling call detection,
fallthrough detection and relocation handling of weak symbols when the
toolchain strips section symbols"
* tag 'objtool_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix code relocs vs weak symbols
objtool: Fix type of reloc::addend
objtool: Fix function fallthrough detection for vmlinux
objtool: Fix sibling call detection in alternatives
objtool: Don't set 'jump_dest' for sibling calls
x86/uaccess: Don't jump between functions
Linus Torvalds [Sun, 1 May 2022 16:30:47 +0000 (09:30 -0700)]
Merge tag 'irq_urgent_for_v5.18_rc5' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Borislav Petkov:
- Fix locking when accessing device MSI descriptors
* tag 'irq_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
bus: fsl-mc-msi: Fix MSI descriptor mutex lock for msi_first_desc()
Linus Torvalds [Sat, 30 Apr 2022 17:24:21 +0000 (10:24 -0700)]
Merge tag 'driver-core-5.18-rc5' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are some small driver core and kernfs fixes for some reported
problems. They include:
- kernfs regression that is causing oopses in 5.17 and newer releases
- topology sysfs fixes for a few small reported problems.
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
kernfs: fix NULL dereferencing in kernfs_remove
topology: Fix up build warning in topology_is_visible()
arch_topology: Do not set llc_sibling if llc_id is invalid
topology: make core_mask include at least cluster_siblings
topology/sysfs: Hide PPIN on systems that do not support it.
Linus Torvalds [Sat, 30 Apr 2022 17:15:57 +0000 (10:15 -0700)]
Merge tag 'char-misc-5.18-rc5' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are a small number of char/misc/other driver fixes for 5.18-rc5
Nothing major in here, this is mostly IIO driver fixes along with some
other small things:
- at25 driver fix for systems without a dma-able stack
- phy driver fixes for reported issues
- binder driver fixes for reported issues
All of these have been in linux-next without any reported problems"
* tag 'char-misc-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (31 commits)
eeprom: at25: Use DMA safe buffers
binder: Gracefully handle BINDER_TYPE_FDA objects with num_fds=0
binder: Address corner cases in deferred copy and fixup
phy: amlogic: fix error path in phy_g12a_usb3_pcie_probe()
iio: imu: inv_icm42600: Fix I2C init possible nack
iio: dac: ltc2688: fix voltage scale read
interconnect: qcom: sdx55: Drop IP0 interconnects
interconnect: qcom: sc7180: Drop IP0 interconnects
phy: ti: Add missing pm_runtime_disable() in serdes_am654_probe
phy: mapphone-mdm6600: Fix PM error handling in phy_mdm6600_probe
phy: ti: omap-usb2: Fix error handling in omap_usb2_enable_clocks
bus: mhi: host: pci_generic: Flush recovery worker during freeze
bus: mhi: host: pci_generic: Add missing poweroff() PM callback
phy: ti: tusb1210: Fix an error handling path in tusb1210_probe()
phy: samsung: exynos5250-sata: fix missing device put in probe error paths
phy: samsung: Fix missing of_node_put() in exynos_sata_phy_probe
phy: ti: Fix missing of_node_put in ti_pipe3_get_sysctrl()
phy: ti: tusb1210: Make tusb1210_chg_det_states static
iio:dac:ad3552r: Fix an IS_ERR() vs NULL check
iio: sx9324: Fix default precharge internal resistance register
...
Linus Torvalds [Sat, 30 Apr 2022 17:09:14 +0000 (10:09 -0700)]
Merge tag 'tty-5.18-rc5' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small serial driver fixes, and a larger number of GSM
line discipline fixes for 5.18-rc5.
These include:
- lots of tiny n_gsm fixes for issues to resolve a number of reported
problems. Seems that people are starting to actually use this code
again.
- 8250 driver fixes for some devices
- imx serial driver fix
- amba-pl011 driver fix
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (27 commits)
tty: n_gsm: fix sometimes uninitialized warning in gsm_dlci_modem_output()
serial: 8250: Correct the clock for EndRun PTP/1588 PCIe device
serial: 8250: Also set sticky MCR bits in console restoration
tty: n_gsm: fix software flow control handling
tty: n_gsm: fix invalid use of MSC in advanced option
tty: n_gsm: fix broken virtual tty handling
Revert "serial: sc16is7xx: Clear RS485 bits in the shutdown"
tty: n_gsm: fix missing update of modem controls after DLCI open
serial: 8250: Fix runtime PM for start_tx() for empty buffer
serial: imx: fix overrun interrupts in DMA mode
serial: amba-pl011: do not time out prematurely when draining tx fifo
tty: n_gsm: fix incorrect UA handling
tty: n_gsm: fix reset fifo race condition
tty: n_gsm: fix missing tty wakeup in convergence layer type 2
tty: n_gsm: fix wrong signal octets encoding in MSC
tty: n_gsm: fix wrong command frame length field encoding
tty: n_gsm: fix wrong command retry handling
tty: n_gsm: fix missing explicit ldisc flush
tty: n_gsm: fix wrong DLCI release order
tty: n_gsm: fix insufficient txframe size
...
Linus Torvalds [Sat, 30 Apr 2022 16:58:46 +0000 (09:58 -0700)]
Merge tag 'usb-5.18-rc5' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of small USB driver fixes for 5.18-rc5 for some
reported issues and new quirks. They include:
- dwc3 driver fixes
- xhci driver fixes
- typec driver fixes
- new usb-serial driver ids
- added new USB devices to existing quirk tables
- other tiny fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (31 commits)
usb: phy: generic: Get the vbus supply
usb: dwc3: gadget: Return proper request status
usb: dwc3: pci: add support for the Intel Meteor Lake-P
usb: dwc3: core: Only handle soft-reset in DCTL
usb: gadget: configfs: clear deactivation flag in configfs_composite_unbind()
usb: misc: eud: Fix an error handling path in eud_probe()
usb: core: Don't hold the device lock while sleeping in do_proc_control()
usb: dwc3: Try usb-role-switch first in dwc3_drd_init
usb: dwc3: core: Fix tx/rx threshold settings
usb: mtu3: fix USB 3.0 dual-role-switch from device to host
xhci: Enable runtime PM on second Alderlake controller
usb: dwc3: fix backwards compat with rockchip devices
dt-bindings: usb: samsung,exynos-usb2: add missing required reg
usb: misc: fix improper handling of refcount in uss720_probe()
USB: Fix ehci infinite suspend-resume loop issue in zhaoxin
usb: typec: tcpm: Fix undefined behavior due to shift overflowing the constant
usb: typec: rt1719: Fix build error without CONFIG_POWER_SUPPLY
usb: typec: ucsi: Fix role swapping
usb: typec: ucsi: Fix reuse of completion structure
usb: xhci: tegra:Fix PM usage reference leak of tegra_xusb_unpowergate_partitions
...
Linus Torvalds [Sat, 30 Apr 2022 16:47:59 +0000 (09:47 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"One fix for an endless error loop with the target driver affecting
tapes"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: target: pscsi: Set SCF_TREAT_READ_AS_NORMAL flag only if there is valid data
Linus Torvalds [Fri, 29 Apr 2022 22:51:05 +0000 (15:51 -0700)]
Merge tag 'soc-fixes-5.18-3' of git://git./linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
- A fix for a regression caused by the previous set of bugfixes
changing tegra and at91 pinctrl properties.
More work is needed to figure out what this should actually be, but a
revert makes it work for the moment.
- Defconfig regression fixes for tegra after renamed symbols
- Build-time warning and static checker fixes for imx, op-tee, sunxi,
meson, at91, and omap
- More at91 DT fixes for audio, regulator and spi nodes
- A regression fix for Renesas Hyperflash memory probe
- A stability fix for amlogic boards, modifying the allowed cpufreq
states
- Multiple fixes for system suspend on omap2+
- DT fixes for various i.MX bugs
- A probe error fix for imx6ull-colibri MMC
- A MAINTAINERS file entry for samsung bug reports
* tag 'soc-fixes-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (42 commits)
Revert "arm: dts: at91: Fix boolean properties with values"
bus: sunxi-rsb: Fix the return value of sunxi_rsb_device_create()
Revert "arm64: dts: tegra: Fix boolean properties with values"
arm64: dts: imx8mn-ddr4-evk: Describe the 32.768 kHz PMIC clock
ARM: dts: imx6ull-colibri: fix vqmmc regulator
MAINTAINERS: add Bug entry for Samsung and memory controller drivers
memory: renesas-rpc-if: Fix HF/OSPI data transfer in Manual Mode
ARM: dts: logicpd-som-lv: Fix wrong pinmuxing on OMAP35
ARM: dts: am3517-evm: Fix misc pinmuxing
ARM: dts: am33xx-l4: Add missing touchscreen clock properties
ARM: dts: Fix mmc order for omap3-gta04
ARM: dts: at91: fix pinctrl phandles
ARM: dts: at91: sama5d4_xplained: fix pinctrl phandle name
ARM: dts: at91: Describe regulators on at91sam9g20ek
ARM: dts: at91: Map MCLK for wm8731 on at91sam9g20ek
ARM: dts: at91: Fix boolean properties with values
ARM: dts: at91: use generic node name for dataflash
ARM: dts: at91: align SPI NOR node name with dtschema
ARM: dts: at91: sama7g5ek: Align the impedance of the QSPI0's HSIO and PCB lines
ARM: dts: at91: sama7g5ek: enable pull-up on flexcom3 console lines
...
Linus Torvalds [Fri, 29 Apr 2022 22:38:23 +0000 (15:38 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git./linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A semi-large pile of clk driver fixes this time around.
Nothing is touching the core so these fixes are fairly well contained
to specific devices that use these clk drivers.
- Some Allwinner SoC fixes to gracefully handle errors and mark an
RTC clk as critical so that the RTC keeps ticking.
- Fix AXI bus clks and RTC clk design for Microchip PolarFire SoC
driver introduced this cycle. This has some devicetree bits acked
by riscv maintainers. We're fixing it now so that the prior
bindings aren't released in a major kernel version.
- Remove a reset on Microchip PolarFire SoCs that broke when enabling
CONFIG_PM.
- Set a min/max for the Qualcomm graphics clk. This got broken by the
clk rate range patches introduced this cycle"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sunxi: sun9i-mmc: check return value after calling platform_get_resource()
clk: sunxi-ng: sun6i-rtc: Mark rtc-32k as critical
riscv: dts: microchip: reparent mpfs clocks
clk: microchip: mpfs: add RTCREF clock control
clk: microchip: mpfs: re-parent the configurable clocks
dt-bindings: rtc: add refclk to mpfs-rtc
dt-bindings: clk: mpfs: add defines for two new clocks
dt-bindings: clk: mpfs document msspll dri registers
riscv: dts: microchip: fix usage of fic clocks on mpfs
clk: microchip: mpfs: mark CLK_ATHENA as critical
clk: microchip: mpfs: fix parents for FIC clocks
clk: qcom: clk-rcg2: fix gfx3d frequency calculation
clk: microchip: mpfs: don't reset disabled peripherals
clk: sunxi-ng: fix not NULL terminated coccicheck error
Linus Torvalds [Fri, 29 Apr 2022 22:28:42 +0000 (15:28 -0700)]
Merge tag 'block-5.18-2022-04-29' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Revert of a patch that caused timestamp issues (Tejun)
- iocost warning fix (Tejun)
- bfq warning fix (Jan)
* tag 'block-5.18-2022-04-29' of git://git.kernel.dk/linux-block:
bfq: Fix warning in bfqq_request_over_limit()
Revert "block: inherit request start time from bio for BLK_CGROUP"
iocost: don't reset the inuse weight of under-weighted debtors
Linus Torvalds [Fri, 29 Apr 2022 21:51:57 +0000 (14:51 -0700)]
Merge tag 'io_uring-5.18-2022-04-29' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Pretty boring:
- three patches just adding reserved field checks (me, Eugene)
- Fixing a potential regression with IOPOLL caused by a block change
(Joseph)"
Boring is good.
* tag 'io_uring-5.18-2022-04-29' of git://git.kernel.dk/linux-block:
io_uring: check that data field is 0 in ringfd unregister
io_uring: fix uninitialized field in rw io_kiocb
io_uring: check reserved fields for recv/recvmsg
io_uring: check reserved fields for send/sendmsg
Linus Torvalds [Fri, 29 Apr 2022 21:47:17 +0000 (14:47 -0700)]
Merge tag 'random-5.18-rc5-for-linus' of git://git./linux/kernel/git/crng/random
Pull random number generator fixes from Jason Donenfeld:
- Eric noticed that the memmove() in crng_fast_key_erasure() was bogus,
so this has been changed to a memcpy() and the confusing situation
clarified with a detailed comment.
- [Half]SipHash documentation updates from Bagas and Eric, after Eric
pointed out that the use of HalfSipHash in random.c made a bit of the
text potentially misleading.
* tag 'random-5.18-rc5-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
Documentation: siphash: disambiguate HalfSipHash algorithm from hsiphash functions
Documentation: siphash: enclose HalfSipHash usage example in the literal block
Documentation: siphash: convert danger note to warning for HalfSipHash
random: document crng_fast_key_erasure() destination possibility
Linus Torvalds [Fri, 29 Apr 2022 21:37:35 +0000 (14:37 -0700)]
Merge tag 'ceph-for-5.18-rc5' of https://github.com/ceph/ceph-client
Pull ceph client fixes from Ilya Dryomov:
"A fix for a NULL dereference that turns out to be easily triggerable
by fsync (marked for stable) and a false positive WARN and snap_rwsem
locking fixups"
* tag 'ceph-for-5.18-rc5' of https://github.com/ceph/ceph-client:
ceph: fix possible NULL pointer dereference for req->r_session
ceph: remove incorrect session state check
ceph: get snap_rwsem read lock in handle_cap_export for ceph_add_cap
libceph: disambiguate cluster/pool full log message
Arnd Bergmann [Fri, 29 Apr 2022 21:09:49 +0000 (23:09 +0200)]
Revert "arm: dts: at91: Fix boolean properties with values"
This reverts commit
0dc23d1a8e17, which caused another regression
as the pinctrl code actually expects an integer value of 0 or 1
rather than a simple boolean property.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Paolo Bonzini [Fri, 29 Apr 2022 18:43:04 +0000 (14:43 -0400)]
KVM: x86: work around QEMU issue with synthetic CPUID leaves
Synthesizing AMD leaves up to 0x80000021 caused problems with QEMU,
which assumes the *host* CPUID[0x80000000].EAX is higher or equal
to what KVM_GET_SUPPORTED_CPUID reports.
This causes QEMU to issue bogus host CPUIDs when preparing the input
to KVM_SET_CPUID2. It can even get into an infinite loop, which is
only terminated by an abort():
cpuid_data is full, no space for cpuid(eax:0x8000001d,ecx:0x3e)
To work around this, only synthesize those leaves if 0x8000001d exists
on the host. The synthetic 0x80000021 leaf is mostly useful on Zen2,
which satisfies the condition.
Fixes:
f144c49e8c39 ("KVM: x86: synthesize CPUID leaf 0x80000021h if useful")
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Fri, 29 Apr 2022 18:34:07 +0000 (11:34 -0700)]
Merge tag 'perf-tools-fixes-for-v5.18-2022-04-29' of git://git./linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix Intel PT (Processor Trace) timeless decoding with perf.data
directory.
- ARM SPE (Statistical Profiling Extensions) address fixes, for
synthesized events and for SPE events with physical addresses. Add a
simple 'perf test' entry to make sure this doesn't regress.
- Remove arch specific processing of kallsyms data to fixup symbol end
address, fixing excessive memory consumption in the annotation code.
* tag 'perf-tools-fixes-for-v5.18-2022-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf symbol: Remove arch__symbols__fixup_end()
perf symbol: Update symbols__fixup_end()
perf symbol: Pass is_kallsyms to symbols__fixup_end()
perf test: Add perf_event_attr test for Arm SPE
perf arm-spe: Fix SPE events with phys addresses
perf arm-spe: Fix addresses of synthesized SPE events
perf intel-pt: Fix timeless decoding with perf.data directory
Linus Torvalds [Fri, 29 Apr 2022 17:44:58 +0000 (10:44 -0700)]
Merge tag 'riscv-for-linus-5.18-rc5' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix to properly ensure a single CPU is running during patch_text().
- A defconfig update to include RPMSG_CTRL when RPMSG_CHAR was set,
necessary after a recent refactoring.
* tag 'riscv-for-linus-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
RISC-V: configs: Configs that had RPMSG_CHAR now get RPMSG_CTRL
riscv: patch_text: Fixup last cpu should be master
Linus Torvalds [Fri, 29 Apr 2022 17:36:47 +0000 (10:36 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fix from Will Deacon:
"Rename and reallocate the PT_ARM_MEMTAG_MTE ELF segment type.
This is a fix to the MTE ELF ABI for a bug that was added during the
most recent merge window as part of the coredump support.
The issue is that the value assigned to the new PT_ARM_MEMTAG_MTE
segment type has already been allocated to PT_AARCH64_UNWIND by the
ELF ABI, so we've bumped the value and changed the name of the
identifier to be better aligned with the existing one"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
elf: Fix the arm64 MTE ELF segment name and value
Sean Christopherson [Fri, 29 Apr 2022 14:57:53 +0000 (07:57 -0700)]
Revert "x86/mm: Introduce lookup_address_in_mm()"
Drop lookup_address_in_mm() now that KVM is providing it's own variant
of lookup_address_in_pgd() that is safe for use with user addresses, e.g.
guards against page tables being torn down. A variant that provides a
non-init mm is inherently dangerous and flawed, as the only reason to use
an mm other than init_mm is to walk a userspace mapping, and
lookup_address_in_pgd() does not play nice with userspace mappings, e.g.
doesn't disable IRQs to block TLB shootdowns and doesn't use READ_ONCE()
to ensure an upper level entry isn't converted to a huge page between
checking the PAGE_SIZE bit and grabbing the address of the next level
down.
This reverts commit
13c72c060f1ba6f4eddd7b1c4f52a8aded43d6d9.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <YmwIi3bXr/1yhYV/@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 29 Apr 2022 10:38:56 +0000 (06:38 -0400)]
Merge branch 'kvm-fixes-for-5.18-rc5' into HEAD
Fixes for (relatively) old bugs, to be merged in both the -rc and next
development trees:
* Fix potential races when walking host page table
* Fix bad user ABI for KVM_EXIT_SYSTEM_EVENT
* Fix shadow page table leak when KVM runs nested
Mingwei Zhang [Fri, 29 Apr 2022 03:17:57 +0000 (03:17 +0000)]
KVM: x86/mmu: fix potential races when walking host page table
KVM uses lookup_address_in_mm() to detect the hugepage size that the host
uses to map a pfn. The function suffers from several issues:
- no usage of READ_ONCE(*). This allows multiple dereference of the same
page table entry. The TOCTOU problem because of that may cause KVM to
incorrectly treat a newly generated leaf entry as a nonleaf one, and
dereference the content by using its pfn value.
- the information returned does not match what KVM needs; for non-present
entries it returns the level at which the walk was terminated, as long
as the entry is not 'none'. KVM needs level information of only 'present'
entries, otherwise it may regard a non-present PXE entry as a present
large page mapping.
- the function is not safe for mappings that can be torn down, because it
does not disable IRQs and because it returns a PTE pointer which is never
safe to dereference after the function returns.
So implement the logic for walking host page tables directly in KVM, and
stop using lookup_address_in_mm().
Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <
20220429031757.2042406-1-mizhang@google.com>
[Inline in host_pfn_mapping_level, ensure no semantic change for its
callers. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 22 Apr 2022 10:30:13 +0000 (12:30 +0200)]
KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT
When KVM_EXIT_SYSTEM_EVENT was introduced, it included a flags
member that at the time was unused. Unfortunately this extensibility
mechanism has several issues:
- x86 is not writing the member, so it would not be possible to use it
on x86 except for new events
- the member is not aligned to 64 bits, so the definition of the
uAPI struct is incorrect for 32- on 64-bit userspace. This is a
problem for RISC-V, which supports CONFIG_KVM_COMPAT, but fortunately
usage of flags was only introduced in 5.18.
Since padding has to be introduced, place a new field in there
that tells if the flags field is valid. To allow further extensibility,
in fact, change flags to an array of 16 values, and store how many
of the values are valid. The availability of the new ndata field
is tied to a system capability; all architectures are changed to
fill in the field.
To avoid breaking compilation of userspace that was using the flags
field, provide a userspace-only union to overlap flags with data[0].
The new field is placed at the same offset for both 32- and 64-bit
userspace.
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Message-Id: <
20220422103013.34832-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sean Christopherson [Thu, 28 Apr 2022 23:34:16 +0000 (23:34 +0000)]
KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDR
Disallow memslots and MMIO SPTEs whose gpa range would exceed the host's
MAXPHYADDR, i.e. don't create SPTEs for gfns that exceed host.MAXPHYADDR.
The TDP MMU bounds its zapping based on host.MAXPHYADDR, and so if the
guest, possibly with help from userspace, manages to coerce KVM into
creating a SPTE for an "impossible" gfn, KVM will leak the associated
shadow pages (page tables):
WARNING: CPU: 10 PID: 1122 at arch/x86/kvm/mmu/tdp_mmu.c:57
kvm_mmu_uninit_tdp_mmu+0x4b/0x60 [kvm]
Modules linked in: kvm_intel kvm irqbypass
CPU: 10 PID: 1122 Comm: set_memory_regi Tainted: G W 5.18.0-rc1+ #293
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kvm_mmu_uninit_tdp_mmu+0x4b/0x60 [kvm]
Call Trace:
<TASK>
kvm_arch_destroy_vm+0x130/0x1b0 [kvm]
kvm_destroy_vm+0x162/0x2d0 [kvm]
kvm_vm_release+0x1d/0x30 [kvm]
__fput+0x82/0x240
task_work_run+0x5b/0x90
exit_to_user_mode_prepare+0xd2/0xe0
syscall_exit_to_user_mode+0x1d/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>
On bare metal, encountering an impossible gpa in the page fault path is
well and truly impossible, barring CPU bugs, as the CPU will signal #PF
during the gva=>gpa translation (or a similar failure when stuffing a
physical address into e.g. the VMCS/VMCB). But if KVM is running as a VM
itself, the MAXPHYADDR enumerated to KVM may not be the actual MAXPHYADDR
of the underlying hardware, in which case the hardware will not fault on
the illegal-from-KVM's-perspective gpa.
Alternatively, KVM could continue allowing the dodgy behavior and simply
zap the max possible range. But, for hosts with MAXPHYADDR < 52, that's
a (minor) waste of cycles, and more importantly, KVM can't reasonably
support impossible memslots when running on bare metal (or with an
accurate MAXPHYADDR as a VM). Note, limiting the overhead by checking if
KVM is running as a guest is not a safe option as the host isn't required
to announce itself to the guest in any way, e.g. doesn't need to set the
HYPERVISOR CPUID bit.
A second alternative to disallowing the memslot behavior would be to
disallow creating a VM with guest.MAXPHYADDR > host.MAXPHYADDR. That
restriction is undesirable as there are legitimate use cases for doing
so, e.g. using the highest host.MAXPHYADDR out of a pool of heterogeneous
systems so that VMs can be migrated between hosts with different
MAXPHYADDRs without running afoul of the allow_smaller_maxphyaddr mess.
Note that any guest.MAXPHYADDR is valid with shadow paging, and it is
even useful in order to test KVM with MAXPHYADDR=52 (i.e. without
any reserved physical address bits).
The now common kvm_mmu_max_gfn() is inclusive instead of exclusive.
The memslot and TDP MMU code want an exclusive value, but the name
implies the returned value is inclusive, and the MMIO path needs an
inclusive check.
Fixes:
faaf05b00aec ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Fixes:
524a1e4e381f ("KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs")
Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <
20220428233416.2446833-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 29 Apr 2022 16:32:14 +0000 (12:32 -0400)]
Merge tag 'kvmarm-fixes-5.18-2' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.18, take #2
- Take care of faults occuring between the PARange and
IPA range by injecting an exception
- Fix S2 faults taken from a host EL0 in protected mode
- Work around Oops caused by a PMU access from a 32bit
guest when PMU has been created. This is a temporary
bodge until we fix it for good.
Wan Jiabing [Tue, 26 Apr 2022 11:30:53 +0000 (19:30 +0800)]
arm64/sme: Fix NULL check after kzalloc
Fix following coccicheck error:
./arch/arm64/kernel/process.c:322:2-23: alloc with no test, possible model on line 326
Here should be dst->thread.sve_state.
Fixes:
8bd7f91c03d8 ("arm64/sme: Implement traps and syscall handling for SME")
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviwed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220426113054.630983-1-wanjiabing@vivo.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Arnd Bergmann [Fri, 29 Apr 2022 14:41:21 +0000 (16:41 +0200)]
Merge tag 'tegra-for-5.18-arm-defconfig-fixes' of git://git./linux/kernel/git/tegra/linux into arm/fixes
ARM: tegra: Default configuration fixes for v5.18
This contains two updates to the default configuration needed because of
a Kconfig symbol name change. This fixes a failure that was detected in
the NVIDIA automated test farm.
* tag 'tegra-for-5.18-arm-defconfig-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux:
ARM: config: multi v7: Enable NVIDIA Tegra video decoder driver
ARM: tegra_defconfig: Update CONFIG_TEGRA_VDE option
Link: https://lore.kernel.org/r/20220429080626.494150-1-thierry.reding@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Eugene Syromiatnikov [Fri, 29 Apr 2022 14:22:18 +0000 (16:22 +0200)]
io_uring: check that data field is 0 in ringfd unregister
Only allow data field to be 0 in struct io_uring_rsrc_update user
arguments to allow for future possible usage.
Fixes:
e7a6c00dc77a ("io_uring: add support for registering ring file descriptors")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
Link: https://lore.kernel.org/r/20220429142218.GA28696@asgard.redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Arnd Bergmann [Fri, 29 Apr 2022 14:23:30 +0000 (16:23 +0200)]
Merge tag 'imx-fixes-5.18-2' of git://git./linux/kernel/git/shawnguo/linux into arm/fixes
i.MX fixes for 5.18, 2nd round:
- Fix one sparse warning on imx-weim driver.
- Fix vqmmc regulator to get UHS-I mode work on imx6ull-colibri board.
- Add missing 32.768 kHz PMIC clock for imx8mn-ddr4-evk board to fix
bd718xx-clk probe error.
* tag 'imx-fixes-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
arm64: dts: imx8mn-ddr4-evk: Describe the 32.768 kHz PMIC clock
ARM: dts: imx6ull-colibri: fix vqmmc regulator
bus: imx-weim: make symbol 'weim_of_notifier' static
Link: https://lore.kernel.org/r/20220426013427.GB14615@dragon
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arnd Bergmann [Fri, 29 Apr 2022 14:23:00 +0000 (16:23 +0200)]
Merge tag 'sunxi-fixes-for-5.18-1' of git://git./linux/kernel/git/sunxi/linux into arm/fixes
Fix return value in RSB bus driver
* tag 'sunxi-fixes-for-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
bus: sunxi-rsb: Fix the return value of sunxi_rsb_device_create()
Link: https://lore.kernel.org/r/Ymbkd+/dDmRJz66w@kista.localdomain
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Jan Kara [Thu, 7 Apr 2022 14:07:38 +0000 (16:07 +0200)]
bfq: Fix warning in bfqq_request_over_limit()
People are occasionally reporting a warning bfqq_request_over_limit()
triggering reporting that BFQ's idea of cgroup hierarchy (and its depth)
does not match what generic blkcg code thinks. This can actually happen
when bfqq gets moved between BFQ groups while bfqq_request_over_limit()
is running. Make sure the code is safe against BFQ queue being moved to
a different BFQ group.
Fixes:
76f1df88bbc2 ("bfq: Limit number of requests consumed by each cgroup")
CC: stable@vger.kernel.org
Link: https://lore.kernel.org/all/CAJCQCtTw_2C7ZSz7as5Gvq=OmnDiio=HRkQekqWpKot84sQhFA@mail.gmail.com/
Reported-by: Chris Murphy <lists@colorremedies.com>
Reported-by: "yukuai (C)" <yukuai3@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220407140738.9723-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Thomas Gleixner [Thu, 28 Apr 2022 13:50:54 +0000 (15:50 +0200)]
x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests
When a XEN_HVM guest uses the XEN PIRQ/Eventchannel mechanism, then
PCI/MSI[-X] masking is solely controlled by the hypervisor, but contrary to
XEN_PV guests this does not disable PCI/MSI[-X] masking in the PCI/MSI
layer.
This can lead to a situation where the PCI/MSI layer masks an MSI[-X]
interrupt and the hypervisor grants the write despite the fact that it
already requested the interrupt. As a consequence interrupt delivery on the
affected device is not happening ever.
Set pci_msi_ignore_mask to prevent that like it's done for XEN_PV guests
already.
Fixes:
809f9267bbab ("xen: map MSIs into pirqs")
Reported-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Reported-by: Dusty Mabe <dustymabe@redhat.com>
Reported-by: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Noah Meyerhans <noahm@debian.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87tuaduxj5.ffs@tglx
Linus Torvalds [Fri, 29 Apr 2022 01:00:34 +0000 (18:00 -0700)]
Merge tag 'drm-fixes-2022-04-29' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Another relatively quiet week, amdgpu leads the way, some i915 display
fixes, and a single sunxi fix.
amdgpu:
- Runtime pm fix
- DCN memory leak fix in error path
- SI DPM deadlock fix
- S0ix fix
amdkfd:
- GWS fix
- GWS support for CRIU
i915:
- Fix #5284: Backlight control regression on XMG Core 15 e21
- Fix black display plane on Acer One AO532h
- Two smaller display fixes
sunxi:
- Single fix removing applying PHYS_OFFSET twice"
* tag 'drm-fixes-2022-04-29' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: keep mmhub clock gating being enabled during s2idle suspend
drm/amd/pm: fix the deadlock issue observed on SI
drm/amd/display: Fix memory leak in dcn21_clock_source_create
drm/amdgpu: don't runtime suspend if there are displays attached (v3)
drm/amdkfd: CRIU add support for GWS queues
drm/amdkfd: Fix GWS queue count
drm/sun4i: Remove obsolete references to PHYS_OFFSET
drm/i915/fbc: Consult hw.crtc instead of uapi.crtc
drm/i915: Fix SEL_FETCH_PLANE_*(PIPE_B+) register addresses
drm/i915: Check EDID for HDR static metadata when choosing blc
drm/i915: Fix DISP_POS_Y and DISP_HEIGHT defines
Dave Airlie [Fri, 29 Apr 2022 00:27:04 +0000 (10:27 +1000)]
Merge tag 'amd-drm-fixes-5.18-2022-04-27' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.18-2022-04-27:
amdgpu:
- Runtime pm fix
- DCN memory leak fix in error path
- SI DPM deadlock fix
- S0ix fix
amdkfd:
- GWS fix
- GWS support for CRIU
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220428023232.5794-1-alexander.deucher@amd.com
Dave Airlie [Fri, 29 Apr 2022 00:17:46 +0000 (10:17 +1000)]
Merge tag 'drm-intel-fixes-2022-04-28' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix #5284: Backlight control regression on XMG Core 15 e21
- Fix black display plane on Acer One AO532h
- Two smaller display fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Ymotel5VfZUrJahf@jlahtine-mobl.ger.corp.intel.com
Dave Airlie [Fri, 29 Apr 2022 00:02:04 +0000 (10:02 +1000)]
Merge tag 'drm-misc-fixes-2022-04-27' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v5.18-rc5:
- Single fix removing applying PHYS_OFFSET twice in sunxi.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/f692bb62-5620-1868-91b7-dffb8d6f9175@linux.intel.com
Kalesh Singh [Wed, 20 Apr 2022 21:42:57 +0000 (14:42 -0700)]
KVM: arm64: Symbolize the nVHE HYP addresses
Reintroduce the __kvm_nvhe_ symbols in kallsyms, ignoring the local
symbols in this namespace. The local symbols are not informative and
can cause aliasing issues when symbolizing the addresses.
With the necessary symbols now in kallsyms we can symbolize nVHE
addresses using the %p print format specifier:
[ 98.916444][ T426] kvm [426]: nVHE hyp panic at: [<
ffffffc0096156fc>] __kvm_nvhe_overflow_stack+0x8/0x34!
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220420214317.3303360-7-kaleshsingh@google.com
Kalesh Singh [Wed, 20 Apr 2022 21:42:56 +0000 (14:42 -0700)]
KVM: arm64: Detect and handle hypervisor stack overflows
The hypervisor stacks (for both nVHE Hyp mode and nVHE protected mode)
are aligned such that any valid stack address has PAGE_SHIFT bit as 1.
This allows us to conveniently check for overflow in the exception entry
without corrupting any GPRs. We won't recover from a stack overflow so
panic the hypervisor.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220420214317.3303360-6-kaleshsingh@google.com
Kalesh Singh [Wed, 20 Apr 2022 21:42:55 +0000 (14:42 -0700)]
KVM: arm64: Add guard pages for pKVM (protected nVHE) hypervisor stack
Map the stack pages in the flexible private VA range and allocate
guard pages below the stack as unbacked VA space. The stack is aligned
so that any valid stack address has PAGE_SHIFT bit as 1 - this is used
for overflow detection (implemented in a subsequent patch in the series)
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220420214317.3303360-5-kaleshsingh@google.com
Kalesh Singh [Wed, 20 Apr 2022 21:42:54 +0000 (14:42 -0700)]
KVM: arm64: Add guard pages for KVM nVHE hypervisor stack
Map the stack pages in the flexible private VA range and allocate
guard pages below the stack as unbacked VA space. The stack is aligned
so that any valid stack address has PAGE_SHIFT bit as 1 - this is used
for overflow detection (implemented in a subsequent patch in the series).
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220420214317.3303360-4-kaleshsingh@google.com
Kalesh Singh [Wed, 20 Apr 2022 21:42:53 +0000 (14:42 -0700)]
KVM: arm64: Introduce pkvm_alloc_private_va_range()
pkvm_hyp_alloc_private_va_range() can be used to reserve private VA ranges
in the pKVM nVHE hypervisor. Allocations are aligned based on the order of
the requested size.
This will be used to implement stack guard pages for pKVM nVHE hypervisor
(in a subsequent patch in the series).
Credits to Quentin Perret <qperret@google.com> for the idea of moving
private VA allocation out of __pkvm_create_private_mapping()
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220420214317.3303360-3-kaleshsingh@google.com
Kalesh Singh [Wed, 20 Apr 2022 21:42:52 +0000 (14:42 -0700)]
KVM: arm64: Introduce hyp_alloc_private_va_range()
hyp_alloc_private_va_range() can be used to reserve private VA ranges
in the nVHE hypervisor. Allocations are aligned based on the order of
the requested size.
This will be used to implement stack guard pages for KVM nVHE hypervisor
(nVHE Hyp mode / not pKVM), in a subsequent patch in the series.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220420214317.3303360-2-kaleshsingh@google.com
Linus Torvalds [Thu, 28 Apr 2022 19:34:50 +0000 (12:34 -0700)]
Merge tag 'net-5.18-rc5' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, bpf and netfilter.
Current release - new code bugs:
- bridge: switchdev: check br_vlan_group() return value
- use this_cpu_inc() to increment net->core_stats, fix preempt-rt
Previous releases - regressions:
- eth: stmmac: fix write to sgmii_adapter_base
Previous releases - always broken:
- netfilter: nf_conntrack_tcp: re-init for syn packets only,
resolving issues with TCP fastopen
- tcp: md5: fix incorrect tcp_header_len for incoming connections
- tcp: fix F-RTO may not work correctly when receiving DSACK
- tcp: ensure use of most recently sent skb when filling rate samples
- tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT
- virtio_net: fix wrong buf address calculation when using xdp
- xsk: fix forwarding when combining copy mode with busy poll
- xsk: fix possible crash when multiple sockets are created
- bpf: lwt: fix crash when using bpf_skb_set_tunnel_key() from
bpf_xmit lwt hook
- sctp: null-check asoc strreset_chunk in sctp_generate_reconf_event
- wireguard: device: check for metadata_dst with skb_valid_dst()
- netfilter: update ip6_route_me_harder to consider L3 domain
- gre: make o_seqno start from 0 in native mode
- gre: switch o_seqno to atomic to prevent races in collect_md mode
Misc:
- add Eric Dumazet to networking maintainers
- dt: dsa: realtek: remove realtek,rtl8367s string
- netfilter: flowtable: Remove the empty file"
* tag 'net-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
tcp: fix F-RTO may not work correctly when receiving DSACK
Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"
net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK
ixgbe: ensure IPsec VF<->PF compatibility
MAINTAINERS: Update BNXT entry with firmware files
netfilter: nft_socket: only do sk lookups when indev is available
net: fec: add missing of_node_put() in fec_enet_init_stop_mode()
bnx2x: fix napi API usage sequence
tls: Skip tls_append_frag on zero copy size
Add Eric Dumazet to networking maintainers
netfilter: conntrack: fix udp offload timeout sysctl
netfilter: nf_conntrack_tcp: re-init for syn packets only
net: dsa: lantiq_gswip: Don't set GSWIP_MII_CFG_RMII_CLK
net: Use this_cpu_inc() to increment net->core_stats
Bluetooth: hci_sync: Cleanup hci_conn if it cannot be aborted
Bluetooth: hci_event: Fix creating hci_conn object on error status
Bluetooth: hci_event: Fix checking for invalid handle on error status
ice: fix use-after-free when deinitializing mailbox snapshot
ice: wait 5 s for EMP reset after firmware flash
ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg()
...
Joseph Ravichandran [Thu, 28 Apr 2022 16:57:52 +0000 (12:57 -0400)]
io_uring: fix uninitialized field in rw io_kiocb
io_rw_init_file does not initialize kiocb->private, so when iocb_bio_iopoll
reads kiocb->private it can contain uninitialized data.
Fixes:
3e08773c3841 ("block: switch polling to be bio based")
Signed-off-by: Joseph Ravichandran <jravi@mit.edu>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Thu, 28 Apr 2022 18:57:00 +0000 (11:57 -0700)]
Merge tag 'thermal-5.18-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull thermal control fixes from Rafael Wysocki:
"These take back recent chages that started to confuse users and fix up
an attr.show callback prototype in a driver.
Specifics:
- Stop warning about deprecation of the userspace thermal governor
and cooling device status interface, because there are cases in
which user space has to drive thermal management with the help of
them (Daniel Lezcano)
- Fix attr.show callback prototype in the int340x thermal driver
(Kees Cook)"
* tag 'thermal-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal/governor: Remove deprecated information
Revert "thermal/core: Deprecate changing cooling device state from userspace"
thermal: int340x: Fix attr.show callback prototype
Linus Torvalds [Thu, 28 Apr 2022 18:50:21 +0000 (11:50 -0700)]
Merge tag 'pm-5.18-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix up recent intel_idle driver changes and fix some ARM cpufreq
driver issues.
Specifics:
- Fix issues with the Qualcomm's cpufreq driver (Dmitry Baryshkov,
Vladimir Zapolskiy).
- Fix memory leak with the Sun501 driver (Xiaobing Luo).
- Make intel_idle enable C1E promotion on all CPUs when C1E is
preferred to C1 (Artem Bityutskiy).
- Make C6 optimization on Sapphire Rapids added recently work as
expected if both C1E and C1 are "preferred" (Artem Bityutskiy)"
* tag 'pm-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_idle: Fix SPR C6 optimization
intel_idle: Fix the 'preferred_cstates' module parameter
cpufreq: qcom-cpufreq-hw: Clear dcvs interrupts
cpufreq: fix memory leak in sun50i_cpufreq_nvmem_probe
cpufreq: qcom-cpufreq-hw: Fix throttle frequency value on EPSS platforms
cpufreq: qcom-hw: provide online/offline operations
cpufreq: qcom-hw: fix the opp entries refcounting
cpufreq: qcom-hw: fix the race between LMH worker and cpuhp
cpufreq: qcom-hw: drop affinity hint before freeing the IRQ
Linus Torvalds [Thu, 28 Apr 2022 18:37:20 +0000 (11:37 -0700)]
Merge tag 'acpi-5.18-rc5' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael WysockiL
"These fix up the ACPI processor driver after a change made during the
5.16 cycle that inadvertently broke falling back to shallower C-states
when C3 cannot be used.
Specifics:
- Make the ACPI processor driver avoid falling back to C3 type of
C-states when C3 cannot be requested (Ville Syrjälä)
- Revert a quirk that is not necessary any more after fixing the
underlying issue properly (Ville Syrjälä)"
* tag 'acpi-5.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI: processor: idle: fix lockup regression on 32-bit ThinkPad T40"
ACPI: processor: idle: Avoid falling back to C3 type C-states
Linus Torvalds [Thu, 28 Apr 2022 18:13:00 +0000 (11:13 -0700)]
Merge tag 'platform-drivers-x86-v5.18-3' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Highlights:
- asus-wmi bug-fixes
- intel-sdsu bug-fixes
- build (warning) fixes
- couple of hw-id additions"
* tag 'platform-drivers-x86-v5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/intel: pmc/core: change pmc_lpm_modes to static
platform/x86/intel/sdsi: Fix bug in multi packet reads
platform/x86/intel/sdsi: Poll on ready bit for writes
platform/x86/intel/sdsi: Handle leaky bucket
platform/x86: intel-uncore-freq: Prevent driver loading in guests
platform/x86: gigabyte-wmi: added support for B660 GAMING X DDR4 motherboard
platform/x86: dell-laptop: Add quirk entry for Latitude 7520
platform/x86: asus-wmi: Fix driver not binding when fan curve control probe fails
platform/x86: asus-wmi: Potential buffer overflow in asus_wmi_evaluate_method_buf()
tools/power/x86/intel-speed-select: fix build failure when using -Wl,--as-needed
Linus Torvalds [Thu, 28 Apr 2022 18:07:49 +0000 (11:07 -0700)]
Merge tag 'regulator-fix-v5.18-rc4' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"A minor fix for the DT binding documentation of the rt5190a driver"
* tag 'regulator-fix-v5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: dt-bindings: Revise the rt5190a buck/ldo description
Pengcheng Yang [Tue, 26 Apr 2022 10:03:39 +0000 (18:03 +0800)]
tcp: fix F-RTO may not work correctly when receiving DSACK
Currently DSACK is regarded as a dupack, which may cause
F-RTO to incorrectly enter "loss was real" when receiving
DSACK.
Packetdrill to demonstrate:
// Enable F-RTO and TLP
0 `sysctl -q net.ipv4.tcp_frto=2`
0 `sysctl -q net.ipv4.tcp_early_retrans=3`
0 `sysctl -q net.ipv4.tcp_congestion_control=cubic`
// Establish a connection
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
// RTT 10ms, RTO 210ms
+.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <...>
+.01 < . 1:1(0) ack 1 win 257
+0 accept(3, ..., ...) = 4
// Send 2 data segments
+0 write(4, ..., 2000) = 2000
+0 > P. 1:2001(2000) ack 1
// TLP
+.022 > P. 1001:2001(1000) ack 1
// Continue to send 8 data segments
+0 write(4, ..., 10000) = 10000
+0 > P. 2001:10001(8000) ack 1
// RTO
+.188 > . 1:1001(1000) ack 1
// The original data is acked and new data is sent(F-RTO step 2.b)
+0 < . 1:1(0) ack 2001 win 257
+0 > P. 10001:12001(2000) ack 1
// D-SACK caused by TLP is regarded as a dupack, this results in
// the incorrect judgment of "loss was real"(F-RTO step 3.a)
+.022 < . 1:1(0) ack 2001 win 257 <sack 1001:2001,nop,nop>
// Never-retransmitted data(3001:4001) are acked and
// expect to switch to open state(F-RTO step 3.b)
+0 < . 1:1(0) ack 4001 win 257
+0 %{ assert tcpi_ca_state == 0, tcpi_ca_state }%
Fixes:
e33099f96d99 ("tcp: implement RFC5682 F-RTO")
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1650967419-2150-1-git-send-email-yangpc@wangsu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 28 Apr 2022 16:55:59 +0000 (09:55 -0700)]
Merge git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Fix incorrect TCP connection tracking window reset for non-syn
packets, from Florian Westphal.
2) Incorrect dependency on CONFIG_NFT_FLOW_OFFLOAD, from Volodymyr Mytnyk.
3) Fix nft_socket from the output path, from Florian Westphal.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nft_socket: only do sk lookups when indev is available
netfilter: conntrack: fix udp offload timeout sysctl
netfilter: nf_conntrack_tcp: re-init for syn packets only
====================
Link: https://lore.kernel.org/r/20220428142109.38726-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 28 Apr 2022 16:50:29 +0000 (09:50 -0700)]
Merge tag 'gfs2-v5.18-rc4-fix2' of git://git./linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fix from Andreas Gruenbacher:
- No short reads or writes upon glock contention
* tag 'gfs2-v5.18-rc4-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: No short reads or writes upon glock contention
Dany Madden [Wed, 27 Apr 2022 23:51:46 +0000 (18:51 -0500)]
Revert "ibmvnic: Add ethtool private flag for driver-defined queue limits"
This reverts commit
723ad916134784b317b72f3f6cf0f7ba774e5dae
When client requests channel or ring size larger than what the server
can support the server will cap the request to the supported max. So,
the client would not be able to successfully request resources that
exceed the server limit.
Fixes:
723ad9161347 ("ibmvnic: Add ethtool private flag for driver-defined queue limits")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Link: https://lore.kernel.org/r/20220427235146.23189-1-drt@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Wed, 27 Apr 2022 20:30:17 +0000 (23:30 +0300)]
net: enetc: allow tc-etf offload even with NETIF_F_CSUM_MASK
The Time-Specified Departure feature is indeed mutually exclusive with
TX IP checksumming in ENETC, but TX checksumming in itself is broken and
was removed from this driver in commit
82728b91f124 ("enetc: Remove Tx
checksumming offload code").
The blamed commit declared NETIF_F_HW_CSUM in dev->features to comply
with software TSO's expectations, and still did the checksumming in
software by calling skb_checksum_help(). So there isn't any restriction
for the Time-Specified Departure feature.
However, enetc_setup_tc_txtime() doesn't understand that, and blindly
looks for NETIF_F_CSUM_MASK.
Instead of checking for things which can literally never happen in the
current code base, just remove the check and let the driver offload
tc-etf qdiscs.
Fixes:
acede3c5dad5 ("net: enetc: declare NETIF_F_HW_CSUM and do it in software")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20220427203017.1291634-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Leon Romanovsky [Wed, 27 Apr 2022 17:31:52 +0000 (10:31 -0700)]
ixgbe: ensure IPsec VF<->PF compatibility
The VF driver can forward any IPsec flags and such makes the function
is not extendable and prone to backward/forward incompatibility.
If new software runs on VF, it won't know that PF configured something
completely different as it "knows" only XFRM_OFFLOAD_INBOUND flag.
Fixes:
eda0333ac293 ("ixgbe: add VF IPsec management")
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shannon Nelson <snelson@pensando.io>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220427173152.443102-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 28 Apr 2022 16:37:56 +0000 (09:37 -0700)]
Merge tag 'xfs-5.18-fixes-1' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Dave Chinner:
- define buffer bit flags as unsigned to fix gcc-5 + c11 warnings
- remove redundant XFS fields from MAINTAINERS
- fix inode buffer locking order regression
* tag 'xfs-5.18-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: reorder iunlink remove operation in xfs_ifree
MAINTAINERS: update IOMAP FILESYSTEM LIBRARY and XFS FILESYSTEM
xfs: convert buffer flags to unsigned.
Florian Fainelli [Wed, 27 Apr 2022 16:36:06 +0000 (09:36 -0700)]
MAINTAINERS: Update BNXT entry with firmware files
There appears to be a maintainer gap for BNXT TEE firmware files which
causes some patches to be missed. Update the entry for the BNXT Ethernet
controller with its companion firmware files.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20220427163606.126154-1-f.fainelli@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Rafael J. Wysocki [Thu, 28 Apr 2022 14:51:24 +0000 (16:51 +0200)]
Merge branch 'thermal-int340x'
Merge a fix for the attr.show callback prototype in the int340x thermal
driver (Kees Cook).
* thermal-int340x:
thermal: int340x: Fix attr.show callback prototype