review.tizen.org Git - platform/kernel/linux-rpi.git/log

Merge tag 'rcu-urgent.2022.01.26a' of git://git./linux/kernel/git/paulmck/linux-rcu

Pull RCU fix from Paul McKenney:
"This fixes a brown-paper-bag bug in RCU tasks that causes things like
  BPF and ftrace to fail miserably on systems with non-power-of-two
  numbers of CPUs.

  It fixes a math error added in 7a30871b6a27 ("rcu-tasks: Introduce
  ->percpu_enqueue_shift for dynamic queue selection') during the v5.17
  merge window. This commit works correctly only on systems with a
  power-of-two number of CPUs, which just so happens to be the kind that
  rcutorture always uses by default.

  This pull request fixes the math so that things also work on systems
  that don't happen to have a power-of-two number of CPUs"

* tag 'rcu-urgent.2022.01.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  rcu-tasks: Fix computation of CPU-to-list shift counts

Merge tag 'hyperv-fixes-signed-20220128' of git://git./linux/kernel/git/hyperv/linux

Pull hyperv fixes from Wei Liu:

- Fix screen resolution for hyperv framebuffer (Michael Kelley)

- Fix packet header accounting for balloon driver (Yanming Liu)

* tag 'hyperv-fixes-signed-20220128' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
video: hyperv_fb: Fix validation of screen resolution
Drivers: hv: balloon: account for vmbus packet header in max_pkt_size

Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"Two larger x86 series:

   - Redo incorrect fix for SEV/SMAP erratum

   - Windows 11 Hyper-V workaround

  Other x86 changes:

   - Various x86 cleanups

   - Re-enable access_tracking_perf_test

   - Fix for #GP handling on SVM

   - Fix for CPUID leaf 0Dh in KVM_GET_SUPPORTED_CPUID

   - Fix for ICEBP in interrupt shadow

   - Avoid false-positive RCU splat

   - Enable Enlightened MSR-Bitmap support for real

  ARM:

   - Correctly update the shadow register on exception injection when
     running in nVHE mode

   - Correctly use the mm_ops indirection when performing cache
     invalidation from the page-table walker

   - Restrict the vgic-v3 workaround for SEIS to the two known broken
     implementations

  Generic code changes:

   - Dead code cleanup"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits)
  KVM: eventfd: Fix false positive RCU usage warning
  KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use
  KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread()
  KVM: nVMX: Rename vmcs_to_field_offset{,_table}
  KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
  KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS
  selftests: kvm: check dynamic bits against KVM_X86_XCOMP_GUEST_SUPP
  KVM: x86: add system attribute to retrieve full set of supported xsave states
  KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr
  selftests: kvm: move vm_xsave_req_perm call to amx_test
  KVM: x86: Sync the states size with the XCR0/IA32_XSS at, any time
  KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS
  KVM: x86: Keep MSR_IA32_XSS unchanged for INIT
  KVM: x86: Free kvm_cpuid_entry2 array on post-KVM_RUN KVM_SET_CPUID{,2}
  KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02
  KVM: selftests: Don't skip L2's VMCALL in SMM test for SVM guest
  KVM: x86: Check .flags in kvm_cpuid_check_equal() too
  KVM: x86: Forcibly leave nested virt when SMM state is toggled
  KVM: SVM: drop unnecessary code in svm_hv_vmcb_dirty_nested_enlightenments()
  KVM: SVM: hyper-v: Enable Enlightened MSR-Bitmap support for real
  ...

Merge tag 'mips-fixes-5.17_1' of git://git./linux/kernel/git/mips/linux

Pull MIPS build fix from Thomas Bogendoerfer:
"Fix for allmodconfig build"

* tag 'mips-fixes-5.17_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Fix build error due to PTR used in more places

Merge tag 's390-5.17-3' of git://git./linux/kernel/git/s390/linux

Pull s390 fixes from Vasily Gorbik:

- Fix loading of modules with lots of relocations and add a regression
   test for it.

- Fix machine check handling for vector validity and guarded storage
   validity failures in KVM guests.

- Fix hypervisor performance data to include z/VM guests with access
   control group set.

- Fix z900 build problem in uaccess code.

- Update defconfigs.

* tag 's390-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/hypfs: include z/VM guests with access control group set
  s390: update defconfigs
  s390/module: test loading modules with a lot of relocations
  s390/module: fix loading modules with a lot of relocations
  s390/uaccess: fix compile error
  s390/nmi: handle vector validity failures for KVM guests
  s390/nmi: handle guarded storage validity failures for KVM guests

Merge tag 'ceph-for-5.17-rc2' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
"A ZERO_SIZE_PTR dereference fix from Xiubo and two fixes for async
  creates interacting with pool namespace-constrained OSD permissions
  from Jeff (marked for stable)"

* tag 'ceph-for-5.17-rc2' of git://github.com/ceph/ceph-client:
  ceph: set pool_ns in new inode layout for async creates
  ceph: properly put ceph_string reference after async create attempt
  ceph: put the requests/sessions when it fails to alloc memory

ocfs2: fix subdirectory registration with register_sysctl()

The kernel test robot reports that commit c42ff46f97c1 ("ocfs2: simplify
subdirectory registration with register_sysctl()") is broken, and
results in kernel warning messages like

  sysctl table check failed: fs/ocfs2/nm Not a file
  sysctl table check failed: fs/ocfs2/nm No proc_handler
  sysctl table check failed: fs/ocfs2/nm bogus .mode 0555

and in fact this was already reported back in linux-next, but nobody
seems to have reacted to that report.  Possibly that original report
only ever made it to the lkp list.

The problem seems to be that the simplification didn't actually go far
enough, and should have converted the whole directory path to the final
sysctl file, rather than just the two first components.

So take that last step.

Fixes: c42ff46f97c1 ("ocfs2: simplify subdirectory registration with register_sysctl()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/all/20220128065310.GF8421@xsang-OptiPlex-9020/
Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/KQ2F6TPJWMDVEXJM4WTUC4DU3EH3YJVT/
Tested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'fsnotify_for_v5.17-rc2' of git://git./linux/kernel/git/jack/linux-fs

Pull fsnotify fixes from Jan Kara:
"Fixes for userspace breakage caused by fsnotify changes ~3 years ago
  and one fanotify cleanup"

* tag 'fsnotify_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: fix fsnotify hooks in pseudo filesystems
  fsnotify: invalidate dcache before IN_DELETE event
  fanotify: remove variable set but not used

Merge tag 'fs_for_v5.17-rc2' of git://git./linux/kernel/git/jack/linux-fs

Pull udf and quota fixes from Jan Kara:
"Fixes for crashes in UDF when inode expansion fails and one quota
  cleanup"

* tag 'fs_for_v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  quota: cleanup double word in comment
  udf: Restore i_lenAlloc when inode expansion fails
  udf: Fix NULL ptr deref when converting from inline format

Merge tag 'kvmarm-fixes-5.17-1' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 5.17, take #1

- Correctly update the shadow register on exception injection when
  running in nVHE mode

- Correctly use the mm_ops indirection when performing cache invalidation
  from the page-table walker

- Restrict the vgic-v3 workaround for SEIS to the two known broken
  implementations

KVM: eventfd: Fix false positive RCU usage warning

Fix the following false positive warning:
=============================
WARNING: suspicious RCU usage
5.16.0-rc4+ #57 Not tainted
-----------------------------
arch/x86/kvm/../../../virt/kvm/eventfd.c:484 RCU-list traversed in non-reader section!!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
3 locks held by fc_vcpu 0/330:
  #0: ffff8884835fc0b0 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x88/0x6f0 [kvm]
  #1: ffffc90004c0bb68 (&kvm->srcu){....}-{0:0}, at: vcpu_enter_guest+0x600/0x1860 [kvm]
  #2: ffffc90004c0c1d0 (&kvm->irq_srcu){....}-{0:0}, at: kvm_notify_acked_irq+0x36/0x180 [kvm]

stack backtrace:
CPU: 26 PID: 330 Comm: fc_vcpu 0 Not tainted 5.16.0-rc4+
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
  <TASK>
  dump_stack_lvl+0x44/0x57
  kvm_notify_acked_gsi+0x6b/0x70 [kvm]
  kvm_notify_acked_irq+0x8d/0x180 [kvm]
  kvm_ioapic_update_eoi+0x92/0x240 [kvm]
  kvm_apic_set_eoi_accelerated+0x2a/0xe0 [kvm]
  handle_apic_eoi_induced+0x3d/0x60 [kvm_intel]
  vmx_handle_exit+0x19c/0x6a0 [kvm_intel]
  vcpu_enter_guest+0x66e/0x1860 [kvm]
  kvm_arch_vcpu_ioctl_run+0x438/0x7f0 [kvm]
  kvm_vcpu_ioctl+0x38a/0x6f0 [kvm]
  __x64_sys_ioctl+0x89/0xc0
  do_syscall_64+0x3a/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Since kvm_unregister_irq_ack_notifier() does synchronize_srcu(&kvm->irq_srcu),
kvm->irq_ack_notifier_list is protected by kvm->irq_srcu. In fact,
kvm->irq_srcu SRCU read lock is held in kvm_notify_acked_irq(), making it
a false positive warning. So use hlist_for_each_entry_srcu() instead of
hlist_for_each_entry_rcu().

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
Message-Id: <f98bac4f5052bad2c26df9ad50f7019e40434512.1643265976.git.houwenlong.hwl@antgroup.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use

Hyper-V TLFS explicitly forbids VMREAD and VMWRITE instructions when
Enlightened VMCS interface is in use:

"Any VMREAD or VMWRITE instructions while an enlightened VMCS is
active is unsupported and can result in unexpected behavior.""

Windows 11 + WSL2 seems to ignore this, attempts to VMREAD VMCS field
0x4404 ("VM-exit interruption information") are observed. Failing
these attempts with nested_vmx_failInvalid() makes such guests
unbootable.

Microsoft confirms this is a Hyper-V bug and claims that it'll get fixed
eventually but for the time being we need a workaround. (Temporary) allow
VMREAD to get data from the currently loaded Enlightened VMCS.

Note: VMWRITE instructions remain forbidden, it is not clear how to
handle them properly and hopefully won't ever be needed.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread()

In preparation to allowing reads from Enlightened VMCS from
handle_vmread(), implement evmcs_field_offset() to get the correct
read offset. get_evmcs_offset(), which is being used by KVM-on-Hyper-V,
is almost what's needed but a few things need to be adjusted. First,
WARN_ON() is unacceptable for handle_vmread() as any field can (in
theory) be supplied by the guest and not all fields are defined in
eVMCS v1. Second, we need to handle 'holes' in eVMCS (missing fields).
It also sounds like a good idea to WARN_ON() if such fields are ever
accessed by KVM-on-Hyper-V.

Implement dedicated evmcs_field_offset() helper.

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: nVMX: Rename vmcs_to_field_offset{,_table}

vmcs_to_field_offset{,_table} may sound misleading as VMCS is an opaque
blob which is not supposed to be accessed directly. In fact,
vmcs_to_field_offset{,_table} are related to KVM defined VMCS12 structure.

Rename vmcs_field_to_offset() to get_vmcs12_field_offset() for clarity.

No functional change intended.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER

Enlightened VMCS v1 doesn't have VMX_PREEMPTION_TIMER_VALUE field,
PIN_BASED_VMX_PREEMPTION_TIMER is also filtered out already so it makes
sense to filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER too.

Note, none of the currently existing Windows/Hyper-V versions are known
to enable 'save VMX-preemption timer value' when eVMCS is in use, the
change is aimed at making the filtering future proof.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS

Similar to MSR_IA32_VMX_EXIT_CTLS/MSR_IA32_VMX_TRUE_EXIT_CTLS,
MSR_IA32_VMX_ENTRY_CTLS/MSR_IA32_VMX_TRUE_ENTRY_CTLS pair,
MSR_IA32_VMX_TRUE_PINBASED_CTLS needs to be filtered the same way
MSR_IA32_VMX_PINBASED_CTLS is currently filtered as guests may solely rely
on 'true' MSR data.

Note, none of the currently existing Windows/Hyper-V versions are known
to stumble upon the unfiltered MSR_IA32_VMX_TRUE_PINBASED_CTLS, the change
is aimed at making the filtering future proof.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

selftests: kvm: check dynamic bits against KVM_X86_XCOMP_GUEST_SUPP

Provide coverage for the new API.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: add system attribute to retrieve full set of supported xsave states

Because KVM_GET_SUPPORTED_CPUID is meant to be passed (by simple-minded
VMMs) to KVM_SET_CPUID2, it cannot include any dynamic xsave states that
have not been enabled. Probing those, for example so that they can be
passed to ARCH_REQ_XCOMP_GUEST_PERM, requires a new ioctl or arch_prctl.
The latter is in fact worse, even though that is what the rest of the
API uses, because it would require supported_xcr0 to be moved from the
KVM module to the kernel just for this use. In addition, the value
would be nonsensical (or an error would have to be returned) until
the KVM module is loaded in.

Therefore, to limit the growth of system ioctls, add a /dev/kvm
variant of KVM_{GET,HAS}_DEVICE_ATTR, and implement it in x86
with just one group (0) and attribute (KVM_X86_XCOMP_GUEST_SUPP).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr

Add a helper to handle converting the u64 userspace address embedded in
struct kvm_device_attr into a userspace pointer, it's all too easy to
forget the intermediate "unsigned long" cast as well as the truncation
check.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'ata-5.17-rc2' of git://git./linux/kernel/git/dlemoal/libata

Pull ATA fix from Damien Le Moal:
"A single fix for 5.17-rc2, adding a missing resource allocation error
check in the pata_platform driver, from Zhou"

* tag 'ata-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: pata_platform: Fix a NULL pointer dereference in __pata_platform_probe()

Merge tag 'hwmon-for-v5.17-rc2' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

- Fix crash in nct6775 driver

- Prevent divide by zero in adt7470 driver

- Fix conditional compile warning in pmbus/ir38064 driver

- Various minor fixes in lm90 driver

* tag 'hwmon-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (nct6775) Fix crash in clear_caseopen
  hwmon: (adt7470) Prevent divide by zero in adt7470_fan_write()
  hwmon: (pmbus/ir38064) Mark ir38064_of_match as __maybe_unused
  hwmon: (lm90) Fix sysfs and udev notifications
  hwmon: (lm90) Mark alert as broken for MAX6646/6647/6649
  hwmon: (lm90) Mark alert as broken for MAX6680
  hwmon: (lm90) Mark alert as broken for MAX6654
  hwmon: (lm90) Re-enable interrupts after alert clears
  hwmon: (lm90) Reduce maximum conversion rate for G781

Merge tag 'drm-fixes-2022-01-28' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
"This week's regular normal fixes. amdgpu and msm make up the bulk of
  it, with a scattering of fixes elsewhere.

  atomic:
   - fix CRTC handling during modeset

  privcy-screen:
   - honor acpi=off

  ttm:
   - build fix for um

  panel:
   - add orientation quirk for 1NetBook OneXPlayer

  amdgpu:
   - Proper fix for otg synchronization logic regression
   - DCN3.01 fixes
   - Filter out secondary radeon PCI IDs
   - udelay fixes
   - Fix a memory leak in an error path

  msm:
   - parameter check fixes
   - put_device balancing
   - idle/suspend fixes

  etnaviv:
   - relax submit size checks

  vc4:
   - fix potential deadlock in DSI code

  ast:
   - revert 1600x900 mode change"

* tag 'drm-fixes-2022-01-28' of git://anongit.freedesktop.org/drm/drm: (25 commits)
  drm/privacy-screen: honor acpi=off in detect_thinkpad_privacy_screen
  Revert "drm/ast: Support 1600x900 with 108MHz PCLK"
  drm/amdgpu/display: Remove t_srx_delay_us.
  drm/amd/display: Wrap dcn301_calculate_wm_and_dlg for FPU.
  drm/amd/display: Fix FP start/end for dcn30_internal_validate_bw.
  drm/amd/display/dc/calcs/dce_calcs: Fix a memleak in calculate_bandwidth()
  drm/amdgpu/display: use msleep rather than udelay for long delays
  drm/amdgpu/display: adjust msleep limit in dp_wait_for_training_aux_rd_interval
  drm/amdgpu: filter out radeon secondary ids as well
  drm/amd/display: change FIFO reset condition to embedded display only
  drm/amd/display: Correct MPC split policy for DCN301
  drm/amd/display: Fix for otg synchronization logic
  drm/etnaviv: relax submit size limits
  drm/msm/gpu: Cancel idle/boost work on suspend
  drm/msm/gpu: Wait for idle before suspending
  drm/atomic: Add the crtc to affected crtc only if uapi.enable = true
  drm/msm/dsi: invalid parameter check in msm_dsi_phy_enable
  drm/msm/a6xx: Add missing suspend_count increment
  drm/msm: Fix wrong size calculation
  drm/msm/dpu: invalid parameter check in dpu_setup_dspp_pcc
  ...

Merge tag 'amd-drm-fixes-5.17-2022-01-26' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-5.17-2022-01-26:

amdgpu:
- Proper fix for otg synchronization logic regression
- DCN3.01 fixes
- Filter out secondary radeon PCI IDs
- udelay fixes
- Fix a memory leak in an error path

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220127041006.5695-1-alexander.deucher@amd.com

Merge branch 'etnaviv/fixes' of https://git.pengutronix.de/git/lst/linux into drm-fixes

- relax submit size checks.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas Stach <l.stach@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/8c2cb3e3a702be86db9d43ca8927b6b78ac2b1d2.camel@pengutronix.de

Merge tag 'drm-msm-fixes-2022-01-25' of https://gitlab.freedesktop.org/drm/msm into drm-fixes

A few msm fixes.
- parameter checks
- put_device balancing
- idle/suspend fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGvAfsgtr==VM4wixAC_hSTuV=eNWXxX=BhZqQrbxHjKgg@mail.gmail.com

Merge tag 'drm-misc-fixes-2022-01-27' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

* drm/ast: Revert 1600x800 with 108MHz PCLK
* drm/atomic: fix CRTC handling during modeset
* drm/privacy-screen: Honor acpi=off
* drm/ttm: build fix for ARCH=um

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/YfJgH9fz+oo7YSXd@linux-uq9g.fritz.box

Merge tag 'drm-misc-next-fixes-2022-01-21' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes

* vc4: Fix potential deadlock in DSI code
* panel: Add orientation quirk for 1Netbook OneXPlayer

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/Yepuhj+Ks+IyJ9Dp@linux-uq9g

Merge tag 'net-5.17-rc2' of git://git./linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter and can.

  Current release - new code bugs:

   - tcp: add a missing sk_defer_free_flush() in tcp_splice_read()

   - tcp: add a stub for sk_defer_free_flush(), fix CONFIG_INET=n

   - nf_tables: set last expression in register tracking area

   - nft_connlimit: fix memleak if nf_ct_netns_get() fails

   - mptcp: fix removing ids bitmap setting

   - bonding: use rcu_dereference_rtnl when getting active slave

   - fix three cases of sleep in atomic context in drivers: lan966x, gve

   - handful of build fixes for esoteric drivers after netdev->dev_addr
     was made const

  Previous releases - regressions:

   - revert "ipv6: Honor all IPv6 PIO Valid Lifetime values", it broke
     Linux compatibility with USGv6 tests

   - procfs: show net device bound packet types

   - ipv4: fix ip option filtering for locally generated fragments

   - phy: broadcom: hook up soft_reset for BCM54616S

  Previous releases - always broken:

   - ipv4: raw: lock the socket in raw_bind()

   - ipv4: decrease the use of shared IPID generator to decrease the
     chance of attackers guessing the values

   - procfs: fix cross-netns information leakage in /proc/net/ptype

   - ethtool: fix link extended state for big endian

   - bridge: vlan: fix single net device option dumping

   - ping: fix the sk_bound_dev_if match in ping_lookup"

* tag 'net-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (86 commits)
  net: bridge: vlan: fix memory leak in __allowed_ingress
  net: socket: rename SKB_DROP_REASON_SOCKET_FILTER
  ipv4: remove sparse error in ip_neigh_gw4()
  ipv4: avoid using shared IP generator for connected sockets
  ipv4: tcp: send zero IPID in SYNACK messages
  ipv4: raw: lock the socket in raw_bind()
  MAINTAINERS: add missing IPv4/IPv6 header paths
  MAINTAINERS: add more files to eth PHY
  net: stmmac: dwmac-sun8i: use return val of readl_poll_timeout()
  net: bridge: vlan: fix single net device option dumping
  net: stmmac: skip only stmmac_ptp_register when resume from suspend
  net: stmmac: configure PTP clock source prior to PTP initialization
  Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values"
  connector/cn_proc: Use task_is_in_init_pid_ns()
  pid: Introduce helper task_is_in_init_pid_ns()
  gve: Fix GFP flags when allocing pages
  net: lan966x: Fix sleep in atomic context when updating MAC table
  net: lan966x: Fix sleep in atomic context when injecting frames
  ethernet: seeq/ether3: don't write directly to netdev->dev_addr
  ethernet: 8390/etherh: don't write directly to netdev->dev_addr
  ...

net: bridge: vlan: fix memory leak in __allowed_ingress

When using per-vlan state, if vlan snooping and stats are disabled,
untagged or priority-tagged ingress frame will go to check pvid state.
If the port state is forwarding and the pvid state is not
learning/forwarding, untagged or priority-tagged frame will be dropped
but skb memory is not freed.
Should free skb when __allowed_ingress returns false.

Fixes: a580c76d534c ("net: bridge: vlan: add per-vlan state")
Signed-off-by: Tim Yi <tim.yi@pica8.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20220127074953.12632-1-tim.yi@pica8.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: socket: rename SKB_DROP_REASON_SOCKET_FILTER

Rename SKB_DROP_REASON_SOCKET_FILTER, which is used
as the reason of skb drop out of socket filter before
it's part of a released kernel. It will be used for
more protocols than just TCP in future series.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/all/20220127091308.91401-2-imagedong@tencent.com/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: remove sparse error in ip_neigh_gw4()

./include/net/route.h:373:48: warning: incorrect type in argument 2 (different base types)
./include/net/route.h:373:48: expected unsigned int [usertype] key
./include/net/route.h:373:48: got restricted __be32 [usertype] daddr

Fixes: 5c9f7c1dfc2e ("ipv4: Add helpers for neigh lookup for nexthop")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220127013404.1279313-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'ipv4-less-uses-of-shared-ip-generator'

Eric Dumazet says:

====================
ipv4: less uses of shared IP generator

From: Eric Dumazet <edumazet@google.com>

We keep receiving research reports based on linux IPID generation.

Before breaking part of the Internet by switching to pure
random generator, this series reduces the need for the
shared IP generator for TCP sockets.
====================

Link: https://lore.kernel.org/r/20220127011022.1274803-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: avoid using shared IP generator for connected sockets

ip_select_ident_segs() has been very conservative about using
the connected socket private generator only for packets with IP_DF
set, claiming it was needed for some VJ compression implementations.

As mentioned in this referenced document, this can be abused.
(Ref: Off-Path TCP Exploits of the Mixed IPID Assignment)

Before switching to pure random IPID generation and possibly hurt
some workloads, lets use the private inet socket generator.

Not only this will remove one vulnerability, this will also
improve performance of TCP flows using pmtudisc==IP_PMTUDISC_DONT

Fixes: 73f156a6e8c1 ("inetpeer: get rid of ip_id_count")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reported-by: Ray Che <xijiache@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ipv4: tcp: send zero IPID in SYNACK messages

In commit 431280eebed9 ("ipv4: tcp: send zero IPID for RST and
ACK sent in SYN-RECV and TIME-WAIT state") we took care of some
ctl packets sent by TCP.

It turns out we need to use a similar strategy for SYNACK packets.

By default, they carry IP_DF and IPID==0, but there are ways
to ask them to use the hashed IP ident generator and thus
be used to build off-path attacks.
(Ref: Off-Path TCP Exploits of the Mixed IPID Assignment)

One of this way is to force (before listener is started)
echo 1 >/proc/sys/net/ipv4/ip_no_pmtu_disc

Another way is using forged ICMP ICMP_FRAG_NEEDED
with a very small MTU (like 68) to force a false return from
ip_dont_fragment()

In this patch, ip_build_and_send_pkt() uses the following
heuristics.

1) Most SYNACK packets are smaller than IPV4_MIN_MTU and therefore
can use IP_DF regardless of the listener or route pmtu setting.

2) In case the SYNACK packet is bigger than IPV4_MIN_MTU,
we use prandom_u32() generator instead of the IPv4 hashed ident one.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Ray Che <xijiache@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Cc: Geoff Alexander <alexandg@cs.unm.edu>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

drm/vmwgfx: Fix stale file descriptors on failed usercopy

A failing usercopy of the fence_rep object will lead to a stale entry in
the file descriptor table as put_unused_fd() won't release it. This
enables userland to refer to a dangling 'file' object through that still
valid file descriptor, leading to all kinds of use-after-free
exploitation scenarios.

Fix this by deferring the call to fd_install() until after the usercopy
has succeeded.

Fixes: c906965dee22 ("drm/vmwgfx: Add export fence to file descriptor support")
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Zack Rusin <zackr@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipv4: raw: lock the socket in raw_bind()

For some reason, raw_bind() forgot to lock the socket.

BUG: KCSAN: data-race in __ip4_datagram_connect / raw_bind

write to 0xffff8881170d4308 of 4 bytes by task 5466 on cpu 0:
raw_bind+0x1b0/0x250 net/ipv4/raw.c:739
inet_bind+0x56/0xa0 net/ipv4/af_inet.c:443
__sys_bind+0x14b/0x1b0 net/socket.c:1697
__do_sys_bind net/socket.c:1708 [inline]
__se_sys_bind net/socket.c:1706 [inline]
__x64_sys_bind+0x3d/0x50 net/socket.c:1706
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

read to 0xffff8881170d4308 of 4 bytes by task 5468 on cpu 1:
__ip4_datagram_connect+0xb7/0x7b0 net/ipv4/datagram.c:39
ip4_datagram_connect+0x2a/0x40 net/ipv4/datagram.c:89
inet_dgram_connect+0x107/0x190 net/ipv4/af_inet.c:576
__sys_connect_file net/socket.c:1900 [inline]
__sys_connect+0x197/0x1b0 net/socket.c:1917
__do_sys_connect net/socket.c:1927 [inline]
__se_sys_connect net/socket.c:1924 [inline]
__x64_sys_connect+0x3d/0x50 net/socket.c:1924
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x00000000 -> 0x0003007f

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 5468 Comm: syz-executor.5 Not tainted 5.17.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: add missing IPv4/IPv6 header paths

Add missing headers to the IP entry.

Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: add more files to eth PHY

include/linux/linkmode.h and include/linux/mii.h
do not match anything in MAINTAINERS. Looks like
they should be under Ethernet PHY.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: dwmac-sun8i: use return val of readl_poll_timeout()

When readl_poll_timeout() timeout, we'd better directly use its return
value.

Before this patch:
[ 2.145528] dwmac-sun8i: probe of 4500000.ethernet failed with error -14

After this patch:
[ 2.138520] dwmac-sun8i: probe of 4500000.ethernet failed with error -110

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: vlan: fix single net device option dumping

When dumping vlan options for a single net device we send the same
entries infinitely because user-space expects a 0 return at the end but
we keep returning skb->len and restarting the dump on retry. Fix it by
returning the value from br_vlan_dump_dev() if it completed or there was
an error. The only case that must return skb->len is when the dump was
incomplete and needs to continue (-EMSGSIZE).

Reported-by: Benjamin Poirier <bpoirier@nvidia.com>
Fixes: 8dcea187088b ("net: bridge: vlan: add rtm definitions and dump support")
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'stmmac-ptp-fix'

Mohammad Athari Bin Ismail says:

====================
Fix PTP issue in stmmac

This patch series to fix PTP issue in stmmac related to:
1/ PTP clock source configuration during initialization.
2/ PTP initialization during resume from suspend.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: skip only stmmac_ptp_register when resume from suspend

When resume from suspend, besides skipping PTP registration, it also
skipping PTP HW initialization. This could cause PTP clock not able to
operate properly when resume from suspend.

To fix this, only stmmac_ptp_register() is skipped when resume from
suspend.

Fixes: fe1319291150 ("stmmac: Don't init ptp again when resume from suspend/hibernation")
Cc: <stable@vger.kernel.org> # 5.15.x
Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: configure PTP clock source prior to PTP initialization

For Intel platform, it is required to configure PTP clock source prior PTP
initialization in MAC. So, need to move ptp_clk_freq_config execution from
stmmac_ptp_register() to stmmac_init_ptp().

Fixes: 76da35dc99af ("stmmac: intel: Add PSE and PCH PTP clock source selection")
Cc: <stable@vger.kernel.org> # 5.15.x
Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "ipv6: Honor all IPv6 PIO Valid Lifetime values"

This reverts commit b75326c201242de9495ff98e5d5cff41d7fc0d9d.

This commit breaks Linux compatibility with USGv6 tests. The RFC this
commit was based on is actually an expired draft: no published RFC
currently allows the new behaviour it introduced.

Without full IETF endorsement, the flash renumbering scenario this
patch was supposed to enable is never going to work, as other IPv6
equipements on the same LAN will keep the 2 hours limit.

Fixes: b75326c20124 ("ipv6: Honor all IPv6 PIO Valid Lifetime values")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'rpmsg-v5.17-fixes' of git://git./linux/kernel/git/remoteproc/linux

Pull rpmsg fixes from Bjorn Andersson:
"The cdev cleanup in the rpmsg_char driver was not performed properly,
  resulting in unpredicable behaviour when the parent remote processor
  is stopped with any of the cdevs open by a client.

  Two patches transitions the implementation to use cdev_device_add()
  and cdev_del_device(), to capture the relationship between the two
  objects, and relocates the incorrectly placed cdev_del()"

* tag 'rpmsg-v5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
  rpmsg: char: Fix race between the release of rpmsg_eptdev and cdev
  rpmsg: char: Fix race between the release of rpmsg_ctrldev and cdev

Merge tag 'rproc-v5.17-fixes' of git://git./linux/kernel/git/remoteproc/linux

Pull remoteproc fix from Bjorn Andersson:
"The interaction between the various Qualcomm remoteproc drivers and
  the Qualcomm 'QMP' driver (used to communicate with the
  power-management hardware) was reworked in v5.17-rc1, but failed to
  account for the new Kconfig dependency"

* tag 'rproc-v5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
  remoteproc: qcom: q6v5: fix service routines build errors

MIPS: Fix build error due to PTR used in more places

Use PTR_WD instead of PTR to avoid clashes with other parts.

Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

Merge branch 'pid-introduce-helper-task_is_in_root_ns'

Leo Yan says:

====================
pid: Introduce helper task_is_in_root_ns()

This patch series introduces a helper function task_is_in_init_pid_ns()
to replace open code. The two patches are extracted from the original
series [1] for network subsystem.

As a plan, we can firstly land this patch set into kernel 5.18; there
have 5 patches are left out from original series [1], as a next step,
I will resend them for appropriate linux-next merging.

[1] https://lore.kernel.org/lkml/20211208083320.472503-1-leo.yan@linaro.org/
====================

Link: https://lore.kernel.org/r/20220126050427.605628-1-leo.yan@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

connector/cn_proc: Use task_is_in_init_pid_ns()

This patch replaces open code with task_is_in_init_pid_ns() to check if
a task is in root PID namespace.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

pid: Introduce helper task_is_in_init_pid_ns()

Currently the kernel uses open code in multiple places to check if a
task is in the root PID namespace with the kind of format:

  if (task_active_pid_ns(current) == &init_pid_ns)
      do_something();

This patch creates a new helper function, task_is_in_init_pid_ns(), it
returns true if a passed task is in the root PID namespace, otherwise
returns false.  So it will be used to replace open codes.

Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gve: Fix GFP flags when allocing pages

Use GFP_ATOMIC when allocating pages out of the hotpath,
continue to use GFP_KERNEL when allocating pages during setup.

GFP_KERNEL will allow blocking which allows it to succeed
more often in a low memory enviornment but in the hotpath we do
not want to allow the allocation to block.

Fixes: f5cedc84a30d2 ("gve: Add transmit and receive support")
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David Awogbemila <awogbemila@google.com>
Link: https://lore.kernel.org/r/20220126003843.3584521-1-awogbemila@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ata: pata_platform: Fix a NULL pointer dereference in __pata_platform_probe()

In __pata_platform_probe(), devm_kzalloc() is assigned to ap->ops and
there is a dereference of it right after that, which could introduce a
NULL pointer dereference bug.

Fix this by adding a NULL check of ap->ops.

This bug was found by a static analyzer.

Builds with 'make allyesconfig' show no new warnings,
and our static analyzer no longer warns about this code.

Fixes: f3d5e4f18dba ("ata: pata_of_platform: Allow to use 16-bit wide data transfer")
Signed-off-by: Zhou Qingyang <zhou1615@umn.edu>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>

rcu-tasks: Fix computation of CPU-to-list shift counts

The ->percpu_enqueue_shift field is used to map from the running CPU
number to the index of the corresponding callback list. This mapping
can change at runtime in response to varying callback load, resulting
in varying levels of contention on the callback-list locks.

Unfortunately, the initial value of this field is correct only if the
system happens to have a power-of-two number of CPUs, otherwise the
callbacks from the high-numbered CPUs can be placed into the callback list
indexed by 1 (rather than 0), and those index-1 callbacks will be ignored.
This can result in soft lockups and hangs.

This commit therefore corrects this mapping, adding one to this shift
count as needed for systems having odd numbers of CPUs.

Fixes: 7a30871b6a27 ("rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue selection")
Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Reported-by: Martin Lau <kafai@fb.com>
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

ceph: set pool_ns in new inode layout for async creates

Dan reported that he was unable to write to files that had been
asynchronously created when the client's OSD caps are restricted to a
particular namespace.

The issue is that the layout for the new inode is only partially being
filled. Ensure that we populate the pool_ns_data and pool_ns_len in the
iinfo before calling ceph_fill_inode.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/54013
Fixes: 9a8d03ca2e2c ("ceph: attempt to do async create when possible")
Reported-by: Dan van der Ster <dan@vanderster.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: properly put ceph_string reference after async create attempt

The reference acquired by try_prep_async_create is currently leaked.
Ensure we put it.

Cc: stable@vger.kernel.org
Fixes: 9a8d03ca2e2c ("ceph: attempt to do async create when possible")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ceph: put the requests/sessions when it fails to alloc memory

When failing to allocate the sessions memory we should make sure
the req1 and req2 and the sessions get put. And also in case the
max_sessions decreased so when kreallocate the new memory some
sessions maybe missed being put.

And if the max_sessions is 0 krealloc will return ZERO_SIZE_PTR,
which will lead to a distinct access fault.

URL: https://tracker.ceph.com/issues/53819
Fixes: e1a4541ec0b9 ("ceph: flush the mdlog before waiting on unsafe reqs")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

selftests: kvm: move vm_xsave_req_perm call to amx_test

There is no need for tests other than amx_test to enable dynamic xsave
states. Remove the call to vm_xsave_req_perm from generic code,
and move it inside the test. While at it, allow customizing the bit
that is requested, so that future tests can use it differently.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Sync the states size with the XCR0/IA32_XSS at, any time

XCR0 is reset to 1 by RESET but not INIT and IA32_XSS is zeroed by
both RESET and INIT. The kvm_set_msr_common()'s handling of MSR_IA32_XSS
also needs to update kvm_update_cpuid_runtime(). In the above cases, the
size in bytes of the XSAVE area containing all states enabled by XCR0 or
(XCRO | IA32_XSS) needs to be updated.

For simplicity and consistency, existing helpers are used to write values
and call kvm_update_cpuid_runtime(), and it's not exactly a fast path.

Fixes: a554d207dc46 ("KVM: X86: Processor States following Reset or INIT")
Cc: stable@vger.kernel.org
Signed-off-by: Like Xu <likexu@tencent.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220126172226.2298529-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS

Do a runtime CPUID update for a vCPU if MSR_IA32_XSS is written, as the
size in bytes of the XSAVE area is affected by the states enabled in XSS.

Fixes: 203000993de5 ("kvm: vmx: add MSR logic for XSAVES")
Cc: stable@vger.kernel.org
Signed-off-by: Like Xu <likexu@tencent.com>
[sean: split out as a separate patch, adjust Fixes tag]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220126172226.2298529-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Keep MSR_IA32_XSS unchanged for INIT

It has been corrected from SDM version 075 that MSR_IA32_XSS is reset to
zero on Power up and Reset but keeps unchanged on INIT.

Fixes: a554d207dc46 ("KVM: X86: Processor States following Reset or INIT")
Cc: stable@vger.kernel.org
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220126172226.2298529-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

s390/hypfs: include z/VM guests with access control group set

Currently if z/VM guest is allowed to retrieve hypervisor performance
data globally for all guests (privilege class B) the query is formed in a
way to include all guests but the group name is left empty. This leads to
that z/VM guests which have access control group set not being included
in the results (even local vm).

Change the query group identifier from empty to "any" to retrieve
information about all guests from any groups (or without a group set).

Cc: stable@vger.kernel.org
Fixes: 31cb4bd31a48 ("[S390] Hypervisor filesystem (s390_hypfs) for z/VM")
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>

KVM: x86: Free kvm_cpuid_entry2 array on post-KVM_RUN KVM_SET_CPUID{,2}

Free the "struct kvm_cpuid_entry2" array on successful post-KVM_RUN
KVM_SET_CPUID{,2} to fix a memory leak, the callers of kvm_set_cpuid()
free the array only on failure.

BUG: memory leak
unreferenced object 0xffff88810963a800 (size 2048):
  comm "syz-executor025", pid 3610, jiffies 4294944928 (age 8.080s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 0d 00 00 00  ................
    47 65 6e 75 6e 74 65 6c 69 6e 65 49 00 00 00 00  GenuntelineI....
  backtrace:
    [<ffffffff814948ee>] kmalloc_node include/linux/slab.h:604 [inline]
    [<ffffffff814948ee>] kvmalloc_node+0x3e/0x100 mm/util.c:580
    [<ffffffff814950f2>] kvmalloc include/linux/slab.h:732 [inline]
    [<ffffffff814950f2>] vmemdup_user+0x22/0x100 mm/util.c:199
    [<ffffffff8109f5ff>] kvm_vcpu_ioctl_set_cpuid2+0x8f/0xf0 arch/x86/kvm/cpuid.c:423
    [<ffffffff810711b9>] kvm_arch_vcpu_ioctl+0xb99/0x1e60 arch/x86/kvm/x86.c:5251
    [<ffffffff8103e92d>] kvm_vcpu_ioctl+0x4ad/0x950 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4066
    [<ffffffff815afacc>] vfs_ioctl fs/ioctl.c:51 [inline]
    [<ffffffff815afacc>] __do_sys_ioctl fs/ioctl.c:874 [inline]
    [<ffffffff815afacc>] __se_sys_ioctl fs/ioctl.c:860 [inline]
    [<ffffffff815afacc>] __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:860
    [<ffffffff844a3335>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<ffffffff844a3335>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: c6617c61e8fe ("KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN")
Cc: stable@vger.kernel.org
Reported-by: syzbot+be576ad7655690586eec@syzkaller.appspotmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220125210445.2053429-1-seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02

WARN if KVM attempts to allocate a shadow VMCS for vmcs02. KVM emulates
VMCS shadowing but doesn't virtualize it, i.e. KVM should never allocate
a "real" shadow VMCS for L2.

The previous code WARNed but continued anyway with the allocation,
presumably in an attempt to avoid NULL pointer dereference.
However, alloc_vmcs (and hence alloc_shadow_vmcs) can fail, and
indeed the sole caller does:

if (enable_shadow_vmcs && !alloc_shadow_vmcs(vcpu))
goto out_shadow_vmcs;

which makes it not a useful attempt.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220125220527.2093146-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: selftests: Don't skip L2's VMCALL in SMM test for SVM guest

Don't skip the vmcall() in l2_guest_code() prior to re-entering L2, doing
so will result in L2 running to completion, popping '0' off the stack for
RET, jumping to address '0', and ultimately dying with a triple fault
shutdown.

It's not at all obvious why the test re-enters L2 and re-executes VMCALL,
but presumably it serves a purpose. The VMX path doesn't skip vmcall(),
and the test can't possibly have passed on SVM, so just do what VMX does.

Fixes: d951b2210c1a ("KVM: selftests: smm_test: Test SMM enter from L2")
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220125221725.2101126-1-seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Check .flags in kvm_cpuid_check_equal() too

kvm_cpuid_check_equal() checks for the (full) equality of the supplied
CPUID data so .flags need to be checked too.

Reported-by: Sean Christopherson <seanjc@google.com>
Fixes: c6617c61e8fe ("KVM: x86: Partially allow KVM_SET_CPUID{,2} after KVM_RUN")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220126131804.2839410-1-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Forcibly leave nested virt when SMM state is toggled

Forcibly leave nested virtualization operation if userspace toggles SMM
state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS.  If userspace
forces the vCPU out of SMM while it's post-VMXON and then injects an SMI,
vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both
vmxon=false and smm.vmxon=false, but all other nVMX state allocated.

Don't attempt to gracefully handle the transition as (a) most transitions
are nonsencial, e.g. forcing SMM while L2 is running, (b) there isn't
sufficient information to handle all transitions, e.g. SVM wants access
to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede
KVM_SET_NESTED_STATE during state restore as the latter disallows putting
the vCPU into L2 if SMM is active, and disallows tagging the vCPU as
being post-VMXON in SMM if SMM is not active.

Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX
due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond
just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU
in an architecturally impossible state.

  WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
  WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
  Modules linked in:
  CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
  RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
  Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00
  Call Trace:
   <TASK>
   kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123
   kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline]
   kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460
   kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline]
   kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676
   kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline]
   kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250
   kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273
   __fput+0x286/0x9f0 fs/file_table.c:311
   task_work_run+0xdd/0x1a0 kernel/task_work.c:164
   exit_task_work include/linux/task_work.h:32 [inline]
   do_exit+0xb29/0x2a30 kernel/exit.c:806
   do_group_exit+0xd2/0x2f0 kernel/exit.c:935
   get_signal+0x4b0/0x28c0 kernel/signal.c:2862
   arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868
   handle_signal_work kernel/entry/common.c:148 [inline]
   exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
   exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207
   __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
   syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
   do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   </TASK>

Cc: stable@vger.kernel.org
Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220125220358.2091737-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: drop unnecessary code in svm_hv_vmcb_dirty_nested_enlightenments()

Commit 3fa5e8fd0a0e4 ("KVM: SVM: delay svm_vcpu_init_msrpm after
svm->vmcb is initialized") re-arranged svm_vcpu_init_msrpm() call in
svm_create_vcpu(), thus making the comment about vmcb being NULL
obsolete. Drop it.

While on it, drop superfluous vmcb_is_clean() check: vmcb_mark_dirty()
is a bit flip, an extra check is unlikely to bring any performance gain.
Drop now-unneeded vmcb_is_clean() helper as well.

Fixes: 3fa5e8fd0a0e4 ("KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initialized")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211220152139.418372-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: hyper-v: Enable Enlightened MSR-Bitmap support for real

Commit c4327f15dfc7 ("KVM: SVM: hyper-v: Enlightened MSR-Bitmap support")
introduced enlightened MSR-Bitmap support for KVM-on-Hyper-V but it didn't
actually enable the support. Similar to enlightened NPT TLB flush and
direct TLB flush features, the guest (KVM) has to tell L0 (Hyper-V) that
it's using the feature by setting the appropriate feature fit in VMCB
control area (sw reserved fields).

Fixes: c4327f15dfc7 ("KVM: SVM: hyper-v: Enlightened MSR-Bitmap support")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211220152139.418372-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: Don't kill SEV guest if SMAP erratum triggers in usermode

Inject a #GP instead of synthesizing triple fault to try to avoid killing
the guest if emulation of an SEV guest fails due to encountering the SMAP
erratum. The injected #GP may still be fatal to the guest, e.g. if the
userspace process is providing critical functionality, but KVM should
make every attempt to keep the guest alive.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <20220120010719.711476-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: Don't apply SEV+SMAP workaround on code fetch or PT access

Resume the guest instead of synthesizing a triple fault shutdown if the
instruction bytes buffer is empty due to the #NPF being on the code fetch
itself or on a page table access. The SMAP errata applies if and only if
the code fetch was successful and ucode's subsequent data read from the
code page encountered a SMAP violation. In practice, the guest is likely
hosed either way, but crashing the guest on a code fetch to emulated MMIO
is technically wrong according to the behavior described in the APM.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <20220120010719.711476-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: Inject #UD on attempted emulation for SEV guest w/o insn buffer

Inject #UD if KVM attempts emulation for an SEV guests without an insn
buffer and instruction decoding is required. The previous behavior of
allowing emulation if there is no insn buffer is undesirable as doing so
means KVM is reading guest private memory and thus decoding cyphertext,
i.e. is emulating garbage. The check was previously necessary as the
emulation type was not provided, i.e. SVM needed to allow emulation to
handle completion of emulation after exiting to userspace to handle I/O.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <20220120010719.711476-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: WARN if KVM attempts emulation on #UD or #GP for SEV guests

WARN if KVM attempts to emulate in response to #UD or #GP for SEV guests,
i.e. if KVM intercepts #UD or #GP, as emulation on any fault except #NPF
is impossible since KVM cannot read guest private memory to get the code
stream, and the CPU's DecodeAssists feature only provides the instruction
bytes on #NPF.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <20220120010719.711476-7-seanjc@google.com>
[Warn on EMULTYPE_TRAP_UD_FORCED according to Liam Merwick's review. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Pass emulation type to can_emulate_instruction()

Pass the emulation type to kvm_x86_ops.can_emulate_insutrction() so that
a future commit can harden KVM's SEV support to WARN on emulation
scenarios that should never happen.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <20220120010719.711476-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: Explicitly require DECODEASSISTS to enable SEV support

Add a sanity check on DECODEASSIST being support if SEV is supported, as
KVM cannot read guest private memory and thus relies on the CPU to
provide the instruction byte stream on #NPF for emulation. The intent of
the check is to document the dependency, it should never fail in practice
as producing hardware that supports SEV but not DECODEASSISTS would be
non-sensical.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <20220120010719.711476-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: Don't intercept #GP for SEV guests

Never intercept #GP for SEV guests as reading SEV guest private memory
will return cyphertext, i.e. emulating on #GP can't work as intended.

Cc: stable@vger.kernel.org
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <20220120010719.711476-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Revert "KVM: SVM: avoid infinite loop on NPF from bad address"

Revert a completely broken check on an "invalid" RIP in SVM's workaround
for the DecodeAssists SMAP errata. kvm_vcpu_gfn_to_memslot() obviously
expects a gfn, i.e. operates in the guest physical address space, whereas
RIP is a virtual (not even linear) address. The "fix" worked for the
problematic KVM selftest because the test identity mapped RIP.

Fully revert the hack instead of trying to translate RIP to a GPA, as the
non-SEV case is now handled earlier, and KVM cannot access guest page
tables to translate RIP.

This reverts commit e72436bc3a5206f95bb384e741154166ddb3202e.

Fixes: e72436bc3a52 ("KVM: SVM: avoid infinite loop on NPF from bad address")
Reported-by: Liam Merwick <liam.merwick@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <20220120010719.711476-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: SVM: Never reject emulation due to SMAP errata for !SEV guests

Always signal that emulation is possible for !SEV guests regardless of
whether or not the CPU provided a valid instruction byte stream. KVM can
read all guest state (memory and registers) for !SEV guests, i.e. can
fetch the code stream from memory even if the CPU failed to do so because
of the SMAP errata.

Fixes: 05d5a4863525 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)")
Cc: stable@vger.kernel.org
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <20220120010719.711476-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: nSVM: skip eax alignment check for non-SVM instructions

The bug occurs on #GP triggered by VMware backdoor when eax value is
unaligned. eax alignment check should not be applied to non-SVM
instructions because it leads to incorrect omission of the instructions
emulation.
Apply the alignment check only to SVM instructions to fix.

Fixes: d1cba6c92237 ("KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround")
Signed-off-by: Denis Valeev <lemniscattaden@gmail.com>
Message-Id: <Yexlhaoe1Fscm59u@q>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86/cpuid: Exclude unpermitted xfeatures sizes at KVM_GET_SUPPORTED_CPUID

With the help of xstate_get_guest_group_perm(), KVM can exclude unpermitted
xfeatures in cpuid.0xd.0.eax, in which case the corresponding xfeatures
sizes should also be matched to the permitted xfeatures.

To fix this inconsistency, the permitted_xcr0 and permitted_xss are defined
consistently, which implies 'supported' plus certain permissions for this
task, and it also fixes cpuid.0xd.1.ebx and later leaf-by-leaf queries.

Fixes: 445ecdf79be0 ("kvm: x86: Exclude unpermitted xfeatures at KVM_GET_SUPPORTED_CPUID")
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220125115223.33707-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: LAPIC: Also cancel preemption timer during SET_LAPIC

The below warning is splatting during guest reboot.

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 1931 at arch/x86/kvm/x86.c:10322 kvm_arch_vcpu_ioctl_run+0x874/0x880 [kvm]
  CPU: 0 PID: 1931 Comm: qemu-system-x86 Tainted: G          I       5.17.0-rc1+ #5
  RIP: 0010:kvm_arch_vcpu_ioctl_run+0x874/0x880 [kvm]
  Call Trace:
   <TASK>
   kvm_vcpu_ioctl+0x279/0x710 [kvm]
   __x64_sys_ioctl+0x83/0xb0
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7fd39797350b

This can be triggered by not exposing tsc-deadline mode and doing a reboot in
the guest. The lapic_shutdown() function which is called in sys_reboot path
will not disarm the flying timer, it just masks LVTT. lapic_shutdown() clears
APIC state w/ LVT_MASKED and timer-mode bit is 0, this can trigger timer-mode
switch between tsc-deadline and oneshot/periodic, which can result in preemption
timer be cancelled in apic_update_lvtt(). However, We can't depend on this when
not exposing tsc-deadline mode and oneshot/periodic modes emulated by preemption
timer. Qemu will synchronise states around reset, let's cancel preemption timer
under KVM_SET_LAPIC.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1643102220-35667-1-git-send-email-wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: VMX: Remove vmcs_config.order

The maximum size of a VMCS (or VMXON region) is 4096. By definition,
these are order 0 allocations.

Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220125004359.147600-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge branch 'lan966x-fixes'

Horatiu Vultur says:

====================
net: lan966x: Fixes for sleep in atomic context

This patch series contains 2 fixes for lan966x that is sleeping in atomic
context. The first patch fixes the injection of the frames while the second
one fixes the updating of the MAC table.

v1->v2:
- correct the fix tag in the second patch, it was using the wrong sha.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: lan966x: Fix sleep in atomic context when updating MAC table

The function lan966x_mac_wait_for_completion is used to poll the status
of the MAC table using the function readx_poll_timeout. The problem with
this function is that is called also from atomic context. Therefore
update the function to use readx_poll_timeout_atomic.

Fixes: e18aba8941b40b ("net: lan966x: add mactable support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: lan966x: Fix sleep in atomic context when injecting frames

On lan966x, when injecting a frame it was polling the register
QS_INJ_STATUS to see if it can continue with the injection of the frame.
The problem was that it was using readx_poll_timeout which could sleep
in atomic context.
This patch fixes this issue by using readx_poll_timeout_atomic.

Fixes: d28d6d2e37d10d ("net: lan966x: add port module support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dev_addr-const-fixes'

Jakub Kicinski says:

====================
ethernet: fix some esoteric drivers after netdev->dev_addr constification

Looking at recent fixes for drivers which don't get included with
allmodconfig builds I thought it's worth grepping for more instances of:

dev->dev_addr\[.*\] =

This set contains the fixes.

v2: add last 3 patches which fix drivers for the RiscPC ARM platform.
Thanks to Arnd Bergmann for explaining how to build test that.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: seeq/ether3: don't write directly to netdev->dev_addr

netdev->dev_addr is const now.

Compile tested rpc_defconfig w/ GCC 8.5.

Fixes: adeef3e32146 ("net: constify netdev->dev_addr")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: 8390/etherh: don't write directly to netdev->dev_addr

netdev->dev_addr is const now.

Compile tested rpc_defconfig w/ GCC 8.5.

Fixes: adeef3e32146 ("net: constify netdev->dev_addr")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: i825xx: don't write directly to netdev->dev_addr

netdev->dev_addr is const now.

Compile tested rpc_defconfig w/ GCC 8.5.

Fixes: adeef3e32146 ("net: constify netdev->dev_addr")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: broadcom/sb1250-mac: don't write directly to netdev->dev_addr

netdev->dev_addr is const now.

Compile tested bigsur_defconfig and sb1250_swarm_defconfig.

Fixes: adeef3e32146 ("net: constify netdev->dev_addr")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: tundra: don't write directly to netdev->dev_addr

netdev->dev_addr is const now.

Maintain the questionable offsetting in ndo_set_mac_address.

Compile tested holly_defconfig and mpc7448_hpc2_defconfig.

Fixes: adeef3e32146 ("net: constify netdev->dev_addr")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: 3com/typhoon: don't write directly to netdev->dev_addr

This driver casts off the const and writes directly to netdev->dev_addr.
This will result in a MAC address tree corruption and a warning.

Compile tested ppc6xx_defconfig.

Fixes: adeef3e32146 ("net: constify netdev->dev_addr")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

drm/privacy-screen: honor acpi=off in detect_thinkpad_privacy_screen

when acpi=off is provided in bootarg, kernel crash with

[    1.252739] BUG: kernel NULL pointer dereference, address: 0000000000000018
[    1.258308] Call Trace:
[    1.258490]  ? acpi_walk_namespace+0x147/0x147
[    1.258770]  acpi_get_devices+0xe4/0x137
[    1.258921]  ? drm_core_init+0xc0/0xc0 [drm]
[    1.259108]  detect_thinkpad_privacy_screen+0x5e/0xa8 [drm]
[    1.259337]  drm_privacy_screen_lookup_init+0xe/0xe85 [drm]

The reason is that acpi_walk_namespace expects acpi related stuff
initialized but in fact it wouldn't when acpi is set to off. In this case
we should honor acpi=off in detect_thinkpad_privacy_screen().

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220123091004.763775-1-ztong0001@gmail.com

Revert "drm/ast: Support 1600x900 with 108MHz PCLK"

This reverts commit 9bb7b689274b67ecb3641e399e76f84adc627df1.

This caused a regression reported to Red Hat.

Fixes: 9bb7b689274b ("drm/ast: Support 1600x900 with 108MHz PCLK")
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20220120040527.552068-1-airlied@gmail.com

Merge drm/drm-fixes into drm-misc-fixes

Backmerging drm/drm-fixes into drm-misc-fixes for v5.17-rc1.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>

sch_htb: Fail on unsupported parameters when offload is requested

The current implementation of HTB offload doesn't support some
parameters. Instead of ignoring them, actively return the EINVAL error
when they are set to non-defaults.

As this patch goes to stable, the driver API is not changed here. If
future drivers support more offload parameters, the checks can be moved
to the driver side.

Note that the buffer and cbuffer parameters are also not supported, but
the tc userspace tool assigns some default values derived from rate and
ceil, and identifying these defaults in sch_htb would be unreliable, so
they are still ignored.

Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20220125100654.424570-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

drm/amdgpu/display: Remove t_srx_delay_us.

Unused. Convert the divisions into asserts on the divisor, to
debug why it is zero. The divide by zero is suspected of causing
kernel panics.

While I have no idea where the zero is coming from I think this
patch is a positive either way.

Cc: stable@vger.kernel.org
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Wrap dcn301_calculate_wm_and_dlg for FPU.

Mirrors the logic for dcn30. Cue lots of WARNs and some
kernel panics without this fix.

Cc: stable@vger.kernel.org
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix FP start/end for dcn30_internal_validate_bw.

It calls populate_dml_pipes which uses doubles to initialize the
scale_ratio_depth params. Mirrors the dcn20 logic.

Cc: stable@vger.kernel.org
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display/dc/calcs/dce_calcs: Fix a memleak in calculate_bandwidth()

In calculate_bandwidth(), the tag free_sclk and free_yclk are reversed,
which could lead to a memory leak of yclk.

Fix this bug by changing the location of free_sclk and free_yclk.

This bug was found by a static analyzer.

Builds with 'make allyesconfig' show no new warnings,
and our static analyzer no longer warns about this code.

Fixes: 2be8989d0fc2 ("drm/amd/display/dc/calcs/dce_calcs: Move some large variables from the stack to the heap")
Signed-off-by: Zhou Qingyang <zhou1615@umn.edu>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/display: use msleep rather than udelay for long delays

Some architectures (e.g., ARM) throw an compilation error if the
udelay is too long. In general udelays of longer than 2000us are
not recommended on any architecture. Switch to msleep in these
cases.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>