Linus Torvalds [Fri, 8 Sep 2023 20:07:50 +0000 (13:07 -0700)]
Merge tag 'sound-fix-6.6-rc1' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of fixes for 6.6-rc1. All small and easy ones.
- The corrections of the previous PCM iov_iter transitions
- Regression fixes in MIDI 2.0 / USB changes
- Various ASoC codec fixes for Cirrus, Realtek, WCD
- ASoC AMD quirks and ASoC Intel AVS driver workaround"
* tag 'sound-fix-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (21 commits)
ALSA: hda/realtek - ALC287 I2S speaker platform support
ASoC: amd: yc: Fix a non-functional mic on Lenovo 82TL
ASoC: Intel: avs: Provide support for fallback topology
ALSA: seq: Fix snd_seq_expand_var_event() call to user-space
ALSA: usb-audio: Fix potential memory leaks at error path for UMP open
ALSA: hda/cirrus: Fix broken audio on hardware with two CS42L42 codecs.
ASoC: rt5645: NULL pointer access when removing jack
ASoC: amd: yc: Add DMI entries to support Victus by HP Gaming Laptop 15-fb0xxx (8A3E)
MAINTAINERS: Update the MAINTAINERS enties for TEXAS INSTRUMENTS ASoC DRIVERS
ALSA: sb: Fix wrong argument in commented code
ALSA: pcm: Fix error checks of default read/write copy ops
ASoC: Name iov_iter argument as iterator instead of buffer
ASoC: dmaengine: Drop unused iov_iter for process callback
ALSA: hda/tas2781: Use standard clamp() macro
ASoC: cs35l56: Waiting for firmware to boot must be tolerant of I/O errors
ASoC: dt-bindings: fsl_easrc: Add support for imx8mp-easrc
ASoC: cs42l43: Fix missing error code in cs42l43_codec_probe()
ASoC: cs35l45: Rename DACPCM1 Source control
ASoC: cs35l45: Fix "Dead assigment" warning
ASoC: cs35l45: Add support for Chip ID 0x35A460
...
Linus Torvalds [Fri, 8 Sep 2023 19:48:37 +0000 (12:48 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The main one is a fix for a broken strscpy() conversion that landed in
the merge window and broke early parsing of the kernel command line.
- Fix an incorrect mask in the CXL PMU driver
- Fix a regression in early parsing of the kernel command line
- Fix an IP checksum OoB access reported by syzbot"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: csum: Fix OoB access in IP checksum code for negative lengths
arm64/sysreg: Fix broken strncpy() -> strscpy() conversion
perf: CXL: fix mismatched number of counters mask
Linus Torvalds [Fri, 8 Sep 2023 19:16:52 +0000 (12:16 -0700)]
Merge tag 'loongarch-6.6' of git://git./linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Allow usage of LSX/LASX in the kernel, and use them for
SIMD-optimized RAID5/RAID6 routines
- Add Loongson Binary Translation (LBT) extension support
- Add basic KGDB & KDB support
- Add building with kcov coverage
- Add KFENCE (Kernel Electric-Fence) support
- Add KASAN (Kernel Address Sanitizer) support
- Some bug fixes and other small changes
- Update the default config file
* tag 'loongarch-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (25 commits)
LoongArch: Update Loongson-3 default config file
LoongArch: Add KASAN (Kernel Address Sanitizer) support
LoongArch: Simplify the processing of jumping new kernel for KASLR
kasan: Add (pmd|pud)_init for LoongArch zero_(pud|p4d)_populate process
kasan: Add __HAVE_ARCH_SHADOW_MAP to support arch specific mapping
LoongArch: Add KFENCE (Kernel Electric-Fence) support
LoongArch: Get partial stack information when providing regs parameter
LoongArch: mm: Add page table mapped mode support for virt_to_page()
kfence: Defer the assignment of the local variable addr
LoongArch: Allow building with kcov coverage
LoongArch: Provide kaslr_offset() to get kernel offset
LoongArch: Add basic KGDB & KDB support
LoongArch: Add Loongson Binary Translation (LBT) extension support
raid6: Add LoongArch SIMD recovery implementation
raid6: Add LoongArch SIMD syndrome calculation
LoongArch: Add SIMD-optimized XOR routines
LoongArch: Allow usage of LSX/LASX in the kernel
LoongArch: Define symbol 'fault' as a local label in fpu.S
LoongArch: Adjust {copy, clear}_user exception handler behavior
LoongArch: Use static defined zero page rather than allocated
...
Linus Torvalds [Fri, 8 Sep 2023 19:13:01 +0000 (12:13 -0700)]
Merge tag 'printk-for-6.6-fixup' of git://git./linux/kernel/git/printk/linux
Pull printk fix from Petr Mladek:
- Revert exporting symbols needed for dumping the raw printk buffer in
panic().
I pushed the export prematurely before the user was ready for merging
into the mainline.
* tag 'printk-for-6.6-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
Revert "printk: export symbols for debug modules"
Linus Torvalds [Fri, 8 Sep 2023 19:06:51 +0000 (12:06 -0700)]
Merge tag 'landlock-6.6-rc1' of git://git./linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"One test fix and a __counted_by annotation"
* tag 'landlock-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/landlock: Fix a resource leak
landlock: Annotate struct landlock_rule with __counted_by
Linus Torvalds [Fri, 8 Sep 2023 02:47:04 +0000 (19:47 -0700)]
Merge tag 'drm-next-2023-09-08' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Regular rounds of rc1 fixes, a large bunch for amdgpu since it's three
weeks in one go, one i915, one nouveau and one ivpu.
I think there might be a few more fixes in misc that I haven't pulled
in yet, but we should get them all for rc2.
amdgpu:
- Display replay fixes
- Fixes for headless boards
- Fix documentation breakage
- RAS fixes
- Handle newer IP discovery tables
- SMU 13.0.6 fixes
- SR-IOV fixes
- Display vstartup fixes
- NBIO 7.9 fixes
- Display scaling mode fixes
- Debugfs power reporting fix
- GC 9.4.3 fixes
- Dirty framebuffer fixes for fbcon
- eDP fixes
- DCN 3.1.5 fix
- Display ODM fixes
- GPU core dump fix
- Re-enable zops property now that IGT test is fixed
- Fix possible UAF in CS code
- Cursor degamma fix
amdkfd:
- HMM fixes
- Interrupt masking fix
- GFX11 MQD fixes
i915:
- Mark requests for GuC virtual engines to avoid use-after-free
nouveau:
- Fix fence state in nouveau_fence_emit()
ivpu:
- replace strncpy"
* tag 'drm-next-2023-09-08' of git://anongit.freedesktop.org/drm/drm: (51 commits)
drm/amdgpu: Restrict bootloader wait to SMUv13.0.6
drm/amd/display: prevent potential division by zero errors
drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma
drm/amd/display: limit the v_startup workaround to ASICs older than DCN3.1
Revert "drm/amd/display: Remove v_startup workaround for dcn3+"
drm/amdgpu: fix amdgpu_cs_p1_user_fence
Revert "Revert "drm/amd/display: Implement zpos property""
drm/amdkfd: Add missing gfx11 MQD manager callbacks
drm/amdgpu: Free ras cmd input buffer properly
drm/amdgpu: Hide xcp partition sysfs under SRIOV
drm/amdgpu: use read-modify-write mode for gfx v9_4_3 SQ setting
drm/amdkfd: use mask to get v9 interrupt sq data bits correctly
drm/amdgpu: Allocate coredump memory in a nonblocking way
drm/amdgpu: Support query ecc cap for aqua_vanjaram
drm/amdgpu: Add umc_info v4_0 structure
drm/amd/display: always switch off ODM before committing more streams
drm/amd/display: Remove wait while locked
drm/amd/display: update blank state on ODM changes
drm/amd/display: Add smu write msg id fail retry process
drm/amdgpu: Add SMU v13.0.6 default reset methods
...
Linus Torvalds [Fri, 8 Sep 2023 01:33:07 +0000 (18:33 -0700)]
Merge tag 'net-6.6-rc1' of git://git./linux/kernel/git/netdev/net
Pull networking updates from Jakub Kicinski:
"Including fixes from netfilter and bpf.
Current release - regressions:
- eth: stmmac: fix failure to probe without MAC interface specified
Current release - new code bugs:
- docs: netlink: fix missing classic_netlink doc reference
Previous releases - regressions:
- deal with integer overflows in kmalloc_reserve()
- use sk_forward_alloc_get() in sk_get_meminfo()
- bpf_sk_storage: fix the missing uncharge in sk_omem_alloc
- fib: avoid warn splat in flow dissector after packet mangling
- skb_segment: call zero copy functions before using skbuff frags
- eth: sfc: check for zero length in EF10 RX prefix
Previous releases - always broken:
- af_unix: fix msg_controllen test in scm_pidfd_recv() for
MSG_CMSG_COMPAT
- xsk: fix xsk_build_skb() dereferencing possible ERR_PTR()
- netfilter:
- nft_exthdr: fix non-linear header modification
- xt_u32, xt_sctp: validate user space input
- nftables: exthdr: fix 4-byte stack OOB write
- nfnetlink_osf: avoid OOB read
- one more fix for the garbage collection work from last release
- igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU
- bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t
- handshake: fix null-deref in handshake_nl_done_doit()
- ip: ignore dst hint for multipath routes to ensure packets are
hashed across the nexthops
- phy: micrel:
- correct bit assignments for cable test errata
- disable EEE according to the KSZ9477 errata
Misc:
- docs/bpf: document compile-once-run-everywhere (CO-RE) relocations
- Revert "net: macsec: preserve ingress frame ordering", it appears
to have been developed against an older kernel, problem doesn't
exist upstream"
* tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs()
Revert "net: team: do not use dynamic lockdep key"
net: hns3: remove GSO partial feature bit
net: hns3: fix the port information display when sfp is absent
net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue
net: hns3: fix debugfs concurrency issue between kfree buffer and read
net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read()
net: hns3: Support query tx timeout threshold by debugfs
net: hns3: fix tx timeout issue
net: phy: Provide Module 4 KSZ9477 errata (DS80000754C)
netfilter: nf_tables: Unbreak audit log reset
netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID
netfilter: nfnetlink_osf: avoid OOB read
netfilter: nftables: exthdr: fix 4-byte stack OOB write
selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc
bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
bpf: bpf_sk_storage: Fix invalid wait context lockdep report
s390/bpf: Pass through tail call counter in trampolines
...
Linus Torvalds [Fri, 8 Sep 2023 01:16:37 +0000 (18:16 -0700)]
Merge tag 'devicetree-fixes-for-6.6-1' of git://git./linux/kernel/git/robh/linux
Pull more devicetree updates from Rob Herring:
"A couple of conversions which didn't get picked up by the subsystems
and one fix:
- Convert st,stih407-irq-syscfg and Omnivision OV7251 bindings to DT
schema
- Merge Omnivision OV5695 into OV5693 binding
- Fix of_overlay_fdt_apply prototype when !CONFIG_OF_OVERLAY"
* tag 'devicetree-fixes-for-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: irqchip: convert st,stih407-irq-syscfg to DT schema
media: dt-bindings: Convert Omnivision OV7251 to DT schema
media: dt-bindings: Merge OV5695 into OV5693 binding
of: overlay: Fix of_overlay_fdt_apply prototype when !CONFIG_OF_OVERLAY
Linus Torvalds [Fri, 8 Sep 2023 01:05:58 +0000 (18:05 -0700)]
Merge tag 'pwm/for-6.6-rc1' of git://git./linux/kernel/git/thierry.reding/linux-pwm
Pull pwm updates from Thierry Reding:
"Various cleanups and fixes across the board"
* tag 'pwm/for-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (31 commits)
pwm: lpc32xx: Remove handling of PWM channels
pwm: atmel: Simplify using devm functions
dt-bindings: pwm: brcm,kona-pwm: convert to YAML
pwm: stmpe: Handle errors when disabling the signal
pwm: stm32: Simplify using devm_pwmchip_add()
pwm: stm32: Don't modify HW state in .remove() callback
pwm: Fix order of freeing resources in pwmchip_remove()
pwm: ntxec: Use device_set_of_node_from_dev()
pwm: ntxec: Drop a write-only variable from driver data
pwm: pxa: Don't reimplement of_device_get_match_data()
pwm: lpc18xx-sct: Simplify using devm_clk_get_enabled()
pwm: atmel-tcb: Don't track polarity in driver data
pwm: atmel-tcb: Unroll atmel_tcb_pwm_set_polarity() into only caller
pwm: atmel-tcb: Put per-channel data into driver data
pwm: atmel-tcb: Fix resource freeing in error path and remove
pwm: atmel-tcb: Harmonize resource allocation order
pwm: Drop unused #include <linux/radix-tree.h>
pwm: rz-mtu3: Fix build warning 'num_channel_ios' not described
pwm: Remove outdated documentation for pwmchip_remove()
pwm: atmel: Enable clk when pwm already enabled in bootloader
...
Dave Airlie [Fri, 8 Sep 2023 00:43:59 +0000 (10:43 +1000)]
Merge tag 'amd-drm-fixes-6.6-2023-09-06' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-fixes-6.6-2023-09-06:
amdgpu:
- Display replay fixes
- Fixes for headless boards
- Fix documentation breakage
- RAS fixes
- Handle newer IP discovery tables
- SMU 13.0.6 fixes
- SR-IOV fixes
- Display vstartup fixes
- NBIO 7.9 fixes
- Display scaling mode fixes
- Debugfs power reporting fix
- GC 9.4.3 fixes
- Dirty framebuffer fixes for fbcon
- eDP fixes
- DCN 3.1.5 fix
- Display ODM fixes
- GPU core dump fix
- Re-enable zops property now that IGT test is fixed
- Fix possible UAF in CS code
- Cursor degamma fix
amdkfd:
- HMM fixes
- Interrupt masking fix
- GFX11 MQD fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230907033049.7811-1-alexander.deucher@amd.com
Dave Airlie [Fri, 8 Sep 2023 00:35:17 +0000 (10:35 +1000)]
Merge tag 'drm-intel-next-fixes-2023-08-31' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
- Mark requests for GuC virtual engines to avoid use-after-free (Andrzej).
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZPEGEeP2EwCtx9hM@intel.com
Linus Torvalds [Thu, 7 Sep 2023 23:07:35 +0000 (16:07 -0700)]
Merge tag 'rtc-6.6' of git://git./linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"Subsystem:
- Add a way for drivers to tell the core the supported alarm range is
smaller than the date range. This is not used yet but will be
useful for the alarmtimers in the next release.
- fix Wvoid-pointer-to-enum-cast warnings
- remove redundant of_match_ptr()
- stop warning for invalid alarms when the alarm is disabled
Drivers:
- isl12022: allow setting the trip level for battery level detection
- pcf2127: add support for PCF2131 and multiple timestamps
- stm32: time precision improvement, many fixes
- twl: NVRAM support"
* tag 'rtc-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (73 commits)
dt-bindings: rtc: ds3231: Remove text binding
rtc: wm8350: remove unnecessary messages
rtc: twl: remove unnecessary messages
rtc: sun6i: remove unnecessary message
rtc: stop warning for invalid alarms when the alarm is disabled
rtc: twl: add NVRAM support
rtc: pcf85363: Allow to wake up system without IRQ
rtc: m48t86: add DT support for m48t86
dt-bindings: rtc: Add ST M48T86
rtc: pcf2127: remove useless check
rtc: rzn1: Report maximum alarm limit to rtc core
rtc: ds1305: Report maximum alarm limit to rtc core
rtc: tps6586x: Report maximum alarm limit to rtc core
rtc: cmos: Report supported alarm limit to rtc infrastructure
rtc: cros-ec: Detect and report supported alarm window size
rtc: Add support for limited alarm timer offsets
rtc: isl1208: Fix incorrect logic in isl1208_set_xtoscb()
MAINTAINERS: remove obsolete pattern in RTC SUBSYSTEM section
rtc: tps65910: Remove redundant dev_warn() and do not check for 0 return after calling platform_get_irq()
rtc: omap: Do not check for 0 return after calling platform_get_irq()
...
Linus Torvalds [Thu, 7 Sep 2023 22:59:57 +0000 (15:59 -0700)]
Merge tag 'i3c/for-6.6' of git://git./linux/kernel/git/i3c/linux
Pull i3c updates from Alexandre Belloni:
"Core:
- Fix SETDASA when static and dynamic adress are equal
- Fix cmd_v1 DAA exit criteria
Drivers:
- svc: allow probing without any device"
* tag 'i3c/for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c: master: svc: fix probe failure when no i3c device exist
i3c: master: Fix SETDASA process
dt-bindings: i3c: Fix description for assigned-address
i3c: master: svc: Describe member 'saved_regs'
i3c: master: svc: Do not check for 0 return after calling platform_get_irq()
i3c/master: cmd_v1: Fix the exit criteria for the daa procedure
i3c: Explicitly include correct DT includes
Linus Torvalds [Thu, 7 Sep 2023 22:51:07 +0000 (15:51 -0700)]
Merge tag 'regulator-fix-v6.6-merge-window' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of fixes that came in during the merge window, both driver
specific - one for a bug that came up in testing, one for a bug due
to a misreading of the datasheet"
* tag 'regulator-fix-v6.6-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: tps6594-regulator: Fix random kernel crash
regulator: tps6287x: Fix n_voltages
Linus Torvalds [Thu, 7 Sep 2023 22:49:20 +0000 (15:49 -0700)]
Merge tag 'spi-fix-v6.6-merge-window' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A couple of fixes for the sun6i driver. The patch to reduce DMA RX to
single byte width all the time is *hopefully* excessively cautious but
it's unclear which SoCs are affected so the fix just covers everything
for safety"
* tag 'spi-fix-v6.6-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: sun6i: fix race between DMA RX transfer completion and RX FIFO drain
spi: sun6i: reduce DMA RX transfer width to single byte
Linus Torvalds [Thu, 7 Sep 2023 20:52:20 +0000 (13:52 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Clean up vCPU targets, always returning generic v8 as the preferred
target
- Trap forwarding infrastructure for nested virtualization (used for
traps that are taken from an L2 guest and are needed by the L1
hypervisor)
- FEAT_TLBIRANGE support to only invalidate specific ranges of
addresses when collapsing a table PTE to a block PTE. This avoids
that the guest refills the TLBs again for addresses that aren't
covered by the table PTE.
- Fix vPMU issues related to handling of PMUver.
- Don't unnecessary align non-stack allocations in the EL2 VA space
- Drop HCR_VIRT_EXCP_MASK, which was never used...
- Don't use smp_processor_id() in kvm_arch_vcpu_load(), but the cpu
parameter instead
- Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort()
- Remove prototypes without implementations
RISC-V:
- Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest
- Added ONE_REG interface for SATP mode
- Added ONE_REG interface to enable/disable multiple ISA extensions
- Improved error codes returned by ONE_REG interfaces
- Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V
- Added get-reg-list selftest for KVM RISC-V
s390:
- PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch)
Allows a PV guest to use crypto cards. Card access is governed by
the firmware and once a crypto queue is "bound" to a PV VM every
other entity (PV or not) looses access until it is not bound
anymore. Enablement is done via flags when creating the PV VM.
- Guest debug fixes (Ilya)
x86:
- Clean up KVM's handling of Intel architectural events
- Intel bugfixes
- Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use
debug registers and generate/handle #DBs
- Clean up LBR virtualization code
- Fix a bug where KVM fails to set the target pCPU during an IRTE
update
- Fix fatal bugs in SEV-ES intrahost migration
- Fix a bug where the recent (architecturally correct) change to
reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to
skip it)
- Retry APIC map recalculation if a vCPU is added/enabled
- Overhaul emergency reboot code to bring SVM up to par with VMX, tie
the "emergency disabling" behavior to KVM actually being loaded,
and move all of the logic within KVM
- Fix user triggerable WARNs in SVM where KVM incorrectly assumes the
TSC ratio MSR cannot diverge from the default when TSC scaling is
disabled up related code
- Add a framework to allow "caching" feature flags so that KVM can
check if the guest can use a feature without needing to search
guest CPUID
- Rip out the ancient MMU_DEBUG crud and replace the useful bits with
CONFIG_KVM_PROVE_MMU
- Fix KVM's handling of !visible guest roots to avoid premature
triple fault injection
- Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the
API surface that is needed by external users (currently only
KVMGT), and fix a variety of issues in the process
Generic:
- Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier
events to pass action specific data without needing to constantly
update the main handlers.
- Drop unused function declarations
Selftests:
- Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs
- Add support for printf() in guest code and covert all guest asserts
to use printf-based reporting
- Clean up the PMU event filter test and add new testcases
- Include x86 selftests in the KVM x86 MAINTAINERS entry"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (279 commits)
KVM: x86/mmu: Include mmu.h in spte.h
KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots
KVM: x86/mmu: Disallow guest from using !visible slots for page tables
KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page
KVM: x86/mmu: Harden new PGD against roots without shadow pages
KVM: x86/mmu: Add helper to convert root hpa to shadow page
drm/i915/gvt: Drop final dependencies on KVM internal details
KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
KVM: x86/mmu: Assert that correct locks are held for page write-tracking
KVM: x86/mmu: Rename page-track APIs to reflect the new reality
KVM: x86/mmu: Drop infrastructure for multiple page-track modes
KVM: x86/mmu: Use page-track notifiers iff there are external users
KVM: x86/mmu: Move KVM-only page-track declarations to internal header
KVM: x86: Remove the unused page-track hook track_flush_slot()
drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
KVM: x86: Add a new page-track hook to handle memslot deletion
drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
KVM: x86: Reject memslot MOVE operations if KVMGT is attached
...
Dave Airlie [Thu, 7 Sep 2023 20:36:29 +0000 (06:36 +1000)]
Merge tag 'drm-misc-next-fixes-2023-09-01' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
Short summary of fixes pull:
* ivpu: Replace strncpy
* nouveau: Fix fence state in nouveau_fence_emit()
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230901070123.GA6987@linux-uq9g
Vladimir Oltean [Wed, 6 Sep 2023 14:16:09 +0000 (17:16 +0300)]
net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs()
enetc_psi_create() returns an ERR_PTR() or a valid station interface
pointer, but checking for the non-NULL quality of the return code blurs
that difference away. So if enetc_psi_create() fails, we call
enetc_psi_destroy() when we shouldn't. This will likely result in
crashes, since enetc_psi_create() cleans up everything after itself when
it returns an ERR_PTR().
Fixes: f0168042a212 ("net: enetc: reimplement RFS/RSS memory clearing as PCI quirk")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/netdev/582183ef-e03b-402b-8e2d-6d9bb3c83bd9@moroto.mountain/
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230906141609.247579-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 7 Sep 2023 18:01:04 +0000 (11:01 -0700)]
Revert "net: team: do not use dynamic lockdep key"
This reverts commit
39285e124edbc752331e98ace37cc141a6a3747a.
Looks like the change has unintended consequences in exposing
objects before they are initialized. Let's drop this patch
and try again in net-next.
Reported-by: syzbot+44ae022028805f4600fc@syzkaller.appspotmail.com
Fixes: 39285e124edb ("net: team: do not use dynamic lockdep key")
Link: https://lore.kernel.org/all/20230907103124.6adb7256@kernel.org/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 7 Sep 2023 17:52:13 +0000 (10:52 -0700)]
Merge tag 's390-6.6-2' of git://git./linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:
- A couple of virtual vs physical address confusion fixes
- Rework locking in dcssblk driver to address a lockdep warning
- Remove support for "noexec" kernel command line option since there is
no use case where it would make sense
- Simplify kernel mapping setup and get rid of quite a bit of code
- Add architecture specific __set_memory_yy() functions which allow us
to modify kernel mappings. Unlike the set_memory_xx() variants they
take void pointer start and end parameters, which allows using them
without the usual casts, and also to use them on areas larger than
8TB.
Note that the set_memory_xx() family comes with an int num_pages
parameter which overflows with 8TB. This could be addressed by
changing the num_pages parameter to unsigned long, however requires
to change all architectures, since the module code expects an int
parameter (see module_set_memory()).
This was indeed an issue since for debug_pagealloc() we call
set_memory_4k() on the whole identity mapping. Therefore address this
for now with the __set_memory_yy() variant, and address common code
later
- Use dev_set_name() and also fix memory leak in zcrypt driver error
handling
- Remove unused lsi_mask from airq_struct
- Add warning for invalid kernel mapping requests
* tag 's390-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/vmem: do not silently ignore mapping limit
s390/zcrypt: utilize dev_set_name() ability to use a formatted string
s390/zcrypt: don't leak memory if dev_set_name() fails
s390/mm: fix MAX_DMA_ADDRESS physical vs virtual confusion
s390/airq: remove lsi_mask from airq_struct
s390/mm: use __set_memory() variants where useful
s390/set_memory: add __set_memory() variant
s390/set_memory: generate all set_memory() functions
s390/mm: improve description of mapping permissions of prefix pages
s390/amode31: change type of __samode31, __eamode31, etc
s390/mm: simplify kernel mapping setup
s390: remove "noexec" option
s390/vmem: fix virtual vs physical address confusion
s390/dcssblk: fix lockdep warning
s390/monreader: fix virtual vs physical address confusion
Linus Torvalds [Thu, 7 Sep 2023 17:35:14 +0000 (10:35 -0700)]
Merge tag 'mips_6.6' of git://git./linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:
"Just cleanups and fixes"
* tag 'mips_6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: TXx9: Do PCI error checks on own line
arch/mips/configs/*_defconfig cleanup
MIPS: VDSO: Conditionally export __vdso_gettimeofday()
Mips: loongson3_defconfig: Enable ast drm driver by default
mips: remove <asm/export.h>
mips: replace #include <asm/export.h> with #include <linux/export.h>
mips: remove unneeded #include <asm/export.h>
MIPS: Loongson64: Fix more __iomem attributes
MIPS: loongson32: Remove regs-rtc.h
MIPS: loongson32: Remove regs-clk.h
MIPS: More explicit DT include clean-ups
MIPS: Fixup explicit DT include clean-up
Revert MIPS: Loongson: Fix build error when make modules_install
MIPS: Only fiddle with CHECKFLAGS if `need-compiler'
MIPS: Fix CONFIG_CPU_DADDI_WORKAROUNDS `modules_install' regression
MIPS: Explicitly include correct DT includes
Linus Torvalds [Thu, 7 Sep 2023 17:30:17 +0000 (10:30 -0700)]
Merge tag 'xtensa-
20230905' of https://github.com/jcmvbkbc/linux-xtensa
Pull xtensa updates from Max Filippov:
- enable MTD XIP support
- fix base address of the xtensa perf module in newer hardware
* tag 'xtensa-
20230905' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: add XIP-aware MTD support
xtensa: PMU: fix base address for the newer hardware
Christian Brauner [Thu, 7 Sep 2023 16:03:40 +0000 (18:03 +0200)]
ntfs3: drop inode references in ntfs_put_super()
Recently we moved most cleanup from ntfs_put_super() into
ntfs3_kill_sb() as part of a bigger cleanup. This accidently also moved
dropping inode references stashed in ntfs3's sb->s_fs_info from
@sb->put_super() to @sb->kill_sb(). But generic_shutdown_super()
verifies that there are no busy inodes past sb->put_super(). Fix this
and disentangle dropping inode references from freeing @sb->s_fs_info.
Fixes: a4f64a300a29 ("ntfs3: free the sbi in ->kill_sb") # mainline only
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 3 Sep 2023 20:08:03 +0000 (13:08 -0700)]
vfs: mostly undo glibc turning 'fstat()' into 'fstatat(AT_EMPTY_PATH)'
Mateusz reports that glibc turns 'fstat()' calls into 'fstatat()', and
that seems to have been going on for quite a long time due to glibc
having tried to simplify its stat logic into just one point.
This turns out to cause completely unnecessary overhead, where we then
go off and allocate the kernel side pathname, and actually look up the
empty path. Sure, our path lookup is quite optimized, but it still
causes a fair bit of allocation overhead and a couple of completely
unnecessary rounds of lockref accesses etc.
This is all hopefully getting fixed in user space, and there is a patch
floating around for just having glibc use the native fstat() system
call. But even with the current situation we can at least improve on
things by catching the situation and short-circuiting it.
Note that this is still measurably slower than just a plain 'fstat()',
since just checking that the filename is actually empty is somewhat
expensive due to inevitable user space access overhead from the kernel
(ie verifying pointers, and SMAP on x86). But it's still quite a bit
faster than actually looking up the path for real.
To quote numers from Mateusz:
"Sapphire Rapids, will-it-scale, ops/s
stock fstat
5088199
patched fstat
7625244 (+49%)
real fstat
8540383 (+67% / +12%)"
where that 'stock fstat' is the glibc translation of fstat into
fstatat() with an empty path, the 'patched fstat' is with this short
circuiting of the path lookup, and the 'real fstat' is the actual native
fstat() system call with none of this overhead.
Link: https://lore.kernel.org/lkml/20230903204858.lv7i3kqvw6eamhgz@f/
Reported-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Tue, 5 Sep 2023 08:19:02 +0000 (10:19 +0200)]
Revert "printk: export symbols for debug modules"
This reverts commit
3e00123a13d824d63072b1824c9da59cd78356d9.
No, we never export random symbols for out of tree modules.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230905081902.321778-1-hch@lst.de
Takashi Iwai [Thu, 7 Sep 2023 12:10:36 +0000 (14:10 +0200)]
Merge tag 'asoc-fix-v6.6-merge-window' of https://git./linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.6
A bunch of fixes and new IDs that came in since the initial pull request
- all driver specific and nothing too exciting.
There's a trivial conflict in the AMD driver ID table due to the last
v6.5 fixes not having been merged up.
Paolo Abeni [Thu, 7 Sep 2023 09:47:15 +0000 (11:47 +0200)]
Merge tag 'nf-23-09-06' of https://git./linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter updates for net
This PR contains nf_tables updates for your *net* tree.
This time almost all fixes are for old bugs:
First patch fixes a 4-byte stack OOB write, from myself.
This was broken ever since nftables was switches from 128 to 32bit
register addressing in v4.1.
2nd patch fixes an out-of-bounds read.
This has been broken ever since xt_osf got added in 2.6.31, the bug
was then just moved around during refactoring, from Wander Lairson Costa.
3rd patch adds a missing enum description, from Phil Sutter.
4th patch fixes a UaF inftables that occurs when userspace adds
elements with a timeout so small that expiration happens while the
transaction is still in progress. Fix from Pablo Neira Ayuso.
Patch 5 fixes a memory out of bounds access, this was
broken since v4.20. Patch from Kyle Zeng and Jozsef Kadlecsik.
Patch 6 fixes another bogus memory access when building audit
record. Bug added in the previous pull request, fix from Pablo.
netfilter pull request 2023-09-06
* tag 'nf-23-09-06' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: Unbreak audit log reset
netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID
netfilter: nfnetlink_osf: avoid OOB read
netfilter: nftables: exthdr: fix 4-byte stack OOB write
====================
Link: https://lore.kernel.org/r/20230906162525.11079-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Will Deacon [Thu, 7 Sep 2023 08:54:11 +0000 (09:54 +0100)]
arm64: csum: Fix OoB access in IP checksum code for negative lengths
Although commit
c2c24edb1d9c ("arm64: csum: Fix pathological zero-length
calls") added an early return for zero-length input, syzkaller has
popped up with an example of a _negative_ length which causes an
undefined shift and an out-of-bounds read:
| BUG: KASAN: slab-out-of-bounds in do_csum+0x44/0x254 arch/arm64/lib/csum.c:39
| Read of size
4294966928 at addr
ffff0000d7ac0170 by task syz-executor412/5975
|
| CPU: 0 PID: 5975 Comm: syz-executor412 Not tainted
6.4.0-rc4-syzkaller-g908f31f2a05b #0
| Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
| Call trace:
| dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:233
| show_stack+0x2c/0x44 arch/arm64/kernel/stacktrace.c:240
| __dump_stack lib/dump_stack.c:88 [inline]
| dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106
| print_address_description mm/kasan/report.c:351 [inline]
| print_report+0x174/0x514 mm/kasan/report.c:462
| kasan_report+0xd4/0x130 mm/kasan/report.c:572
| kasan_check_range+0x264/0x2a4 mm/kasan/generic.c:187
| __kasan_check_read+0x20/0x30 mm/kasan/shadow.c:31
| do_csum+0x44/0x254 arch/arm64/lib/csum.c:39
| csum_partial+0x30/0x58 lib/checksum.c:128
| gso_make_checksum include/linux/skbuff.h:4928 [inline]
| __udp_gso_segment+0xaf4/0x1bc4 net/ipv4/udp_offload.c:332
| udp6_ufo_fragment+0x540/0xca0 net/ipv6/udp_offload.c:47
| ipv6_gso_segment+0x5cc/0x1760 net/ipv6/ip6_offload.c:119
| skb_mac_gso_segment+0x2b4/0x5b0 net/core/gro.c:141
| __skb_gso_segment+0x250/0x3d0 net/core/dev.c:3401
| skb_gso_segment include/linux/netdevice.h:4859 [inline]
| validate_xmit_skb+0x364/0xdbc net/core/dev.c:3659
| validate_xmit_skb_list+0x94/0x130 net/core/dev.c:3709
| sch_direct_xmit+0xe8/0x548 net/sched/sch_generic.c:327
| __dev_xmit_skb net/core/dev.c:3805 [inline]
| __dev_queue_xmit+0x147c/0x3318 net/core/dev.c:4210
| dev_queue_xmit include/linux/netdevice.h:3085 [inline]
| packet_xmit+0x6c/0x318 net/packet/af_packet.c:276
| packet_snd net/packet/af_packet.c:3081 [inline]
| packet_sendmsg+0x376c/0x4c98 net/packet/af_packet.c:3113
| sock_sendmsg_nosec net/socket.c:724 [inline]
| sock_sendmsg net/socket.c:747 [inline]
| __sys_sendto+0x3b4/0x538 net/socket.c:2144
Extend the early return to reject negative lengths as well, aligning our
implementation with the generic code in lib/checksum.c
Cc: Robin Murphy <robin.murphy@arm.com>
Fixes: 5777eaed566a ("arm64: Implement optimised checksum routine")
Reported-by: syzbot+4a9f9820bd8d302e22f7@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000e0e94c0603f8d213@google.com
Signed-off-by: Will Deacon <will@kernel.org>
Paolo Abeni [Thu, 7 Sep 2023 09:08:05 +0000 (11:08 +0200)]
Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'
Jijie Shao says:
====================
There are some bugfix for the HNS3 ethernet driver
====================
Link: https://lore.kernel.org/r/20230906072018.3020671-1-shaojijie@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jie Wang [Wed, 6 Sep 2023 07:20:18 +0000 (15:20 +0800)]
net: hns3: remove GSO partial feature bit
HNS3 NIC does not support GSO partial packets segmentation. Actually tunnel
packets for example NvGRE packets segment offload and checksum offload is
already supported. There is no need to keep gso partial feature bit. So
this patch removes it.
Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Yisen Zhuang [Wed, 6 Sep 2023 07:20:17 +0000 (15:20 +0800)]
net: hns3: fix the port information display when sfp is absent
When sfp is absent or unidentified, the port type should be
displayed as PORT_OTHERS, rather than PORT_FIBRE.
Fixes: 88d10bd6f730 ("net: hns3: add support for multiple media type")
Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jijie Shao [Wed, 6 Sep 2023 07:20:16 +0000 (15:20 +0800)]
net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue
We hope that tc qdisc and dcb ets commands can not be used crosswise.
If we want to use any of the commands to configure tc,
We must use the other command to clear the existing configuration.
However, when we configure a single tc with tc qdisc,
we can still configure it with dcb ets.
Because we use mqprio_active as the tag of tc qdisc configuration,
but with dcb ets, we do not check mqprio_active.
This patch fix this issue by check mqprio_active before
executing the dcb ets command. and add dcb_ets_active to
replace HCLGE_FLAG_DCB_ENABLE and HCLGE_FLAG_MQPRIO_ENABLE
at the hclge layer,
Fixes: cacde272dd00 ("net: hns3: Add hclge_dcb module for the support of DCB feature")
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hao Chen [Wed, 6 Sep 2023 07:20:15 +0000 (15:20 +0800)]
net: hns3: fix debugfs concurrency issue between kfree buffer and read
Now in hns3_dbg_uninit(), there may be concurrency between
kfree buffer and read, it may result in memory error.
Moving debugfs_remove_recursive() in front of kfree buffer to ensure
they don't happen at the same time.
Fixes: 5e69ea7ee2a6 ("net: hns3: refactor the debugfs process")
Signed-off-by: Hao Chen <chenhao418@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Hao Chen [Wed, 6 Sep 2023 07:20:14 +0000 (15:20 +0800)]
net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read()
req1->tcam_data is defined as "u8 tcam_data[8]", and we convert it as
(u32 *) without considerring byte order conversion,
it may result in printing wrong data for tcam_data.
Convert tcam_data to (__le32 *) first to fix it.
Fixes: b5a0b70d77b9 ("net: hns3: refactor dump fd tcam of debugfs")
Signed-off-by: Hao Chen <chenhao418@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jijie Shao [Wed, 6 Sep 2023 07:20:13 +0000 (15:20 +0800)]
net: hns3: Support query tx timeout threshold by debugfs
support query tx timeout threshold by debugfs
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jian Shen [Wed, 6 Sep 2023 07:20:12 +0000 (15:20 +0800)]
net: hns3: fix tx timeout issue
Currently, the driver knocks the ring doorbell before updating
the ring->last_to_use in tx flow. if the hardware transmiting
packet and napi poll scheduling are fast enough, it may get
the old ring->last_to_use in drivers' napi poll.
In this case, the driver will think the tx is not completed, and
return directly without clear the flag __QUEUE_STATE_STACK_XOFF,
which may cause tx timeout.
Fixes: 20d06ca2679c ("net: hns3: optimize the tx clean process")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Kailang Yang [Wed, 6 Sep 2023 08:50:41 +0000 (16:50 +0800)]
ALSA: hda/realtek - ALC287 I2S speaker platform support
0x17 was only speaker pin, DAC assigned will be 0x03. Headphone
assigned to 0x02.
Playback via headphone will get EQ filter processing. So,it needs to
swap DAC.
Tested-by: Mark Pearson <mpearson@lenovo.com>
Signed-off-by: Kailang Yang <kailang@realtek.com>
Link: https://lore.kernel.org/r/4e4cfa1b3b4c46838aecafc6e8b6f876@realtek.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Huacai Chen [Thu, 7 Sep 2023 04:06:20 +0000 (12:06 +0800)]
LoongArch: Update Loongson-3 default config file
1, Enable LSX and LASX.
2, Enable KASLR (CONFIG_RANDOMIZE_BASE).
3, Enable jump label (patching mechanism for static key).
4, Enable LoongArch CRC32(c) Acceleration.
5, Enable Loongson-specific drivers: I2C/RTC/DRM/SOC/CLK/PINCTRL/GPIO/SPI.
6, Enable EXFAT/NTFS3/JFS/GFS2/OCFS2/UBIFS/EROFS/CEPH file systems.
7, Enable WangXun NGBE/TXGBE NIC drivers.
8, Enable some IPVS options.
9, Remove CONFIG_SYSFS_DEPRECATED since it is removed in Kconfig.
10, Remove CONFIG_IP_NF_TARGET_CLUSTERIP since it is removed in Kconfig.
11, Remove CONFIG_NFT_OBJREF since it is removed in Kconfig.
12, Remove CONFIG_R8188EU since it is replaced by CONFIG_RTL8XXXU.
Signed-off-by: Trevor Woerner <twoerner@gmail.com>
Signed-off-by: Xuewen Wang <wangxuewen@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Lukasz Majewski [Tue, 5 Sep 2023 09:33:15 +0000 (11:33 +0200)]
net: phy: Provide Module 4 KSZ9477 errata (DS80000754C)
The KSZ9477 errata points out (in 'Module 4') the link up/down problems
when EEE (Energy Efficient Ethernet) is enabled in the device to which
the KSZ9477 tries to auto negotiate.
The suggested workaround is to clear advertisement of EEE for PHYs in
this chip driver.
To avoid regressions with other switch ICs the new MICREL_NO_EEE flag
has been introduced.
Moreover, the in-register disablement of MMD_DEVICE_ID_EEE_ADV.MMD_EEE_ADV
MMD register is removed, as this code is both; now executed too late
(after previous rework of the PHY and DSA for KSZ switches) and not
required as setting all members of eee_broken_modes bit field prevents
the KSZ9477 from advertising EEE.
Fixes: 69d3b36ca045 ("net: dsa: microchip: enable EEE support") # for KSZ9477
Signed-off-by: Lukasz Majewski <lukma@denx.de>
Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # Confirmed disabled EEE with oscilloscope.
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20230905093315.784052-1-lukma@denx.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lijo Lazar [Mon, 4 Sep 2023 12:45:13 +0000 (18:15 +0530)]
drm/amdgpu: Restrict bootloader wait to SMUv13.0.6
Restrict the wait for boot loader steady state only to SMUv13.0.6. For
older SOCs, ASIC init has a longer wait period and that takes care.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hamza Mahfooz [Tue, 5 Sep 2023 17:27:22 +0000 (13:27 -0400)]
drm/amd/display: prevent potential division by zero errors
There are two places in apply_below_the_range() where it's possible for
a divide by zero error to occur. So, to fix this make sure the divisor
is non-zero before attempting the computation in both cases.
Cc: stable@vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2637
Fixes: a463b263032f ("drm/amd/display: Fix frames_to_insert math")
Fixes: ded6119e825a ("drm/amd/display: Reinstate LFC optimization")
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Melissa Wen [Thu, 31 Aug 2023 16:12:28 +0000 (15:12 -0100)]
drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma
For DRM legacy gamma, AMD display manager applies implicit sRGB degamma
using a pre-defined sRGB transfer function. It works fine for DCN2
family where degamma ROM and custom curves go to the same color block.
But, on DCN3+, degamma is split into two blocks: degamma ROM for
pre-defined TFs and `gamma correction` for user/custom curves and
degamma ROM settings doesn't apply to cursor plane. To get DRM legacy
gamma working as expected, enable cursor degamma ROM for implict sRGB
degamma on HW with this configuration.
Cc: stable@vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2803
Fixes: 96b020e2163f ("drm/amd/display: check attr flag before set cursor degamma on DCN3+")
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hamza Mahfooz [Thu, 31 Aug 2023 19:22:35 +0000 (15:22 -0400)]
drm/amd/display: limit the v_startup workaround to ASICs older than DCN3.1
Since, calling dcn20_adjust_freesync_v_startup() on DCN3.1+ ASICs
can cause the display to flicker and underflow to occur, we shouldn't
call it for them. So, ensure that the DCN version is less than
DCN_VERSION_3_1 before calling dcn20_adjust_freesync_v_startup().
Cc: stable@vger.kernel.org
Reviewed-by: Fangzhi Zuo <jerry.zuo@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Hamza Mahfooz [Thu, 31 Aug 2023 19:17:14 +0000 (15:17 -0400)]
Revert "drm/amd/display: Remove v_startup workaround for dcn3+"
This reverts commit
3a31e8b89b7240d9a17ace8a1ed050bdcb560f9e.
We still need to call dcn20_adjust_freesync_v_startup() for older DCN3+
ASICs. Otherwise, it can cause DP to HDMI 2.1 PCONs to fail to light up.
Cc: stable@vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2809
Reviewed-by: Fangzhi Zuo <jerry.zuo@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jakub Kicinski [Thu, 7 Sep 2023 01:43:05 +0000 (18:43 -0700)]
Merge tag 'for-netdev' of https://git./linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-09-06
We've added 9 non-merge commits during the last 6 day(s) which contain
a total of 12 files changed, 189 insertions(+), 44 deletions(-).
The main changes are:
1) Fix bpf_sk_storage to address an invalid wait context lockdep
report and another one to address missing omem uncharge,
from Martin KaFai Lau.
2) Two BPF recursion detection related fixes,
from Sebastian Andrzej Siewior.
3) Fix tailcall limit enforcement in trampolines for s390 JIT,
from Ilya Leoshkevich.
4) Fix a sockmap refcount race where skbs in sk_psock_backlog can
be referenced after user space side has already skb_consumed them,
from John Fastabend.
5) Fix BPF CI flake/race wrt sockmap vsock write test where
the transport endpoint is not connected, from Xu Kuohai.
6) Follow-up doc fix to address a cross-link warning,
from Eduard Zingerman.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc
bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
bpf: bpf_sk_storage: Fix invalid wait context lockdep report
s390/bpf: Pass through tail call counter in trampolines
bpf: Assign bpf_tramp_run_ctx::saved_run_ctx before recursion check.
bpf: Invoke __bpf_prog_exit_sleepable_recur() on recursion in kern_sys_bpf().
bpf, sockmap: Fix skb refcnt race after locking changes
docs/bpf: Fix "file doesn't exist" warnings in {llvm_reloc,btf}.rst
selftests/bpf: Fix a CI failure caused by vsock write
====================
Link: https://lore.kernel.org/r/20230906095117.16941-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 6 Sep 2023 19:10:15 +0000 (12:10 -0700)]
Merge tag 'ceph-for-6.6-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"Mixed with some fixes and cleanups, this brings in reasonably complete
fscrypt support to CephFS! The list of things which don't work with
encryption should be fairly short, mostly around the edges: fallocate
(not supported well in CephFS to begin with), copy_file_range
(requires re-encryption), non-default striping patterns.
This was a multi-year effort principally by Jeff Layton with
assistance from Xiubo Li, Luís Henriques and others, including several
dependant changes in the MDS, netfs helper library and fscrypt
framework itself"
* tag 'ceph-for-6.6-rc1' of https://github.com/ceph/ceph-client: (53 commits)
ceph: make num_fwd and num_retry to __u32
ceph: make members in struct ceph_mds_request_args_ext a union
rbd: use list_for_each_entry() helper
libceph: do not include crypto/algapi.h
ceph: switch ceph_lookup/atomic_open() to use new fscrypt helper
ceph: fix updating i_truncate_pagecache_size for fscrypt
ceph: wait for OSD requests' callbacks to finish when unmounting
ceph: drop messages from MDS when unmounting
ceph: update documentation regarding snapshot naming limitations
ceph: prevent snapshot creation in encrypted locked directories
ceph: add support for encrypted snapshot names
ceph: invalidate pages when doing direct/sync writes
ceph: plumb in decryption during reads
ceph: add encryption support to writepage and writepages
ceph: add read/modify/write to ceph_sync_write
ceph: align data in pages in ceph_sync_write
ceph: don't use special DIO path for encrypted inodes
ceph: add truncate size handling support for fscrypt
ceph: add object version support for sync read
libceph: allow ceph_osdc_new_request to accept a multi-op read
...
Mario Limonciello [Wed, 6 Sep 2023 18:22:57 +0000 (13:22 -0500)]
ASoC: amd: yc: Fix a non-functional mic on Lenovo 82TL
Lenovo 82TL has DMIC connected like 82V2 does. Also match
82TL.
Reported-by: wildjim@kiwinet.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217063
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20230906182257.45736-1-mario.limonciello@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Will Deacon [Wed, 6 Sep 2023 18:15:37 +0000 (19:15 +0100)]
arm64/sysreg: Fix broken strncpy() -> strscpy() conversion
Mostafa reports that commit
d232606773a0 ("arm64/sysreg: refactor
deprecated strncpy") breaks our early command-line parsing because the
original code is working on space-delimited substrings rather than
NUL-terminated strings.
Rather than simply reverting the broken conversion patch, replace the
strscpy() with a simple memcpy() with an explicit NUL-termination of the
result.
Reported-by: Mostafa Saleh <smostafa@google.com>
Tested-by: Mostafa Saleh <smostafa@google.com>
Fixes: d232606773a0 ("arm64/sysreg: refactor deprecated strncpy")
Signed-off-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20230905-strncpy-arch-arm64-v4-1-bc4b14ddfaef@google.com
Link: https://lore.kernel.org/r/20230831162227.2307863-1-smostafa@google.com
Signed-off-by: Will Deacon <will@kernel.org>
Linus Torvalds [Wed, 6 Sep 2023 16:24:25 +0000 (09:24 -0700)]
Merge tag 'input-for-v6.6-rc0' of git://git./linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- a new driver for Azoteq IQS7210A/7211A/E touch controllers
- support for Azoteq IQS7222D variant added to iqs7222 driver
- support for touch keys functionality added to Melfas MMS114 driver
- new hardware IDs added to exc3000 and Goodix drivers
- xpad driver gained support for GameSir T4 Kaleid Controller
- a fix for xpad driver to properly support some third-party
controllers that need a magic packet to start properly
- a fix for psmouse driver to more reliably switch to RMI4 mode on
devices that use native RMI4/SMbus protocol
- a quirk for i8042 for TUXEDO Gemini 17 Gen1/Clevo PD70PN laptops
- multiple drivers have been updated to make use of devm and other
newer APIs such as dev_err_probe(), devm_regulator_get_enable(), and
others.
* tag 'input-for-v6.6-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (83 commits)
Input: goodix - add support for ACPI ID GDX9110
Input: rpckbd - fix the return value handle for platform_get_irq()
Input: tca6416-keypad - switch to using input core's polling features
Input: tca6416-keypad - convert to use devm_* api
Input: tca6416-keypad - fix interrupt enable disbalance
Input: tca6416-keypad - rely on I2C core to set up suspend/resume
Input: tca6416-keypad - always expect proper IRQ number in i2c client
Input: lm8323 - convert to use devm_* api
Input: lm8323 - rely on device core to create kp_disable attribute
Input: qt2160 - convert to use devm_* api
Input: qt2160 - do not hard code interrupt trigger
Input: qt2160 - switch to using threaded interrupt handler
Input: qt2160 - tweak check for i2c adapter functionality
Input: psmouse - add delay when deactivating for SMBus mode
Input: mcs-touchkey - fix uninitialized use of error in mcs_touchkey_probe()
Input: qt1070 - convert to use devm_* api
Input: mcs-touchkey - convert to use devm_* api
Input: amikbd - convert to use devm_* api
Input: lm8333 - convert to use devm_* api
Input: mms114 - add support for touch keys
...
Linus Torvalds [Wed, 6 Sep 2023 16:19:12 +0000 (09:19 -0700)]
Merge tag 'linux-watchdog-6.6-rc1' of git://linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
- add marvell GTI watchdog driver
- add support for Amlogic-T7 SoCs
- document the IPQ5018 watchdog compatible
- enable COMPILE_TEST for more watchdog device drivers
- core: stop watchdog when executing poweroff command
- other small improvements and fixes
* tag 'linux-watchdog-6.6-rc1' of git://www.linux-watchdog.org/linux-watchdog: (21 commits)
watchdog: Add support for Amlogic-T7 SoCs
watchdog: Add a new struct for Amlogic-GXBB driver
dt-bindings: watchdog: Add support for Amlogic-T7 SoCs
dt-bindings: watchdog: qcom-wdt: document IPQ5018
watchdog: imx2_wdt: Improve dev_crit() message
watchdog: stm32: Drop unnecessary of_match_ptr()
watchdog: sama5d4: readout initial state
watchdog: intel-mid_wdt: add MODULE_ALIAS() to allow auto-load
watchdog: core: stop watchdog when executing poweroff command
watchdog: pm8916_wdt: Remove redundant of_match_ptr()
watchdog: xilinx_wwdt: Use div_u64() in xilinx_wwdt_start()
watchdog: starfive: Remove #ifdef guards for PM related functions
watchdog: s3c2410: Fix potential deadlock on &wdt->lock
watchdog:rit_wdt: Add support for WDIOF_CARDRESET
dt-bindings: watchdog: ti,rti-wdt: Add support for WDIOF_CARDRESET
watchdog: Enable COMPILE_TEST for more drivers
watchdog: advantech_ec_wdt: fix Kconfig dependencies
watchdog: Explicitly include correct DT includes
Watchdog: Add marvell GTI watchdog driver
dt-bindings: watchdog: marvell GTI system watchdog driver
...
Pablo Neira Ayuso [Wed, 6 Sep 2023 09:42:02 +0000 (11:42 +0200)]
netfilter: nf_tables: Unbreak audit log reset
Deliver audit log from __nf_tables_dump_rules(), table dereference at
the end of the table list loop might point to the list head, leading to
this crash.
[ 4137.407349] BUG: unable to handle page fault for address:
00000000001f3c50
[ 4137.407357] #PF: supervisor read access in kernel mode
[ 4137.407359] #PF: error_code(0x0000) - not-present page
[ 4137.407360] PGD 0 P4D 0
[ 4137.407363] Oops: 0000 [#1] PREEMPT SMP PTI
[ 4137.407365] CPU: 4 PID: 500177 Comm: nft Not tainted 6.5.0+ #277
[ 4137.407369] RIP: 0010:string+0x49/0xd0
[ 4137.407374] Code: ff 77 36 45 89 d1 31 f6 49 01 f9 66 45 85 d2 75 19 eb 1e 49 39 f8 76 02 88 07 48 83 c7 01 83 c6 01 48 83 c2 01 4c 39 cf 74 07 <0f> b6 02 84 c0 75 e2 4c 89 c2 e9 58 e5 ff ff 48 c7 c0 0e b2 ff 81
[ 4137.407377] RSP: 0018:
ffff8881179737f0 EFLAGS:
00010286
[ 4137.407379] RAX:
00000000001f2c50 RBX:
ffff888117973848 RCX:
ffff0a00ffffff04
[ 4137.407380] RDX:
00000000001f3c50 RSI:
0000000000000000 RDI:
0000000000000000
[ 4137.407381] RBP:
0000000000000000 R08:
0000000000000000 R09:
00000000ffffffff
[ 4137.407383] R10:
ffffffffffffffff R11:
ffff88813584d200 R12:
0000000000000000
[ 4137.407384] R13:
ffffffffa15cf709 R14:
0000000000000000 R15:
ffffffffa15cf709
[ 4137.407385] FS:
00007fcfc18bb580(0000) GS:
ffff88840e700000(0000) knlGS:
0000000000000000
[ 4137.407387] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 4137.407388] CR2:
00000000001f3c50 CR3:
00000001055b2001 CR4:
00000000001706e0
[ 4137.407390] Call Trace:
[ 4137.407392] <TASK>
[ 4137.407393] ? __die+0x1b/0x60
[ 4137.407397] ? page_fault_oops+0x6b/0xa0
[ 4137.407399] ? exc_page_fault+0x60/0x120
[ 4137.407403] ? asm_exc_page_fault+0x22/0x30
[ 4137.407408] ? string+0x49/0xd0
[ 4137.407410] vsnprintf+0x257/0x4f0
[ 4137.407414] kvasprintf+0x3e/0xb0
[ 4137.407417] kasprintf+0x3e/0x50
[ 4137.407419] nf_tables_dump_rules+0x1c0/0x360 [nf_tables]
[ 4137.407439] ? __alloc_skb+0xc3/0x170
[ 4137.407442] netlink_dump+0x170/0x330
[ 4137.407447] __netlink_dump_start+0x227/0x300
[ 4137.407449] nf_tables_getrule+0x205/0x390 [nf_tables]
Deliver audit log only once at the end of the rule dump+reset for
consistency with the set dump+reset.
Ensure audit reset access to table under rcu read side lock. The table
list iteration holds rcu read lock side, but recent audit code
dereferences table object out of the rcu read lock side.
Fixes: ea078ae9108e ("netfilter: nf_tables: Audit log rule reset")
Fixes: 7e9be1124dbe ("netfilter: nf_tables: Audit log setelem reset")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
Kyle Zeng [Tue, 5 Sep 2023 22:04:09 +0000 (15:04 -0700)]
netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
The missing IP_SET_HASH_WITH_NET0 macro in ip_set_hash_netportnet can
lead to the use of wrong `CIDR_POS(c)` for calculating array offsets,
which can lead to integer underflow. As a result, it leads to slab
out-of-bound access.
This patch adds back the IP_SET_HASH_WITH_NET0 macro to
ip_set_hash_netportnet to address the issue.
Fixes: 886503f34d63 ("netfilter: ipset: actually allow allowable CIDR 0 in hash:net,port,net")
Suggested-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Kyle Zeng <zengyhkyle@gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Pablo Neira Ayuso [Mon, 4 Sep 2023 00:14:36 +0000 (02:14 +0200)]
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
New elements in this transaction might expired before such transaction
ends. Skip sync GC for such elements otherwise commit path might walk
over an already released object. Once transaction is finished, async GC
will collect such expired element.
Fixes: f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Phil Sutter [Fri, 1 Sep 2023 12:15:16 +0000 (14:15 +0200)]
netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID
Add a brief description to the enum's comment.
Fixes: 837830a4b439 ("netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
Wander Lairson Costa [Fri, 1 Sep 2023 13:50:20 +0000 (10:50 -0300)]
netfilter: nfnetlink_osf: avoid OOB read
The opt_num field is controlled by user mode and is not currently
validated inside the kernel. An attacker can take advantage of this to
trigger an OOB read and potentially leak information.
BUG: KASAN: slab-out-of-bounds in nf_osf_match_one+0xbed/0xd10 net/netfilter/nfnetlink_osf.c:88
Read of size 2 at addr
ffff88804bc64272 by task poc/6431
CPU: 1 PID: 6431 Comm: poc Not tainted 6.0.0-rc4 #1
Call Trace:
nf_osf_match_one+0xbed/0xd10 net/netfilter/nfnetlink_osf.c:88
nf_osf_find+0x186/0x2f0 net/netfilter/nfnetlink_osf.c:281
nft_osf_eval+0x37f/0x590 net/netfilter/nft_osf.c:47
expr_call_ops_eval net/netfilter/nf_tables_core.c:214
nft_do_chain+0x2b0/0x1490 net/netfilter/nf_tables_core.c:264
nft_do_chain_ipv4+0x17c/0x1f0 net/netfilter/nft_chain_filter.c:23
[..]
Also add validation to genre, subtype and version fields.
Fixes: 11eeef41d5f6 ("netfilter: passive OS fingerprint xtables match")
Reported-by: Lucas Leong <wmliang@infosec.exchange>
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Florian Westphal [Tue, 5 Sep 2023 21:13:56 +0000 (23:13 +0200)]
netfilter: nftables: exthdr: fix 4-byte stack OOB write
If priv->len is a multiple of 4, then dst[len / 4] can write past
the destination array which leads to stack corruption.
This construct is necessary to clean the remainder of the register
in case ->len is NOT a multiple of the register size, so make it
conditional just like nft_payload.c does.
The bug was added in 4.1 cycle and then copied/inherited when
tcp/sctp and ip option support was added.
Bug reported by Zero Day Initiative project (ZDI-CAN-21950,
ZDI-CAN-21951, ZDI-CAN-21961).
Fixes: 49499c3e6e18 ("netfilter: nf_tables: switch registers to 32 bit addressing")
Fixes: 935b7f643018 ("netfilter: nft_exthdr: add TCP option matching")
Fixes: 133dc203d77d ("netfilter: nft_exthdr: Support SCTP chunks")
Fixes: dbb5281a1f84 ("netfilter: nf_tables: add support for matching IPv4 options")
Signed-off-by: Florian Westphal <fw@strlen.de>
Linus Torvalds [Wed, 6 Sep 2023 16:00:37 +0000 (09:00 -0700)]
Merge tag 'backlight-next-6.6' of git://git./linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:
"New Functionality:
- Ensure correct includes are present and remove some that are not
required
- Drop redundant of_match_ptr() call to cast pointer to NULL
Bug Fixes:
- Revert to old (expected) behaviour of initialising PWM state on
first brightness change
- Correctly handle / propagate errors
- Fix 'sometimes-uninitialised' issues"
* tag 'backlight-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
backlight: led_bl: Remove redundant of_match_ptr()
backlight: lp855x: Drop ret variable in brightness change function
backlight: gpio_backlight: Drop output GPIO direction check for initial power state
backlight: lp855x: Catch errors when changing brightness
backlight: lp855x: Initialize PWM state on first brightness change
backlight: qcom-wled: Explicitly include correct DT includes
Qing Zhang [Wed, 6 Sep 2023 14:54:16 +0000 (22:54 +0800)]
LoongArch: Add KASAN (Kernel Address Sanitizer) support
1/8 of kernel addresses reserved for shadow memory. But for LoongArch,
There are a lot of holes between different segments and valid address
space (256T available) is insufficient to map all these segments to kasan
shadow memory with the common formula provided by kasan core, saying
(addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET
So LoongArch has a arch-specific mapping formula, different segments are
mapped individually, and only limited space lengths of these specific
segments are mapped to shadow.
At early boot stage the whole shadow region populated with just one
physical page (kasan_early_shadow_page). Later, this page is reused as
readonly zero shadow for some memory that kasan currently don't track.
After mapping the physical memory, pages for shadow memory are allocated
and mapped.
Functions like memset()/memcpy()/memmove() do a lot of memory accesses.
If bad pointer passed to one of these function it is important to be
caught. Compiler's instrumentation cannot do this since these functions
are written in assembly.
KASan replaces memory functions with manually instrumented variants.
Original functions declared as weak symbols so strong definitions in
mm/kasan/kasan.c could replace them. Original functions have aliases
with '__' prefix in names, so we could call non-instrumented variant
if needed.
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Qing Zhang [Wed, 6 Sep 2023 14:54:16 +0000 (22:54 +0800)]
LoongArch: Simplify the processing of jumping new kernel for KASLR
Modified relocate_kernel() doesn't return new kernel's entry point but
the random_offset. In this way we share the start_kernel() processing
with the normal kernel, which avoids calling 'jr a0' directly and allows
some other operations (e.g, kasan_early_init) before start_kernel() when
KASLR (CONFIG_RANDOMIZE_BASE) is turned on.
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Qing Zhang [Wed, 6 Sep 2023 14:54:16 +0000 (22:54 +0800)]
kasan: Add (pmd|pud)_init for LoongArch zero_(pud|p4d)_populate process
LoongArch populates pmd/pud with invalid_pmd_table/invalid_pud_table in
pagetable_init, So pmd_init/pud_init(p) is required, define them as __weak
in mm/kasan/init.c, like mm/sparse-vmemmap.c.
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Qing Zhang [Wed, 6 Sep 2023 14:54:16 +0000 (22:54 +0800)]
kasan: Add __HAVE_ARCH_SHADOW_MAP to support arch specific mapping
MIPS, LoongArch and some other architectures have many holes between
different segments and the valid address space (256T available) is
insufficient to map all these segments to kasan shadow memory with the
common formula provided by kasan core. So we need architecture specific
mapping formulas to ensure different segments are mapped individually,
and only limited space lengths of those specific segments are mapped to
shadow.
Therefore, when the incoming address is converted to a shadow, we need
to add a condition to determine whether it is valid.
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Qing Zhang <zhangqing@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Enze Li [Wed, 6 Sep 2023 14:54:16 +0000 (22:54 +0800)]
LoongArch: Add KFENCE (Kernel Electric-Fence) support
The LoongArch architecture is quite different from other architectures.
When the allocating of KFENCE itself is done, it is mapped to the direct
mapping configuration window [1] by default on LoongArch. It means that
it is not possible to use the page table mapped mode which required by
the KFENCE system and therefore it should be remapped to the appropriate
region.
This patch adds architecture specific implementation details for KFENCE.
In particular, this implements the required interface in <asm/kfence.h>.
Tested this patch by running the testcases and all passed.
[1] https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#virtual-address-space-and-address-translation-mode
Signed-off-by: Enze Li <lienze@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Enze Li [Wed, 6 Sep 2023 14:54:16 +0000 (22:54 +0800)]
LoongArch: Get partial stack information when providing regs parameter
Currently, arch_stack_walk() can only get the full stack information
including NMI. This is because the implementation of arch_stack_walk()
is forced to ignore the information passed by the regs parameter and use
the current stack information instead.
For some detection systems like KFENCE, only partial stack information
is needed. In particular, the stack frame where the interrupt occurred.
To support KFENCE, this patch modifies the implementation of the
arch_stack_walk() function so that if this function is called with the
regs argument passed, it retains all the stack information in regs and
uses it to provide accurate information.
Before this patch:
[ 1.531195 ] ==================================================================
[ 1.531442 ] BUG: KFENCE: out-of-bounds read in stack_trace_save_regs+0x48/0x6c
[ 1.531442 ]
[ 1.531900 ] Out-of-bounds read at 0xffff800012267fff (1B left of kfence-#12):
[ 1.532046 ] stack_trace_save_regs+0x48/0x6c
[ 1.532169 ] kfence_report_error+0xa4/0x528
[ 1.532276 ] kfence_handle_page_fault+0x124/0x270
[ 1.532388 ] no_context+0x50/0x94
[ 1.532453 ] do_page_fault+0x1a8/0x36c
[ 1.532524 ] tlb_do_page_fault_0+0x118/0x1b4
[ 1.532623 ] test_out_of_bounds_read+0xa0/0x1d8
[ 1.532745 ] kunit_generic_run_threadfn_adapter+0x1c/0x28
[ 1.532854 ] kthread+0x124/0x130
[ 1.532922 ] ret_from_kernel_thread+0xc/0xa4
<snip>
After this patch:
[ 1.320220 ] ==================================================================
[ 1.320401 ] BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa8/0x1d8
[ 1.320401 ]
[ 1.320898 ] Out-of-bounds read at 0xffff800012257fff (1B left of kfence-#10):
[ 1.321134 ] test_out_of_bounds_read+0xa8/0x1d8
[ 1.321264 ] kunit_generic_run_threadfn_adapter+0x1c/0x28
[ 1.321392 ] kthread+0x124/0x130
[ 1.321459 ] ret_from_kernel_thread+0xc/0xa4
<snip>
Suggested-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Enze Li <lienze@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Enze Li [Wed, 6 Sep 2023 14:53:55 +0000 (22:53 +0800)]
LoongArch: mm: Add page table mapped mode support for virt_to_page()
According to LoongArch documentations, there are two types of address
translation modes: direct mapped address translation mode (DMW mode) and
page table mapped address translation mode (TLB mode).
Currently, virt_to_page() only supports direct mapped mode. This patch
determines which mode is used, and adds corresponding handling functions
for both modes.
For more details on the two modes, see [1].
[1] https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#virtual-address-space-and-address-translation-mode
Signed-off-by: Enze Li <lienze@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Enze Li [Wed, 6 Sep 2023 14:53:55 +0000 (22:53 +0800)]
kfence: Defer the assignment of the local variable addr
The LoongArch architecture is different from other architectures. It
needs to update __kfence_pool during arch_kfence_init_pool().
This patch modifies the assignment location of the local variable addr
in the kfence_init_pool() function to support the case of updating
__kfence_pool in arch_kfence_init_pool().
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Enze Li <lienze@kylinos.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Feiyang Chen [Wed, 6 Sep 2023 14:53:55 +0000 (22:53 +0800)]
LoongArch: Allow building with kcov coverage
Add ARCH_HAS_KCOV and HAVE_GCC_PLUGINS to the LoongArch Kconfig. And
also disable instrumentation of vdso.
Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Feiyang Chen [Wed, 6 Sep 2023 14:53:55 +0000 (22:53 +0800)]
LoongArch: Provide kaslr_offset() to get kernel offset
Provide kaslr_offset() to get the kernel offset when KASLR is enabled.
Signed-off-by: Feiyang Chen <chenfeiyang@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Qing Zhang [Wed, 6 Sep 2023 14:53:55 +0000 (22:53 +0800)]
LoongArch: Add basic KGDB & KDB support
KGDB is intended to be used as a source level debugger for the Linux
kernel. It is used along with gdb to debug a Linux kernel. GDB can be
used to "break in" to the kernel to inspect memory, variables and regs
similar to the way an application developer would use GDB to debug an
application. KDB is a frontend of KGDB which is similar to GDB.
By now, in addition to the generic KGDB features, the LoongArch KGDB
implements the following features:
- Hardware breakpoints/watchpoints;
- Software single-step support for KDB.
Signed-off-by: Qing Zhang <zhangqing@loongson.cn> # Framework & CoreFeature
Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> # BreakPoint & SingleStep
Signed-off-by: Hui Li <lihui@loongson.cn> # Some Minor Improvements
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> # Some Build Error Fixes
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Qi Hu [Wed, 6 Sep 2023 14:53:55 +0000 (22:53 +0800)]
LoongArch: Add Loongson Binary Translation (LBT) extension support
Loongson Binary Translation (LBT) is used to accelerate binary translation,
which contains 4 scratch registers (scr0 to scr3), x86/ARM eflags (eflags)
and x87 fpu stack pointer (ftop).
This patch support kernel to save/restore these registers, handle the LBT
exception and maintain sigcontext.
Signed-off-by: Qi Hu <huqi@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Wed, 6 Sep 2023 14:53:55 +0000 (22:53 +0800)]
raid6: Add LoongArch SIMD recovery implementation
Similar to the syndrome calculation, the recovery algorithms also work
on 64 bytes at a time to align with the L1 cache line size of current
and future LoongArch cores (that we care about). Which means
unrolled-by-4 LSX and unrolled-by-2 LASX code.
The assembly is originally based on the x86 SSSE3/AVX2 ports, but
register allocation has been redone to take advantage of LSX/LASX's 32
vector registers, and instruction sequence has been optimized to suit
(e.g. LoongArch can perform per-byte srl and andi on vectors, but x86
cannot).
Performance numbers measured by instrumenting the raid6test code, on a
3A5000 system clocked at 2.5GHz:
> lasx 2data: 354.987 MiB/s
> lasx datap: 350.430 MiB/s
> lsx 2data: 340.026 MiB/s
> lsx datap: 337.318 MiB/s
> intx1 2data: 164.280 MiB/s
> intx1 datap: 187.966 MiB/s
Because recovery algorithms are chosen solely based on priority and
availability, lasx is marked as priority 2 and lsx priority 1. At least
for the current generation of LoongArch micro-architectures, LASX should
always be faster than LSX whenever supported, and have similar power
consumption characteristics (because the only known LASX-capable uarch,
the LA464, always compute the full 256-bit result for vector ops).
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Wed, 6 Sep 2023 14:53:55 +0000 (22:53 +0800)]
raid6: Add LoongArch SIMD syndrome calculation
The algorithms work on 64 bytes at a time, which is the L1 cache line
size of all current and future LoongArch cores (that we care about), as
confirmed by Huacai. The code is based on the generic int.uc algorithm,
unrolled 4 times for LSX and 2 times for LASX. Further unrolling does
not meaningfully improve the performance according to experiments.
Performance numbers measured during system boot on a 3A5000 @ 2.5GHz:
> raid6: lasx gen() 12726 MB/s
> raid6: lsx gen() 10001 MB/s
> raid6: int64x8 gen() 2876 MB/s
> raid6: int64x4 gen() 3867 MB/s
> raid6: int64x2 gen() 2531 MB/s
> raid6: int64x1 gen() 1945 MB/s
Comparison of xor() speeds (from different boots but meaningful anyway):
> lasx: 11226 MB/s
> lsx: 6395 MB/s
> int64x4: 2147 MB/s
Performance as measured by raid6test:
> raid6: lasx gen() 25109 MB/s
> raid6: lsx gen() 13233 MB/s
> raid6: int64x8 gen() 4164 MB/s
> raid6: int64x4 gen() 6005 MB/s
> raid6: int64x2 gen() 5781 MB/s
> raid6: int64x1 gen() 4119 MB/s
> raid6: using algorithm lasx gen() 25109 MB/s
> raid6: .... xor() 14439 MB/s, rmw enabled
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Wed, 6 Sep 2023 14:53:55 +0000 (22:53 +0800)]
LoongArch: Add SIMD-optimized XOR routines
Add LSX and LASX implementations of xor operations, operating on 64
bytes (one L1 cache line) at a time, for a balance between memory
utilization and instruction mix. Huacai confirmed that all future
LoongArch implementations by Loongson (that we care) will likely also
feature 64-byte cache lines, and experiments show no throughput
improvement with further unrolling.
Performance numbers measured during system boot on a 3A5000 @ 2.5GHz:
> 8regs : 12702 MB/sec
> 8regs_prefetch : 10920 MB/sec
> 32regs : 12686 MB/sec
> 32regs_prefetch : 10918 MB/sec
> lsx : 17589 MB/sec
> lasx : 26116 MB/sec
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Huacai Chen [Wed, 6 Sep 2023 14:53:55 +0000 (22:53 +0800)]
LoongArch: Allow usage of LSX/LASX in the kernel
Allow usage of LSX/LASX in the kernel by extending kernel_fpu_begin()
and kernel_fpu_end().
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Tiezhu Yang [Wed, 6 Sep 2023 14:53:10 +0000 (22:53 +0800)]
LoongArch: Define symbol 'fault' as a local label in fpu.S
The initial aim is to silence the following objtool warnings:
arch/loongarch/kernel/fpu.o: warning: objtool: _save_fp_context() falls through to next function fault()
arch/loongarch/kernel/fpu.o: warning: objtool: _restore_fp_context() falls through to next function fault()
arch/loongarch/kernel/fpu.o: warning: objtool: _save_lsx_context() falls through to next function fault()
arch/loongarch/kernel/fpu.o: warning: objtool: _restore_lsx_context() falls through to next function fault()
arch/loongarch/kernel/fpu.o: warning: objtool: _save_lasx_context() falls through to next function fault()
arch/loongarch/kernel/fpu.o: warning: objtool: _restore_lasx_context() falls through to next function fault()
Currently, SYM_FUNC_START()/SYM_FUNC_END() defines the symbol 'fault' as
SYM_T_FUNC which is STT_FUNC, the objtool warnings are generated through
the following code:
tools/objtool/include/objtool/check.h:
static inline struct symbol *insn_func(struct instruction *insn)
{
struct symbol *sym = insn->sym;
if (sym && sym->type != STT_FUNC)
sym = NULL;
return sym;
}
tools/objtool/check.c:
static int validate_branch(struct objtool_file *file, struct symbol *func,
struct instruction *insn, struct insn_state state)
{
...
if (func && insn_func(insn) && func != insn_func(insn)->pfunc) {
...
WARN("%s() falls through to next function %s()",
func->name, insn_func(insn)->name);
return 1;
}
...
}
We can see that the fixup can be a local label in the following code:
arch/loongarch/include/asm/asm-extable.h:
.pushsection __ex_table, "a"; \
.balign 4; \
.long ((insn) - .); \
.long ((fixup) - .); \
.short (type); \
.short (data); \
.popsection;
.macro _asm_extable, insn, fixup
__ASM_EXTABLE_RAW(\insn, \fixup, EX_TYPE_FIXUP, 0)
.endm
Like arch/loongarch/lib/*.S, just define the symbol 'fault' as a local
label in fpu.S.
Before:
$ readelf -s arch/loongarch/kernel/fpu.o | awk -F: /fault/'{print $2}'
000000000000053c 8 FUNC GLOBAL DEFAULT 1 fault
After:
$ readelf -s arch/loongarch/kernel/fpu.o | awk -F: /fault/'{print $2}'
000000000000053c 0 NOTYPE LOCAL DEFAULT 1 .L_fpu_fault
Co-developed-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Weihao Li [Wed, 6 Sep 2023 14:53:10 +0000 (22:53 +0800)]
LoongArch: Adjust {copy, clear}_user exception handler behavior
The {copy, clear}_user function should returns number of bytes that
could not be {copied, cleared}. So, try to {copy, clear} byte by byte
when ld.{d,w,h} and st.{d,w,h} trapped into an exception.
Reviewed-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Weihao Li <liweihao@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bibo Mao [Wed, 6 Sep 2023 14:53:10 +0000 (22:53 +0800)]
LoongArch: Use static defined zero page rather than allocated
On LoongArch system, there is only one page needed for zero page (no
cache synonyms), and there is no COLOR_ZERO_PAGE, so zero_page_mask is
useless and the macro __HAVE_COLOR_ZERO_PAGE is not necessary.
Like other popular architectures, It is simpler to define the zero page
in kernel BSS code segment rather than dynamically allocate.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bibo Mao [Wed, 6 Sep 2023 14:53:09 +0000 (22:53 +0800)]
LoongArch: mm: Introduce unified function populate_kernel_pte()
Function pcpu_populate_pte() and fixmap_pte() are similar, they populate
one page from kernel address space. And there is confusion between pgd
and p4d in the function fixmap_pte(), such as pgd_none() always returns
zero. This patch introduces a unified function populate_kernel_pte() and
then replaces pcpu_populate_pte() and fixmap_pte().
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bibo Mao [Wed, 6 Sep 2023 14:53:09 +0000 (22:53 +0800)]
LoongArch: Code improvements in function pcpu_populate_pte()
Do some code improvements in function pcpu_populate_pte():
1. Add memory allocation failure handling;
2. Replace pgd_populate() with p4d_populate(), it will be useful if
there are four-level page tables.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Huacai Chen [Wed, 6 Sep 2023 14:53:09 +0000 (22:53 +0800)]
LoongArch: Remove shm_align_mask and use SHMLBA instead
Both shm_align_mask and SHMLBA want to avoid cache alias. But they are
inconsistent: shm_align_mask is (PAGE_SIZE - 1) while SHMLBA is SZ_64K,
but PAGE_SIZE is not always equal to SZ_64K.
This may cause problems when shmat() twice. Fix this problem by removing
shm_align_mask and using SHMLBA (strictly SHMLBA - 1) instead.
Reported-by: Jiantao Shan <shanjiantao@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Hongchen Zhang [Wed, 6 Sep 2023 14:53:09 +0000 (22:53 +0800)]
LoongArch: mm: Add p?d_leaf() definitions
When I do LTP test, LTP test case ksm06 caused panic at
break_ksm_pmd_entry
-> pmd_leaf (Huge page table but False)
-> pte_present (panic)
The reason is pmd_leaf() is not defined, So like commit
501b81046701
("mips: mm: add p?d_leaf() definitions") add p?d_leaf() definition for
LoongArch.
Fixes: 09cfefb7fa70 ("LoongArch: Add memory management")
Cc: stable@vger.kernel.org
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Nathan Chancellor [Wed, 6 Sep 2023 14:53:09 +0000 (22:53 +0800)]
LoongArch: Drop unused parse_r and parse_v macros
When building with CONFIG_LTO_CLANG_FULL, there are several errors due
to the way that parse_r is defined with an __asm__ statement in a
header:
ld.lld: error: ld-temp.o <inline asm>:105:1: macro 'parse_r' is already defined
.macro parse_r var r
^
This was an issue for arch/mips as well, which was resolved by commit
67512a8cf5a7 ("MIPS: Avoid macro redefinitions").
However, parse_r is unused in arch/loongarch after commit
83d8b38967d2
("LoongArch: Simplify the invtlb wrappers"), so doing the same change
does not make much sense now. Just remove parse_r (and parse_v, which
is also unused) to resolve the redefinition error. If it needs to be
brought back due to an actual use, it should be brought back with the
same changes as the aforementioned arch/mips commit.
Closes: https://github.com/ClangBuiltLinux/linux/issues/1924
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Martin KaFai Lau [Fri, 1 Sep 2023 23:11:29 +0000 (16:11 -0700)]
selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc
This patch checks the sk_omem_alloc has been uncharged by bpf_sk_storage
during the __sk_destruct.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230901231129.578493-4-martin.lau@linux.dev
Martin KaFai Lau [Fri, 1 Sep 2023 23:11:28 +0000 (16:11 -0700)]
bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
The commit
c83597fa5dc6 ("bpf: Refactor some inode/task/sk storage functions
for reuse"), refactored the bpf_{sk,task,inode}_storage_free() into
bpf_local_storage_unlink_nolock() which then later renamed to
bpf_local_storage_destroy(). The commit accidentally passed the
"bool uncharge_mem = false" argument to bpf_selem_unlink_storage_nolock()
which then stopped the uncharge from happening to the sk->sk_omem_alloc.
This missing uncharge only happens when the sk is going away (during
__sk_destruct).
This patch fixes it by always passing "uncharge_mem = true". It is a
noop to the task/inode/cgroup storage because they do not have the
map_local_storage_(un)charge enabled in the map_ops. A followup patch
will be done in bpf-next to remove the uncharge_mem argument.
A selftest is added in the next patch.
Fixes: c83597fa5dc6 ("bpf: Refactor some inode/task/sk storage functions for reuse")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230901231129.578493-3-martin.lau@linux.dev
Martin KaFai Lau [Fri, 1 Sep 2023 23:11:27 +0000 (16:11 -0700)]
bpf: bpf_sk_storage: Fix invalid wait context lockdep report
'./test_progs -t test_local_storage' reported a splat:
[ 27.137569] =============================
[ 27.138122] [ BUG: Invalid wait context ]
[ 27.138650]
6.5.0-03980-gd11ae1b16b0a #247 Tainted: G O
[ 27.139542] -----------------------------
[ 27.140106] test_progs/1729 is trying to lock:
[ 27.140713]
ffff8883ef047b88 (stock_lock){-.-.}-{3:3}, at: local_lock_acquire+0x9/0x130
[ 27.141834] other info that might help us debug this:
[ 27.142437] context-{5:5}
[ 27.142856] 2 locks held by test_progs/1729:
[ 27.143352] #0:
ffffffff84bcd9c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x40
[ 27.144492] #1:
ffff888107deb2c0 (&storage->lock){..-.}-{2:2}, at: bpf_local_storage_update+0x39e/0x8e0
[ 27.145855] stack backtrace:
[ 27.146274] CPU: 0 PID: 1729 Comm: test_progs Tainted: G O
6.5.0-03980-gd11ae1b16b0a #247
[ 27.147550] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 27.149127] Call Trace:
[ 27.149490] <TASK>
[ 27.149867] dump_stack_lvl+0x130/0x1d0
[ 27.152609] dump_stack+0x14/0x20
[ 27.153131] __lock_acquire+0x1657/0x2220
[ 27.153677] lock_acquire+0x1b8/0x510
[ 27.157908] local_lock_acquire+0x29/0x130
[ 27.159048] obj_cgroup_charge+0xf4/0x3c0
[ 27.160794] slab_pre_alloc_hook+0x28e/0x2b0
[ 27.161931] __kmem_cache_alloc_node+0x51/0x210
[ 27.163557] __kmalloc+0xaa/0x210
[ 27.164593] bpf_map_kzalloc+0xbc/0x170
[ 27.165147] bpf_selem_alloc+0x130/0x510
[ 27.166295] bpf_local_storage_update+0x5aa/0x8e0
[ 27.167042] bpf_fd_sk_storage_update_elem+0xdb/0x1a0
[ 27.169199] bpf_map_update_value+0x415/0x4f0
[ 27.169871] map_update_elem+0x413/0x550
[ 27.170330] __sys_bpf+0x5e9/0x640
[ 27.174065] __x64_sys_bpf+0x80/0x90
[ 27.174568] do_syscall_64+0x48/0xa0
[ 27.175201] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 27.175932] RIP: 0033:0x7effb40e41ad
[ 27.176357] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d8
[ 27.179028] RSP: 002b:
00007ffe64c21fc8 EFLAGS:
00000202 ORIG_RAX:
0000000000000141
[ 27.180088] RAX:
ffffffffffffffda RBX:
00007ffe64c22768 RCX:
00007effb40e41ad
[ 27.181082] RDX:
0000000000000020 RSI:
00007ffe64c22008 RDI:
0000000000000002
[ 27.182030] RBP:
00007ffe64c21ff0 R08:
0000000000000000 R09:
00007ffe64c22788
[ 27.183038] R10:
0000000000000064 R11:
0000000000000202 R12:
0000000000000000
[ 27.184006] R13:
00007ffe64c22788 R14:
00007effb42a1000 R15:
0000000000000000
[ 27.184958] </TASK>
It complains about acquiring a local_lock while holding a raw_spin_lock.
It means it should not allocate memory while holding a raw_spin_lock
since it is not safe for RT.
raw_spin_lock is needed because bpf_local_storage supports tracing
context. In particular for task local storage, it is easy to
get a "current" task PTR_TO_BTF_ID in tracing bpf prog.
However, task (and cgroup) local storage has already been moved to
bpf mem allocator which can be used after raw_spin_lock.
The splat is for the sk storage. For sk (and inode) storage,
it has not been moved to bpf mem allocator. Using raw_spin_lock or not,
kzalloc(GFP_ATOMIC) could theoretically be unsafe in tracing context.
However, the local storage helper requires a verifier accepted
sk pointer (PTR_TO_BTF_ID), it is hypothetical if that (mean running
a bpf prog in a kzalloc unsafe context and also able to hold a verifier
accepted sk pointer) could happen.
This patch avoids kzalloc after raw_spin_lock to silent the splat.
There is an existing kzalloc before the raw_spin_lock. At that point,
a kzalloc is very likely required because a lookup has just been done
before. Thus, this patch always does the kzalloc before acquiring
the raw_spin_lock and remove the later kzalloc usage after the
raw_spin_lock. After this change, it will have a charge and then
uncharge during the syscall bpf_map_update_elem() code path.
This patch opts for simplicity and not continue the old
optimization to save one charge and uncharge.
This issue is dated back to the very first commit of bpf_sk_storage
which had been refactored multiple times to create task, inode, and
cgroup storage. This patch uses a Fixes tag with a more recent
commit that should be easier to do backport.
Fixes: b00fa38a9c1c ("bpf: Enable non-atomic allocations in local storage")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230901231129.578493-2-martin.lau@linux.dev
Ilya Leoshkevich [Wed, 6 Sep 2023 00:44:19 +0000 (02:44 +0200)]
s390/bpf: Pass through tail call counter in trampolines
s390x eBPF programs use the following extension to the s390x calling
convention: tail call counter is passed on stack at offset
STK_OFF_TCCNT, which callees otherwise use as scratch space.
Currently trampoline does not respect this and clobbers tail call
counter. This breaks enforcing tail call limits in eBPF programs, which
have trampolines attached to them.
Fix by forwarding a copy of the tail call counter to the original eBPF
program in the trampoline (for fexit), and by restoring it at the end
of the trampoline (for fentry).
Fixes: 528eb2cb87bc ("s390/bpf: Implement arch_prepare_bpf_trampoline()")
Reported-by: Leon Hwang <hffilwlqm@gmail.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230906004448.111674-1-iii@linux.ibm.com
Sebastian Andrzej Siewior [Wed, 30 Aug 2023 08:04:05 +0000 (10:04 +0200)]
bpf: Assign bpf_tramp_run_ctx::saved_run_ctx before recursion check.
__bpf_prog_enter_recur() assigns bpf_tramp_run_ctx::saved_run_ctx before
performing the recursion check which means in case of a recursion
__bpf_prog_exit_recur() uses the previously set bpf_tramp_run_ctx::saved_run_ctx
value.
__bpf_prog_enter_sleepable_recur() assigns bpf_tramp_run_ctx::saved_run_ctx
after the recursion check which means in case of a recursion
__bpf_prog_exit_sleepable_recur() uses an uninitialized value. This does not
look right. If I read the entry trampoline code right, then bpf_tramp_run_ctx
isn't initialized upfront.
Align __bpf_prog_enter_sleepable_recur() with __bpf_prog_enter_recur() and
set bpf_tramp_run_ctx::saved_run_ctx before the recursion check is made.
Remove the assignment of saved_run_ctx in kern_sys_bpf() since it happens
a few cycles later.
Fixes: e384c7b7b46d0 ("bpf, x86: Create bpf_tramp_run_ctx on the caller thread's stack")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230830080405.251926-3-bigeasy@linutronix.de
Sebastian Andrzej Siewior [Wed, 30 Aug 2023 08:04:04 +0000 (10:04 +0200)]
bpf: Invoke __bpf_prog_exit_sleepable_recur() on recursion in kern_sys_bpf().
If __bpf_prog_enter_sleepable_recur() detects recursion then it returns
0 without undoing rcu_read_lock_trace(), migrate_disable() or
decrementing the recursion counter. This is fine in the JIT case because
the JIT code will jump in the 0 case to the end and invoke the matching
exit trampoline (__bpf_prog_exit_sleepable_recur()).
This is not the case in kern_sys_bpf() which returns directly to the
caller with an error code.
Add __bpf_prog_exit_sleepable_recur() as clean up in the recursion case.
Fixes: b1d18a7574d0d ("bpf: Extend sys_bpf commands for bpf_syscall programs.")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20230830080405.251926-2-bigeasy@linutronix.de
Jakub Kicinski [Tue, 5 Sep 2023 23:42:02 +0000 (16:42 -0700)]
net: phylink: fix sphinx complaint about invalid literal
sphinx complains about the use of "%PHYLINK_PCS_NEG_*":
Documentation/networking/kapi:144: ./include/linux/phylink.h:601: WARNING: Inline literal start-string without end-string.
Documentation/networking/kapi:144: ./include/linux/phylink.h:633: WARNING: Inline literal start-string without end-string.
These are not valid symbols so drop the '%' prefix.
Alternatively we could use %PHYLINK_PCS_NEG_\* (escape the *)
or use normal literal ``PHYLINK_PCS_NEG_*`` but there is already
a handful of un-adorned DEFINE_* in this file.
Fixes: f99d471afa03 ("net: phylink: add PCS negotiation mode")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/all/20230626162908.2f149f98@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 6 Sep 2023 05:23:05 +0000 (06:23 +0100)]
Merge branch 'sja1105-fixes'
Vladimir Oltean says:
====================
tc-cbs offload fixes for SJA1105 DSA
Yanan Yang has pointed out to me that certain tc-cbs offloaded
configurations do not appear to do any shaping on the LS1021A-TSN board
(SJA1105T).
This is due to an apparent documentation error that also made its way
into the driver, which patch 1/3 now fixes.
While investigating and then testing, I've found 2 more bugs, which are
patches 2/3 and 3/3.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 5 Sep 2023 21:53:38 +0000 (00:53 +0300)]
net: dsa: sja1105: complete tc-cbs offload support on SJA1110
The blamed commit left this delta behind:
struct sja1105_cbs_entry {
- u64 port;
- u64 prio;
+ u64 port; /* Not used for SJA1110 */
+ u64 prio; /* Not used for SJA1110 */
u64 credit_hi;
u64 credit_lo;
u64 send_slope;
u64 idle_slope;
};
but did not actually implement tc-cbs offload fully for the new switch.
The offload is accepted, but it doesn't work.
The difference compared to earlier switch generations is that now, the
table of CBS shapers is sparse, because there are many more shapers, so
the mapping between a {port, prio} and a table index is static, rather
than requiring us to store the port and prio into the sja1105_cbs_entry.
So, the problem is that the code programs the CBS shaper parameters at a
dynamic table index which is incorrect.
All that needs to be done for SJA1110 CBS shapers to work is to bypass
the logic which allocates shapers in a dense manner, as for SJA1105, and
use the fixed mapping instead.
Fixes: 3e77e59bf8cf ("net: dsa: sja1105: add support for the SJA1110 switch family")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 5 Sep 2023 21:53:37 +0000 (00:53 +0300)]
net: dsa: sja1105: fix -ENOSPC when replacing the same tc-cbs too many times
After running command [2] too many times in a row:
[1] $ tc qdisc add dev sw2p0 root handle 1: mqprio num_tc 8 \
map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
[2] $ tc qdisc replace dev sw2p0 parent 1:1 cbs offload 1 \
idleslope 120000 sendslope -880000 locredit -1320 hicredit 180
(aka more than priv->info->num_cbs_shapers times)
we start seeing the following error message:
Error: Specified device failed to setup cbs hardware offload.
This comes from the fact that ndo_setup_tc(TC_SETUP_QDISC_CBS) presents
the same API for the qdisc create and replace cases, and the sja1105
driver fails to distinguish between the 2. Thus, it always thinks that
it must allocate the same shaper for a {port, queue} pair, when it may
instead have to replace an existing one.
Fixes: 4d7525085a9b ("net: dsa: sja1105: offload the Credit-Based Shaper qdisc")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 5 Sep 2023 21:53:36 +0000 (00:53 +0300)]
net: dsa: sja1105: fix bandwidth discrepancy between tc-cbs software and offload
More careful measurement of the tc-cbs bandwidth shows that the stream
bandwidth (effectively idleslope) increases, there is a larger and
larger discrepancy between the rate limit obtained by the software
Qdisc, and the rate limit obtained by its offloaded counterpart.
The discrepancy becomes so large, that e.g. at an idleslope of 40000
(40Mbps), the offloaded cbs does not actually rate limit anything, and
traffic will pass at line rate through a 100 Mbps port.
The reason for the discrepancy is that the hardware documentation I've
been following is incorrect. UM11040.pdf (for SJA1105P/Q/R/S) states
about IDLE_SLOPE that it is "the rate (in unit of bytes/sec) at which
the credit counter is increased".
Cross-checking with UM10944.pdf (for SJA1105E/T) and UM11107.pdf
(for SJA1110), the wording is different: "This field specifies the
value, in bytes per second times link speed, by which the credit counter
is increased".
So there's an extra scaling for link speed that the driver is currently
not accounting for, and apparently (empirically), that link speed is
expressed in Kbps.
I've pondered whether to pollute the sja1105_mac_link_up()
implementation with CBS shaper reprogramming, but I don't think it is
worth it. IMO, the UAPI exposed by tc-cbs requires user space to
recalculate the sendslope anyway, since the formula for that depends on
port_transmit_rate (see man tc-cbs), which is not an invariant from tc's
perspective.
So we use the offload->sendslope and offload->idleslope to deduce the
original port_transmit_rate from the CBS formula, and use that value to
scale the offload->sendslope and offload->idleslope to values that the
hardware understands.
Some numerical data points:
40Mbps stream, max interfering frame size 1500, port speed 100M
---------------------------------------------------------------
tc-cbs parameters:
idleslope 40000 sendslope -60000 locredit -900 hicredit 600
which result in hardware values:
Before (doesn't work) After (works)
credit_hi 600 600
credit_lo 900 900
send_slope
7500000 75
idle_slope
5000000 50
40Mbps stream, max interfering frame size 1500, port speed 1G
-------------------------------------------------------------
tc-cbs parameters:
idleslope 40000 sendslope -960000 locredit -1440 hicredit 60
which result in hardware values:
Before (doesn't work) After (works)
credit_hi 60 60
credit_lo 1440 1440
send_slope
120000000 120
idle_slope
5000000 5
5.12Mbps stream, max interfering frame size 1522, port speed 100M
-----------------------------------------------------------------
tc-cbs parameters:
idleslope 5120 sendslope -94880 locredit -1444 hicredit 77
which result in hardware values:
Before (doesn't work) After (works)
credit_hi 77 77
credit_lo 1444 1444
send_slope
11860000 118
idle_slope 640000 6
Tested on SJA1105T, SJA1105S and SJA1110A, at 1Gbps and 100Mbps.
Fixes: 4d7525085a9b ("net: dsa: sja1105: offload the Credit-Based Shaper qdisc")
Reported-by: Yanan Yang <yanan.yang@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 6 Sep 2023 05:20:33 +0000 (06:20 +0100)]
Merge branch '1GbE' of git://git./linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Change MIN_TXD and MIN_RXD to allow set rx/tx value between 64 and 80
Olga Zaborska says:
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx
value between 64 and 80. All igb, igbvf and igc devices can use as low as 64
descriptors.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bodong Wang [Tue, 5 Sep 2023 17:48:46 +0000 (10:48 -0700)]
mlx5/core: E-Switch, Create ACL FT for eswitch manager in switchdev mode
ACL flow table is required in switchdev mode when metadata is enabled,
driver creates such table when loading each vport. However, not every
vport is loaded in switchdev mode. Such as ECPF if it's the eswitch manager.
In this case, ACL flow table is still needed.
To make it modularized, create ACL flow table for eswitch manager as
default and skip such operations when loading manager vport.
Also, there is no need to load the eswitch manager vport in switchdev mode.
This means there is no need to load it on regular connect-x HCAs where
the PF is the eswitch manager. This will avoid creating duplicate ACL
flow table for host PF vport.
Fixes: 29bcb6e4fe70 ("net/mlx5e: E-Switch, Use metadata for vport matching in send-to-vport rules")
Fixes: eb8e9fae0a22 ("mlx5/core: E-Switch, Allocate ECPF vport if it's an eswitch manager")
Fixes: 5019833d661f ("net/mlx5: E-switch, Introduce helper function to enable/disable vports")
Signed-off-by: Bodong Wang <bodong@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jianbo Liu [Tue, 5 Sep 2023 17:48:45 +0000 (10:48 -0700)]
net/mlx5e: Clear mirred devices array if the rule is split
In the cited commit, the mirred devices are recorded and checked while
parsing the actions. In order to avoid system crash, the duplicate
action in a single rule is not allowed.
But the rule is actually break down into several FTEs in different
tables, for either mirroring, or the specified types of actions which
use post action infrastructure.
It will reject certain action list by mistake, for example:
actions:enp8s0f0_1,set(ipv4(ttl=63)),enp8s0f0_0,enp8s0f0_1.
Here the rule is split to two FTEs because of pedit action.
To fix this issue, when parsing the rule actions, reset if_count to
clear the mirred devices array if the rule is split to multiple
FTEs, and then the duplicate checking is restarted.
Fixes: 554fe75c1b3f ("net/mlx5e: Avoid duplicating rule destinations")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 5 Sep 2023 13:40:46 +0000 (13:40 +0000)]
ip_tunnels: use DEV_STATS_INC()
syzbot/KCSAN reported data-races in iptunnel_xmit_stats() [1]
This can run from multiple cpus without mutual exclusion.
Adopt SMP safe DEV_STATS_INC() to update dev->stats fields.
[1]
BUG: KCSAN: data-race in iptunnel_xmit / iptunnel_xmit
read-write to 0xffff8881353df170 of 8 bytes by task 30263 on cpu 1:
iptunnel_xmit_stats include/net/ip_tunnels.h:493 [inline]
iptunnel_xmit+0x432/0x4a0 net/ipv4/ip_tunnel_core.c:87
ip_tunnel_xmit+0x1477/0x1750 net/ipv4/ip_tunnel.c:831
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:662
__netdev_start_xmit include/linux/netdevice.h:4889 [inline]
netdev_start_xmit include/linux/netdevice.h:4903 [inline]
xmit_one net/core/dev.c:3544 [inline]
dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3560
__dev_queue_xmit+0xeee/0x1de0 net/core/dev.c:4340
dev_queue_xmit include/linux/netdevice.h:3082 [inline]
__bpf_tx_skb net/core/filter.c:2129 [inline]
__bpf_redirect_no_mac net/core/filter.c:2159 [inline]
__bpf_redirect+0x723/0x9c0 net/core/filter.c:2182
____bpf_clone_redirect net/core/filter.c:2453 [inline]
bpf_clone_redirect+0x16c/0x1d0 net/core/filter.c:2425
___bpf_prog_run+0xd7d/0x41e0 kernel/bpf/core.c:1954
__bpf_prog_run512+0x74/0xa0 kernel/bpf/core.c:2195
bpf_dispatcher_nop_func include/linux/bpf.h:1181 [inline]
__bpf_prog_run include/linux/filter.h:609 [inline]
bpf_prog_run include/linux/filter.h:616 [inline]
bpf_test_run+0x15d/0x3d0 net/bpf/test_run.c:423
bpf_prog_test_run_skb+0x77b/0xa00 net/bpf/test_run.c:1045
bpf_prog_test_run+0x265/0x3d0 kernel/bpf/syscall.c:3996
__sys_bpf+0x3af/0x780 kernel/bpf/syscall.c:5353
__do_sys_bpf kernel/bpf/syscall.c:5439 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5437 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5437
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read-write to 0xffff8881353df170 of 8 bytes by task 30249 on cpu 0:
iptunnel_xmit_stats include/net/ip_tunnels.h:493 [inline]
iptunnel_xmit+0x432/0x4a0 net/ipv4/ip_tunnel_core.c:87
ip_tunnel_xmit+0x1477/0x1750 net/ipv4/ip_tunnel.c:831
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x516/0x570 net/ipv4/ip_gre.c:662
__netdev_start_xmit include/linux/netdevice.h:4889 [inline]
netdev_start_xmit include/linux/netdevice.h:4903 [inline]
xmit_one net/core/dev.c:3544 [inline]
dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3560
__dev_queue_xmit+0xeee/0x1de0 net/core/dev.c:4340
dev_queue_xmit include/linux/netdevice.h:3082 [inline]
__bpf_tx_skb net/core/filter.c:2129 [inline]
__bpf_redirect_no_mac net/core/filter.c:2159 [inline]
__bpf_redirect+0x723/0x9c0 net/core/filter.c:2182
____bpf_clone_redirect net/core/filter.c:2453 [inline]
bpf_clone_redirect+0x16c/0x1d0 net/core/filter.c:2425
___bpf_prog_run+0xd7d/0x41e0 kernel/bpf/core.c:1954
__bpf_prog_run512+0x74/0xa0 kernel/bpf/core.c:2195
bpf_dispatcher_nop_func include/linux/bpf.h:1181 [inline]
__bpf_prog_run include/linux/filter.h:609 [inline]
bpf_prog_run include/linux/filter.h:616 [inline]
bpf_test_run+0x15d/0x3d0 net/bpf/test_run.c:423
bpf_prog_test_run_skb+0x77b/0xa00 net/bpf/test_run.c:1045
bpf_prog_test_run+0x265/0x3d0 kernel/bpf/syscall.c:3996
__sys_bpf+0x3af/0x780 kernel/bpf/syscall.c:5353
__do_sys_bpf kernel/bpf/syscall.c:5439 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5437 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5437
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0x0000000000018830 -> 0x0000000000018831
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 30249 Comm: syz-executor.4 Not tainted
6.5.0-syzkaller-11704-g3f86ed6ec0b3 #0
Fixes: 039f50629b7f ("ip_tunnel: Move stats update to iptunnel_xmit()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Tue, 5 Sep 2023 08:46:10 +0000 (08:46 +0000)]
net: team: do not use dynamic lockdep key
team interface has used a dynamic lockdep key to avoid false-positive
lockdep deadlock detection. Virtual interfaces such as team usually
have their own lock for protecting private data.
These interfaces can be nested.
team0
|
team1
Each interface's lock is actually different(team0->lock and team1->lock).
So,
mutex_lock(&team0->lock);
mutex_lock(&team1->lock);
mutex_unlock(&team1->lock);
mutex_unlock(&team0->lock);
The above case is absolutely safe. But lockdep warns about deadlock.
Because the lockdep understands these two locks are same. This is a
false-positive lockdep warning.
So, in order to avoid this problem, the team interfaces started to use
dynamic lockdep key. The false-positive problem was fixed, but it
introduced a new problem.
When the new team virtual interface is created, it registers a dynamic
lockdep key(creates dynamic lockdep key) and uses it. But there is the
limitation of the number of lockdep keys.
So, If so many team interfaces are created, it consumes all lockdep keys.
Then, the lockdep stops to work and warns about it.
In order to fix this problem, team interfaces use the subclass instead
of the dynamic key. So, when a new team interface is created, it doesn't
register(create) a new lockdep, but uses existed subclass key instead.
It is already used by the bonding interface for a similar case.
As the bonding interface does, the subclass variable is the same as
the 'dev->nested_level'. This variable indicates the depth in the stacked
interface graph.
The 'dev->nested_level' is protected by RTNL and RCU.
So, 'mutex_lock_nested()' for 'team->lock' requires RTNL or RCU.
In the current code, 'team->lock' is usually acquired under RTNL, there is
no problem with using 'dev->nested_level'.
The 'team_nl_team_get()' and The 'lb_stats_refresh()' functions acquire
'team->lock' without RTNL.
But these don't iterate their own ports nested so they don't need nested
lock.
Reproducer:
for i in {0..1000}
do
ip link add team$i type team
ip link add dummy$i master team$i type dummy
ip link set dummy$i up
ip link set team$i up
done
Splat looks like:
BUG: MAX_LOCKDEP_ENTRIES too low!
turning off the locking correctness validator.
Please attach the output of /proc/lock_stat to the bug report
CPU: 0 PID: 4104 Comm: ip Not tainted 6.5.0-rc7+ #45
Call Trace:
<TASK>
dump_stack_lvl+0x64/0xb0
add_lock_to_list+0x30d/0x5e0
check_prev_add+0x73a/0x23a0
...
sock_def_readable+0xfe/0x4f0
netlink_broadcast+0x76b/0xac0
nlmsg_notify+0x69/0x1d0
dev_open+0xed/0x130
...
Reported-by: syzbot+9bbbacfbf1e04d5221f7@syzkaller.appspotmail.com
Fixes: 369f61bee0f5 ("team: fix nested locking lockdep warning")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quan Tian [Tue, 5 Sep 2023 10:36:10 +0000 (10:36 +0000)]
net/ipv6: SKB symmetric hash should incorporate transport ports
__skb_get_hash_symmetric() was added to compute a symmetric hash over
the protocol, addresses and transport ports, by commit
eb70db875671
("packet: Use symmetric hash for PACKET_FANOUT_HASH."). It uses
flow_keys_dissector_symmetric_keys as the flow_dissector to incorporate
IPv4 addresses, IPv6 addresses and ports. However, it should not specify
the flag as FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL, which stops further
dissection when an IPv6 flow label is encountered, making transport
ports not being incorporated in such case.
As a consequence, the symmetric hash is based on 5-tuple for IPv4 but
3-tuple for IPv6 when flow label is present. It caused a few problems,
e.g. when nft symhash and openvswitch l4_sym rely on the symmetric hash
to perform load balancing as different L4 flows between two given IPv6
addresses would always get the same symmetric hash, leading to uneven
traffic distribution.
Removing the use of FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL makes sure the
symmetric hash is based on 5-tuple for both IPv4 and IPv6 consistently.
Fixes: eb70db875671 ("packet: Use symmetric hash for PACKET_FANOUT_HASH.")
Reported-by: Lars Ekman <uablrek@gmail.com>
Closes: https://github.com/antrea-io/antrea/issues/5457
Signed-off-by: Quan Tian <qtian@vmware.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fabio Estevam [Sat, 2 Sep 2023 13:44:07 +0000 (10:44 -0300)]
dt-bindings: rtc: ds3231: Remove text binding
The "maxim,ds3231" compatible is described in the rtc-ds1307.yaml, so
there is no need to keep the text bindings version.
Remove the maxim,ds3231.txt file in favor of the rtc-ds1307.yaml binding.
Signed-off-by: Fabio Estevam <festevam@denx.de>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230902134407.2589099-1-festevam@gmail.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Alexandre Belloni [Sun, 27 Aug 2023 22:16:42 +0000 (00:16 +0200)]
rtc: wm8350: remove unnecessary messages
The RTC core already prints a message when the RTC is registered and when
registering fails, it is not necessary to have more in the driver.
Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com>
Link: https://lore.kernel.org/r/20230827221643.544259-3-alexandre.belloni@bootlin.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>