Linus Torvalds [Sun, 6 Feb 2022 18:04:43 +0000 (10:04 -0800)]
Merge tag 'objtool_urgent_for_v5.17_rc3' of git://git./linux/kernel/git/tip/tip
Pull objtool fix from Borislav Petkov:
"Fix a potential truncated string warning triggered by gcc12"
* tag 'objtool_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix truncated string warning
Linus Torvalds [Sun, 6 Feb 2022 18:00:40 +0000 (10:00 -0800)]
Merge tag 'irq_urgent_for_v5.17_rc3' of git://git./linux/kernel/git/tip/tip
Pull irq fix from Borislav Petkov:
"Remove a bogus warning introduced by the recent PCI MSI irq affinity
overhaul"
* tag 'irq_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
PCI/MSI: Remove bogus warning in pci_irq_get_affinity()
Linus Torvalds [Sun, 6 Feb 2022 17:57:39 +0000 (09:57 -0800)]
Merge tag 'edac_urgent_for_v5.17_rc3' of git://git./linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
"Fix altera and xgene EDAC drivers to propagate the correct error code
from platform_get_irq() so that deferred probing still works"
* tag 'edac_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/xgene: Fix deferred probing
EDAC/altera: Fix deferred probing
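The deferred-probing problem named above can be sketched in userspace C. This is only an illustration: fake_platform_get_irq() is a hypothetical stand-in for the kernel's platform_get_irq(), and the choice of -ENODEV as the substituted error is an assumption, not taken from the drivers.

```c
#include <assert.h>
#include <errno.h>

/* EPROBE_DEFER is a kernel-internal errno (517); defined here for the sketch. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical stand-in for platform_get_irq(): returns an IRQ number (> 0)
 * or a negative errno, chosen by the caller of this sketch. */
static int fake_platform_get_irq(int simulated_result)
{
	assert(simulated_result != 0);	/* 0 is not a valid result here */
	return simulated_result;
}

/* Problem pattern: every failure is replaced by one fixed error code, so
 * the driver core never sees -EPROBE_DEFER and never retries the probe. */
static int probe_fixed_errno(int simulated_result)
{
	int irq = fake_platform_get_irq(simulated_result);

	if (irq < 0)
		return -ENODEV;	/* assumed substitute code */
	return 0;
}

/* Fixed pattern: hand the original error code back unchanged. */
static int probe_propagating(int simulated_result)
{
	int irq = fake_platform_get_irq(simulated_result);

	if (irq < 0)
		return irq;
	return 0;
}
```

With a simulated -EPROBE_DEFER, probe_fixed_errno() returns -ENODEV, while probe_propagating() returns -EPROBE_DEFER, the value a retrying caller would need to see.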
Linus Torvalds [Sat, 5 Feb 2022 18:40:17 +0000 (10:40 -0800)]
Merge tag 'for-linus-5.17a-rc3-tag' of git://git./linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- documentation fixes related to Xen
- enable x2apic mode when available when running as hardware
virtualized guest under Xen
- cleanup and fix a corner case of vcpu enumeration when running a
paravirtualized Xen guest
* tag 'for-linus-5.17a-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/Xen: streamline (and fix) PV CPU enumeration
xen: update missing ioctl magic numers documentation
Improve docs for IOCTL_GNTDEV_MAP_GRANT_REF
xen: xenbus_dev.h: delete incorrect file name
xen/x2apic: enable x2apic mode when supported for HVM
Linus Torvalds [Sat, 5 Feb 2022 17:55:59 +0000 (09:55 -0800)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- A couple of fixes when handling an exception while a SError has
been delivered
- Workaround for Cortex-A510's single-step erratum
RISC-V:
- Make CY, TM, and IR counters accessible in VU mode
- Fix SBI implementation version
x86:
- Report deprecation of x87 features in supported CPUID
- Preparation for fixing an interrupt delivery race on AMD hardware
- Sparse fix
All except POWER and s390:
- Rework guest entry code to correctly mark noinstr areas and fix
vtime accounting (for x86, this was already mostly correct but not
entirely; for ARM, MIPS and RISC-V it wasn't)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Use ERR_PTR_USR() to return -EFAULT as a __user pointer
KVM: x86: Report deprecated x87 features in supported CPUID
KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata
KVM: arm64: Stop handle_exit() from handling HVC twice when an SError occurs
KVM: arm64: Avoid consuming a stale esr value when SError occur
RISC-V: KVM: Fix SBI implementation version
RISC-V: KVM: make CY, TM, and IR counters accessible in VU mode
kvm/riscv: rework guest entry logic
kvm/arm64: rework guest entry logic
kvm/x86: rework guest entry logic
kvm/mips: rework guest entry logic
kvm: add guest_state_{enter,exit}_irqoff()
KVM: x86: Move delivery of non-APICv interrupt into vendor code
kvm: Move KVM_GET_XSAVE2 IOCTL definition at the end of kvm.h
Linus Torvalds [Sat, 5 Feb 2022 17:21:55 +0000 (09:21 -0800)]
Merge tag 'xfs-5.17-fixes-1' of git://git./fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"I was auditing operations in XFS that clear file privileges, and
realized that XFS' fallocate implementation drops suid/sgid but
doesn't clear file capabilities the same way that file writes and
reflink do.
There are VFS helpers that do it correctly, so refactor XFS to use
them. I also noticed that we weren't flushing the log at the correct
point in the fallocate operation, so that's fixed too.
Summary:
- Fix fallocate so that it drops all file privileges when files are
modified instead of open-coding that incompletely.
- Fix fallocate to flush the log if the caller wanted synchronous
file updates"
* tag 'xfs-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: ensure log flush at the end of a synchronous fallocate call
xfs: move xfs_update_prealloc_flags() to xfs_pnfs.c
xfs: set prealloc flag in xfs_alloc_file_space()
xfs: fallocate() should call file_modified()
xfs: remove XFS_PREALLOC_SYNC
xfs: reject crazy array sizes being fed to XFS_IOC_GETBMAP*
Linus Torvalds [Sat, 5 Feb 2022 17:13:51 +0000 (09:13 -0800)]
Merge tag 'vfs-5.17-fixes-2' of git://git./fs/xfs/xfs-linux
Pull vfs fixes from Darrick Wong:
"I was auditing the sync_fs code paths recently and noticed that most
callers of ->sync_fs ignore its return value (and many implementations
never return nonzero even if the fs is broken!), which means that
internal fs errors and corruption are not passed up to userspace
callers of syncfs(2) or FIFREEZE. Hence fixing the common code and
XFS, and I'll start working on the ext4/btrfs folks if this is merged.
Summary:
- Fix a bug where callers of ->sync_fs (e.g. sync_filesystem and
syncfs(2)) ignore the return value.
- Fix a bug where callers of sync_filesystem (e.g. fs freeze) ignore
the return value.
- Fix a bug in XFS where xfs_fs_sync_fs never passed back error
returns"
* tag 'vfs-5.17-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: return errors in xfs_fs_sync_fs
quota: make dquot_quota_sync return errors from ->sync_fs
vfs: make sync_filesystem return errors from ->sync_fs
vfs: make freeze_super abort when sync_filesystem returns error
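A rough userspace sketch of the error-propagation change summarized above. The function names mirror the kernel ones only loosely: sync_fs_fn, sync_fs_ok() and sync_fs_broken() are hypothetical stand-ins, and freeze_new() is a simplified model of a freeze-style caller, not the kernel's freeze_super().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a filesystem's ->sync_fs method. */
typedef int (*sync_fs_fn)(void);

static int sync_fs_ok(void)     { return 0; }
static int sync_fs_broken(void) { return -EIO; }

/* Old behaviour: the callback's return value is ignored. */
static int sync_filesystem_old(sync_fs_fn sync_fs)
{
	sync_fs();
	return 0;
}

/* New behaviour: an error from the callback reaches the caller. */
static int sync_filesystem_new(sync_fs_fn sync_fs)
{
	int ret = sync_fs();

	if (ret)
		return ret;
	return 0;
}

/* Freeze-style caller that aborts rather than freezing a broken fs. */
static int freeze_new(sync_fs_fn sync_fs, int *frozen)
{
	int ret;

	assert(frozen != NULL);
	ret = sync_filesystem_new(sync_fs);
	if (ret) {
		*frozen = 0;
		return ret;
	}
	*frozen = 1;
	return 0;
}
```

In this model a failing sync callback is invisible through sync_filesystem_old(), whereas the new variants report -EIO and leave the filesystem unfrozen.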
Linus Torvalds [Sat, 5 Feb 2022 17:04:43 +0000 (09:04 -0800)]
Merge tag 'iomap-5.17-fixes-1' of git://git./fs/xfs/xfs-linux
Pull iomap fix from Darrick Wong:
"A single bugfix for iomap.
The fix should eliminate occasional complaints about stall warnings
when a lot of writeback IO completes all at once and we have to then
go clearing status on a large number of folios.
Summary:
- Limit the length of ioend chains in writeback so that we don't trip
the softlockup watchdog and to limit long tail latency on clearing
PageWriteback"
* tag 'iomap-5.17-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs, iomap: limit individual ioend chain lengths in writeback
Paolo Bonzini [Sat, 5 Feb 2022 05:58:25 +0000 (00:58 -0500)]
Merge tag 'kvmarm-fixes-5.17-2' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.17, take #2
- A couple of fixes when handling an exception while a SError has been
delivered
- Workaround for Cortex-A510's single-step erratum
Linus Torvalds [Sat, 5 Feb 2022 00:28:11 +0000 (16:28 -0800)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Some medium sized bugs in the various drivers. A couple are more
recent regressions:
- Fix two panics in hfi1 and two allocation problems
- Send the IGMP to the correct address in cma
- Squash a syzkaller bug related to races reading the multicast list
- Memory leak in siw and cm
- Fix a corner case spec compliance for HFI/QIB
- Correct the implementation of fences in siw
- Error unwind bug in mlx4"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/mlx4: Don't continue event handler after memory allocation failure
RDMA/siw: Fix broken RDMA Read Fence/Resume logic.
IB/rdmavt: Validate remote_addr during loopback atomic tests
IB/cm: Release previously acquired reference counter in the cm_id_priv
RDMA/siw: Fix refcounting leak in siw_create_qp()
RDMA/ucma: Protect mc during concurrent multicast leaves
RDMA/cma: Use correct address when leaving multicast group
IB/hfi1: Fix tstats alloc and dealloc
IB/hfi1: Fix AIP early init panic
IB/hfi1: Fix alloc failure with larger txqueuelen
IB/hfi1: Fix panic with larger ipoib send_queue_size
Linus Torvalds [Fri, 4 Feb 2022 23:27:45 +0000 (15:27 -0800)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Seven fixes, six of which are fairly obvious driver fixes.
The one core change to the device budget depth is to try to ensure
that if the default depth is large (which can produce quite a sizeable
bitmap allocation per device), we give back the memory we don't need
if there's a queue size reduction in slave_configure (which happens to
a lot of devices)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: hisi_sas: Fix setting of hisi_sas_slot.is_internal
scsi: pm8001: Fix use-after-free for aborted SSP/STP sas_task
scsi: pm8001: Fix use-after-free for aborted TMF sas_task
scsi: pm8001: Fix warning for undescribed param in process_one_iomb()
scsi: core: Reallocate device's budget map on queue depth change
scsi: bnx2fc: Make bnx2fc_recv_frame() mp safe
scsi: pm80xx: Fix double completion for SATA devices
Linus Torvalds [Fri, 4 Feb 2022 23:22:35 +0000 (15:22 -0800)]
Merge tag 'pci-v5.17-fixes-3' of git://git./linux/kernel/git/helgaas/pci
Pull pci fixes from Bjorn Helgaas:
- Restructure j721e_pcie_probe() so we don't dereference a NULL pointer
(Bjorn Helgaas)
- Add a kirin_pcie_data struct to identify different Kirin variants to
fix probe failure for controllers with an internal PHY (Bjorn
Helgaas)
* tag 'pci-v5.17-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: kirin: Add dev struct for of_device_get_match_data()
PCI: j721e: Initialize pcie->cdns_pcie before using it
Bjorn Helgaas [Wed, 2 Feb 2022 15:52:41 +0000 (09:52 -0600)]
PCI: kirin: Add dev struct for of_device_get_match_data()
Bean reported that a622435fbe1a ("PCI: kirin: Prefer
of_device_get_match_data()") broke kirin_pcie_probe() because it assumed
match data of 0 was a failure when in fact, it meant the match data was
"(void *)PCIE_KIRIN_INTERNAL_PHY".
Therefore, probing of "hisilicon,kirin960-pcie" devices failed with -EINVAL
and an "OF data missing" message.
Add a struct kirin_pcie_data to encode the PHY type. Then the result of
of_device_get_match_data() should always be a non-NULL pointer to a struct
kirin_pcie_data that contains the PHY type.
Fixes: a622435fbe1a ("PCI: kirin: Prefer of_device_get_match_data()")
Link: https://lore.kernel.org/r/20220202162659.GA12603@bhelgaas
Link: https://lore.kernel.org/r/20220201215941.1203155-1-huobean@gmail.com
Reported-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
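The failure mode described in this commit can be reproduced in a small userspace sketch. PCIE_KIRIN_INTERNAL_PHY and struct kirin_pcie_data come from the commit message; PCIE_KIRIN_EXTERNAL_PHY, the phy_type member, and the helper functions are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum pcie_kirin_phy_type {
	PCIE_KIRIN_INTERNAL_PHY,	/* enumerator value 0 */
	PCIE_KIRIN_EXTERNAL_PHY,	/* assumed second variant, value 1 */
};

/* Old scheme: the enum value itself is carried in the data pointer. */
static const void *old_match_data(enum pcie_kirin_phy_type type)
{
	return (const void *)(uintptr_t)type;
}

/* Old probe logic: a NULL result is treated as "OF data missing". */
static int probe_old(const void *match_data)
{
	if (!match_data)
		return -EINVAL;	/* also taken for PCIE_KIRIN_INTERNAL_PHY == 0 */
	return 0;
}

/* New scheme: match data points at a real object, so it is never NULL. */
struct kirin_pcie_data {
	enum pcie_kirin_phy_type phy_type;
};

static const struct kirin_pcie_data kirin_internal_phy = {
	.phy_type = PCIE_KIRIN_INTERNAL_PHY,
};

static int probe_new(const void *match_data, enum pcie_kirin_phy_type *type)
{
	const struct kirin_pcie_data *data = match_data;

	assert(type != NULL);
	if (!data)
		return -EINVAL;	/* now only true when the entry is really missing */
	*type = data->phy_type;
	return 0;
}
```

Here probe_old() rejects the internal-PHY variant because its match data is indistinguishable from NULL, while probe_new() accepts it and still reports the PHY type.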
Linus Torvalds [Fri, 4 Feb 2022 20:14:58 +0000 (12:14 -0800)]
Merge tag 'for-5.17-rc2-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few fixes and error handling improvements:
- fix deadlock between quota disable and qgroup rescan worker
- fix use-after-free after failure to create a snapshot
- skip warning on unmount after log cleanup failure
- don't start transaction for scrub if the fs is mounted read-only
- tree checker verifies item sizes"
* tag 'for-5.17-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: skip reserved bytes warning on unmount after log cleanup failure
btrfs: fix use of uninitialized variable at rm device ioctl
btrfs: fix use-after-free after failure to create a snapshot
btrfs: tree-checker: check item_size for dev_item
btrfs: tree-checker: check item_size for inode_item
btrfs: fix deadlock between quota disable and qgroup rescan worker
btrfs: don't start transaction for scrub if the fs is mounted read-only
Linus Torvalds [Fri, 4 Feb 2022 20:08:49 +0000 (12:08 -0800)]
Merge tag 'erofs-for-5.17-rc3-fixes' of git://git./linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"Two fixes related to fsdax cleanup in this cycle and ztailpacking to
fix small compressed data inlining. There is also a trivial cleanup to
rearrange code for better reading.
Summary:
- fix fsdax partition offset misbehavior
- clean up z_erofs_decompressqueue_work() declaration
- fix up EOF lcluster inlining, especially for small compressed data"
* tag 'erofs-for-5.17-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix small compressed files inlining
erofs: avoid unnecessary z_erofs_decompressqueue_work() declaration
erofs: fix fsdax partition offset handling
Linus Torvalds [Fri, 4 Feb 2022 20:01:57 +0000 (12:01 -0800)]
Merge tag 'block-5.17-2022-02-04' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request
- fix use-after-free in rdma and tcp controller reset (Sagi Grimberg)
- fix the state check in nvmf_ctlr_matches_baseopts (Uday Shankar)
- MD nowait null pointer fix (Song)
- blk-integrity seed advance fix (Martin)
- Fix a dio regression in this merge window (Ilya)
* tag 'block-5.17-2022-02-04' of git://git.kernel.dk/linux-block:
block: bio-integrity: Advance seed correctly for larger interval sizes
nvme-fabrics: fix state check in nvmf_ctlr_matches_baseopts()
md: fix NULL pointer deref with nowait but no mddev->queue
block: fix DIO handling regressions in blkdev_read_iter()
nvme-rdma: fix possible use-after-free in transport error_recovery work
nvme-tcp: fix possible use-after-free in transport error_recovery work
nvme: fix a possible use-after-free in controller reset during load
Linus Torvalds [Fri, 4 Feb 2022 19:52:37 +0000 (11:52 -0800)]
Merge tag 'ata-5.17-rc3' of git://git./linux/kernel/git/dlemoal/libata
Pull ATA fixes from Damien Le Moal:
- Sergey volunteered to be a reviewer for the Renesas R-Car SATA driver
and PATA drivers. Update the MAINTAINERS file accordingly.
- Regression fix: add a horkage flag to prevent accessing the log
directory log page with SATADOM-ML 3ME SATA devices as they react
badly to reading that log page (from Anton).
* tag 'ata-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: libata-core: Introduce ATA_HORKAGE_NO_LOG_DIR horkage
MAINTAINERS: add myself as Renesas R-Car SATA driver reviewer
MAINTAINERS: add myself as PATA drivers reviewer
Linus Torvalds [Fri, 4 Feb 2022 19:45:16 +0000 (11:45 -0800)]
Merge tag 'iommu-fixes-v5.17-rc2' of git://git./linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Warning fixes and a fix for a potential use-after-free in IOMMU core
code
- Another potential memory leak fix for the Intel VT-d driver
- Fix for an IO polling loop timeout issue in the AMD IOMMU driver
* tag 'iommu-fixes-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix loop timeout issue in iommu_ga_log_enable()
iommu/vt-d: Fix potential memory leak in intel_setup_irq_remapping()
iommu: Fix some W=1 warnings
iommu: Fix potential use-after-free during probe
Linus Torvalds [Fri, 4 Feb 2022 19:38:01 +0000 (11:38 -0800)]
Merge tag 'random-5.17-rc3-for-linus' of git://git./linux/kernel/git/crng/random
Pull random number generator fixes from Jason Donenfeld:
"For this week, we have:
- A fix to make more frequent use of hwgenerator randomness, from
Dominik.
- More cleanups to the boot initialization sequence, from Dominik.
- A fix for an old shortcoming with the ZAP ioctl, from me.
- A workaround for a still unfixed Clang CFI/FullLTO compiler bug,
from me. On one hand, it's a bummer to commit workarounds for
experimental compiler features that have bugs. But on the other, I
think this actually improves the code somewhat, independent of the
bug. So a win-win"
* tag 'random-5.17-rc3-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: only call crng_finalize_init() for primary_crng
random: access primary_pool directly rather than through pointer
random: wake up /dev/random writers after zap
random: continually use hwgenerator randomness
lib/crypto: blake2s: avoid indirect calls to compression function for Clang CFI
Linus Torvalds [Fri, 4 Feb 2022 19:32:46 +0000 (11:32 -0800)]
Merge tag 'acpi-5.17-rc3' of git://git./linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix compilation in the case when ACPI is selected and CRC32, depended
on by ACPI after recent changes, is not (Randy Dunlap)"
* tag 'acpi-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: require CRC32 to build
Linus Torvalds [Fri, 4 Feb 2022 19:24:28 +0000 (11:24 -0800)]
Merge tag 'sound-5.17-rc3' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes.
The major changes are ASoC core fixes, addressing the DPCM locking
issue after the recent code changes and the potentially invalid
register accesses via control API. Also, HD-audio got a core fix for
Oops at dynamic unbinding.
The rest are device-specific small fixes, including the usual stuff
like HD-audio and USB-audio quirks"
* tag 'sound-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (31 commits)
ALSA: hda: Skip codec shutdown in case the codec is not registered
ALSA: usb-audio: Correct quirk for VF0770
ALSA: Replace acpi_bus_get_device()
Input: wm97xx: Simplify resource management
ALSA: hda/realtek: Add quirk for ASUS GU603
ALSA: hda/realtek: Fix silent output on Gigabyte X570 Aorus Xtreme after reboot from Windows
ALSA: hda/realtek: Fix silent output on Gigabyte X570S Aorus Master (newer chipset)
ALSA: hda/realtek: Add missing fixup-model entry for Gigabyte X570 ALC1220 quirks
ALSA: hda: realtek: Fix race at concurrent COEF updates
ASoC: ops: Check for negative values before reading them
ASoC: rt5682: Fix deadlock on resume
ASoC: hdmi-codec: Fix OOB memory accesses
ASoC: soc-pcm: Move debugfs removal out of spinlock
ASoC: soc-pcm: Fix DPCM lockdep warning due to nested stream locks
ASoC: fsl: Add missing error handling in pcm030_fabric_probe
ALSA: hda: Fix signedness of sscanf() arguments
ALSA: usb-audio: initialize variables that could ignore errors
ALSA: hda: Fix UAF of leds class devs at unbinding
ASoC: qdsp6: q6apm-dai: only stop graphs that are started
ASoC: codecs: wcd938x: fix return value of mixer put function
...
Linus Torvalds [Fri, 4 Feb 2022 19:13:54 +0000 (11:13 -0800)]
Merge tag 'drm-fixes-2022-02-04' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Regular fixes for the week. Daniel has agreed to bring back the fbcon
hw acceleration under a CONFIG option for the non-drm fbdev users. We
don't advise turning this on unless you are in the niche that is old
fbdev drivers. Since it's essentially a revert and shouldn't be high
impact, now seemed like a good time to do it.
Otherwise, i915 and amdgpu fixes are most of it, along with some minor
fixes elsewhere.
fbdev:
- readd fbcon acceleration
i915:
- fix DP monitor via type-c dock
- fix for engine busyness and read timeout with GuC
- use ALLOW_FAIL for error capture buffer allocs
- don't use interruptible lock on error paths
- smatch fix to reject zero sized overlays.
amdgpu:
- mGPU fan boost fix for beige goby
- S0ix fixes
- Cyan skillfish hang fix
- DCN fixes for DCN 3.1
- DCN fixes for DCN 3.01
- Apple retina panel fix
- ttm logic inversion fix
dma-buf:
- heaps: fix potential spectre v1 gadget
kmb:
- fix potential oob access
mxsfb:
- fix NULL ptr deref
nouveau:
- fix potential oob access during BIOS decode"
* tag 'drm-fixes-2022-02-04' of git://anongit.freedesktop.org/drm/drm: (24 commits)
drm: mxsfb: Fix NULL pointer dereference
drm/amdgpu: fix logic inversion in check
drm/amd: avoid suspend on dGPUs w/ s2idle support when runtime PM enabled
drm/amd/display: Force link_rate as LINK_RATE_RBR2 for 2018 15" Apple Retina panels
drm/amd/display: revert "Reset fifo after enable otg"
drm/amd/display: watermark latencies is not enough on DCN31
drm/amd/display: Update watermark values for DCN301
drm/amdgpu: fix a potential GPU hang on cyan skillfish
drm/amd: Only run s3 or s0ix if system is configured properly
drm/amd: add support to check whether the system is set to s3
fbcon: Add option to enable legacy hardware acceleration
Revert "fbcon: Disable accelerated scrolling"
Revert "fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)"
drm/i915/pmu: Fix KMD and GuC race on accessing busyness
dma-buf: heaps: Fix potential spectre v1 gadget
drm/amd: Warn users about potential s0ix problems
drm/amd/pm: correct the MGpuFanBoost support for Beige Goby
drm/nouveau: fix off by one in BIOS boundary checking
drm/i915/adlp: Fix TypeC PHY-ready status readout
drm/i915/pmu: Use PM timestamp instead of RING TIMESTAMP for reference
...
Linus Torvalds [Fri, 4 Feb 2022 18:34:19 +0000 (10:34 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"10 patches.
Subsystems affected by this patch series: ipc, MAINTAINERS, and mm
(vmscan, debug, pagemap, kmemleak, and selftests)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kselftest/vm: revert "tools/testing/selftests/vm/userfaultfd.c: use swap() to make code cleaner"
MAINTAINERS: update rppt's email
mm/kmemleak: avoid scanning potential huge holes
ipc/sem: do not sleep with a spin lock held
mm/pgtable: define pte_index so that preprocessor could recognize it
mm/page_table_check: check entries at pmd levels
mm/khugepaged: unify collapse pmd clear, flush and free
mm/page_table_check: use unsigned long for page counters and cleanup
mm/debug_vm_pgtable: remove pte entry from the page table
Revert "mm/page_isolation: unset migratetype directly for non Buddy page"
Dominik Brodowski [Sun, 30 Jan 2022 21:03:20 +0000 (22:03 +0100)]
random: only call crng_finalize_init() for primary_crng
crng_finalize_init() returns instantly if it is called for another pool
than primary_crng. The test whether crng_finalize_init() is still required
can be moved to the relevant caller in crng_reseed(), and
crng_need_final_init can be reset to false if crng_finalize_init() is
called with workqueues ready. Then, no previous callsite will call
crng_finalize_init() unless it is needed, and we can get rid of the
superfluous function parameter.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Dominik Brodowski [Sun, 30 Jan 2022 21:03:19 +0000 (22:03 +0100)]
random: access primary_pool directly rather than through pointer
Both crng_initialize_primary() and crng_init_try_arch_early() are
only called for the primary_pool. Accessing it directly instead of
through a function parameter simplifies the code.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Jason A. Donenfeld [Fri, 28 Jan 2022 22:44:03 +0000 (23:44 +0100)]
random: wake up /dev/random writers after zap
When account() is called, and the amount of entropy dips below
random_write_wakeup_bits, we wake up the random writers, so that they
can write some more in. However, the RNDZAPENTCNT/RNDCLEARPOOL ioctl
sets the entropy count to zero -- a potential reduction just like
account() -- but does not unblock writers. This commit adds the missing
logic to that ioctl to unblock waiting writers.
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
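A toy model of the missing wake-up, assuming a simple counter in place of a real wait queue. struct fake_pool, the helper names, and the threshold value 896 are all illustrative and not taken from the driver.

```c
#include <assert.h>

/* Illustrative threshold standing in for random_write_wakeup_bits. */
enum { WRITE_WAKEUP_BITS = 896 };

struct fake_pool {
	int entropy_bits;
	int writer_wakeups;	/* counts simulated wake-ups of blocked writers */
};

static void wake_writers_if_low(struct fake_pool *pool)
{
	if (pool->entropy_bits < WRITE_WAKEUP_BITS)
		pool->writer_wakeups++;
}

/* Model of account(): consume entropy, then wake writers if the pool is low. */
static void fake_account(struct fake_pool *pool, int bits)
{
	assert(bits >= 0);
	pool->entropy_bits = bits > pool->entropy_bits ?
			     0 : pool->entropy_bits - bits;
	wake_writers_if_low(pool);
}

/* Zap before the fix: the count drops to zero but nobody is woken. */
static void zap_old(struct fake_pool *pool)
{
	pool->entropy_bits = 0;
}

/* Zap after the fix: same reduction, followed by the same wake-up check. */
static void zap_new(struct fake_pool *pool)
{
	pool->entropy_bits = 0;
	wake_writers_if_low(pool);
}
```

In this model zap_old() leaves writer_wakeups untouched even though the pool is empty, which is the gap the commit closes.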
Dominik Brodowski [Tue, 25 Jan 2022 20:14:57 +0000 (21:14 +0100)]
random: continually use hwgenerator randomness
The rngd kernel thread may sleep indefinitely if the entropy count is
kept above random_write_wakeup_bits by other entropy sources. To make
best use of multiple sources of randomness, mix entropy from hardware
RNGs into the pool at least once within CRNG_RESEED_INTERVAL.
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
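The wake-up condition described here might be modelled as a pure predicate. This is a sketch under assumptions: hwrng_should_mix() is a hypothetical helper, time is in arbitrary ticks, and the real thread sleeps on a wait queue with a timeout rather than polling a function like this.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when the hardware RNG thread should stop sleeping and mix
 * in new entropy: either the pool is at or below the wake-up threshold,
 * or a full reseed interval has passed since the last mix. */
static bool hwrng_should_mix(int entropy_bits, int wakeup_bits,
			     long now, long last_mix, long reseed_interval)
{
	assert(now >= last_mix);
	if (entropy_bits <= wakeup_bits)
		return true;
	return now - last_mix >= reseed_interval;
}
```

Without the second condition, a pool kept full by other sources would never let the thread run, which is the indefinite sleep the commit message mentions.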
Jason A. Donenfeld [Wed, 19 Jan 2022 13:35:06 +0000 (14:35 +0100)]
lib/crypto: blake2s: avoid indirect calls to compression function for Clang CFI
blake2s_compress_generic is weakly aliased by blake2s_compress. The
current harness for function selection uses a function pointer, which is
ordinarily inlined and resolved at compile time. But when Clang's CFI is
enabled, CFI still triggers when making an indirect call via a weak
symbol. This seems like a bug in Clang's CFI, as though it's bucketing
weak symbols and strong symbols differently. It also only seems to
trigger when "full LTO" mode is used, rather than "thin LTO".
[ 0.000000][ T0] Kernel panic - not syncing: CFI failure (target: blake2s_compress_generic+0x0/0x1444)
[ 0.000000][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-mainline-06981-g076c855b846e #1
[ 0.000000][ T0] Hardware name: MT6873 (DT)
[ 0.000000][ T0] Call trace:
[ 0.000000][ T0] dump_backtrace+0xfc/0x1dc
[ 0.000000][ T0] dump_stack_lvl+0xa8/0x11c
[ 0.000000][ T0] panic+0x194/0x464
[ 0.000000][ T0] __cfi_check_fail+0x54/0x58
[ 0.000000][ T0] __cfi_slowpath_diag+0x354/0x4b0
[ 0.000000][ T0] blake2s_update+0x14c/0x178
[ 0.000000][ T0] _extract_entropy+0xf4/0x29c
[ 0.000000][ T0] crng_initialize_primary+0x24/0x94
[ 0.000000][ T0] rand_initialize+0x2c/0x6c
[ 0.000000][ T0] start_kernel+0x2f8/0x65c
[ 0.000000][ T0] __primary_switched+0xc4/0x7be4
[ 0.000000][ T0] Rebooting in 5 seconds..
Nonetheless, the function pointer method isn't so terrific anyway, so
this patch replaces it with a simple boolean, which also gets inlined
away. This successfully works around the Clang bug.
In general, I'm not too keen on all of the indirection involved here; it
clearly does more harm than good. Hopefully the whole thing can get
cleaned up down the road when lib/crypto is overhauled more
comprehensively. But for now, we go with a simple bandaid.
Fixes: 6048fdcc5f26 ("lib/crypto: blake2s: include as built-in")
Link: https://github.com/ClangBuiltLinux/linux/issues/1567
Reported-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
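The difference between the two selection harnesses can be sketched as follows. The function bodies are placeholders that only count calls, and the parameter name force_generic is an assumption; nothing here reproduces the actual BLAKE2s code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int generic_calls, arch_calls;

/* Placeholder compression routines: they only record that they ran. */
static void compress_generic(void) { generic_calls++; }
static void compress_arch(void)    { arch_calls++; }

typedef void (*compress_fn)(void);

/* Function-pointer harness: the call site is an indirect call, which is
 * what a CFI scheme instruments and checks. */
static inline void update_via_pointer(compress_fn compress)
{
	assert(compress != NULL);
	compress();
}

/* Boolean harness: both branches are direct calls, so there is no
 * indirect call left to check once the function is inlined. */
static inline void update_via_bool(bool force_generic)
{
	if (force_generic)
		compress_generic();
	else
		compress_arch();
}
```

Both harnesses select the same routine; the boolean form simply never takes the address of either function.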
Linus Torvalds [Fri, 4 Feb 2022 17:54:02 +0000 (09:54 -0800)]
Merge tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A patch to make it possible to disable zero copy path in the messenger
to avoid checksum or authentication tag mismatches and ensuing session
resets in case the destination buffer isn't guaranteed to be stable"
* tag 'ceph-for-5.17-rc3' of git://github.com/ceph/ceph-client:
libceph: optionally use bounce buffer on recv path in crc mode
libceph: make recv path in secure mode work the same as send path
Linus Torvalds [Fri, 4 Feb 2022 17:44:42 +0000 (09:44 -0800)]
Merge tag '9p-for-5.17-rc3' of git://github.com/martinetd/linux
Pull 9p fix from Dominique Martinet:
"Fix 'cannot walk open fid' rule
The 9p 'walk' operation requires fid arguments to not originate from
an open or create call, and we've missed that for a while as the
servers we regularly run tests with don't enforce the check and no
active reviewer knew about the rule.
Both reporters confirmed reverting this patch fixes things for them
and looking at it further wasn't actually required... Will take more
time for follow up and enforcing the rule more thoroughly later"
* tag '9p-for-5.17-rc3' of git://github.com/martinetd/linux:
Revert "fs/9p: search open fids first"
Linus Torvalds [Fri, 4 Feb 2022 17:34:37 +0000 (09:34 -0800)]
Merge tag '5.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"SMB3 client fixes including:
- multiple fscache related fixes, reenabling ability to read/write to
cached files for cifs.ko (that was temporarily disabled for cifs.ko
a few weeks ago due to the recent fscache changes)
- also includes a new fscache helper function ("query_occupancy")
used by above
- fix for multiuser mounts and NTLMSSP auth (workstation name) for
stable
- fix locking ordering problem in multichannel code
- trivial malformed comment fix"
* tag '5.17-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix workstation_name for multiuser mounts
Invalidate fscache cookie only when inode attributes are changed.
cifs: Fix the readahead conversion to manage the batch when reading from cache
cifs: Implement cache I/O by accessing the cache directly
netfs, cachefiles: Add a method to query presence of data in the cache
cifs: Transition from ->readpages() to ->readahead()
cifs: unlock chan_lock before calling cifs_put_tcp_session
Fix a warning about a malformed kernel doc comment in cifs
Shuah Khan [Fri, 4 Feb 2022 04:49:45 +0000 (20:49 -0800)]
kselftest/vm: revert "tools/testing/selftests/vm/userfaultfd.c: use swap() to make code cleaner"
With this change, userfaultfd fails to build with an undefined reference
to swap():
  userfaultfd.c: In function `userfaultfd_stress':
  userfaultfd.c:1530:17: warning: implicit declaration of function `swap'; did you mean `swab'? [-Wimplicit-function-declaration]
   1530 |                 swap(area_src, area_dst);
        |                 ^~~~
        |                 swab
  /usr/bin/ld: /tmp/ccDGOAdV.o: in function `userfaultfd_stress':
  userfaultfd.c:(.text+0x549e): undefined reference to `swap'
  /usr/bin/ld: userfaultfd.c:(.text+0x54bc): undefined reference to `swap'
  collect2: error: ld returned 1 exit status
Revert the commit to fix the problem.
Link: https://lkml.kernel.org/r/20220202003340.87195-1-skhan@linuxfoundation.org
Fixes: 2c769ed7137a ("tools/testing/selftests/vm/userfaultfd.c: use swap() to make code cleaner")
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
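The build error arises because swap() is not declared anywhere the userspace test can see it. A minimal open-coded equivalent, assuming the two areas are plain char pointers (swap_areas() is a hypothetical helper, not the test's code), looks like this:

```c
#include <assert.h>
#include <stddef.h>

/* Exchange two buffer pointers using an explicit temporary, which needs
 * no helper macro from any header. */
static void swap_areas(char **area_src, char **area_dst)
{
	char *tmp;

	assert(area_src != NULL && area_dst != NULL);
	tmp = *area_src;
	*area_src = *area_dst;
	*area_dst = tmp;
}
```

Reverting to a temporary-variable swap of this kind avoids depending on a macro that the selftest build does not provide.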
Mike Rapoport [Fri, 4 Feb 2022 04:49:41 +0000 (20:49 -0800)]
MAINTAINERS: update rppt's email
Use my @kernel.org address
Link: https://lkml.kernel.org/r/20220203090324.3701774-1-rppt@kernel.org
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lang Yu [Fri, 4 Feb 2022 04:49:37 +0000 (20:49 -0800)]
mm/kmemleak: avoid scanning potential huge holes
When using devm_request_free_mem_region() and devm_memremap_pages() to
add ZONE_DEVICE memory, if the requested free mem region's end pfn is
huge (e.g., 0x400000000), node_end_pfn() will also be huge (see
move_pfn_range_to_zone()). This creates a huge hole between
node_start_pfn() and node_end_pfn().
We found that on some AMD APUs, amdkfd requested such a free mem region
and created a huge hole. In that case, the following code snippet just
busy-loops on test_bit() across the huge hole:
	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		struct page *page = pfn_to_online_page(pfn);
		if (!page)
			continue;
		...
	}
So we got a soft lockup:
watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [bash:1221]
CPU: 6 PID: 1221 Comm: bash Not tainted 5.15.0-custom #1
RIP: 0010:pfn_to_online_page+0x5/0xd0
Call Trace:
? kmemleak_scan+0x16a/0x440
kmemleak_write+0x306/0x3a0
? common_file_perm+0x72/0x170
full_proxy_write+0x5c/0x90
vfs_write+0xb9/0x260
ksys_write+0x67/0xe0
__x64_sys_write+0x1a/0x20
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
I did some tests with the patch.
(1) amdgpu module unloaded
    before the patch:
        real 0m0.976s
        user 0m0.000s
        sys  0m0.968s
    after the patch:
        real 0m0.981s
        user 0m0.000s
        sys  0m0.973s
(2) amdgpu module loaded
    before the patch:
        real 0m35.365s
        user 0m0.000s
        sys  0m35.354s
    after the patch:
        real 0m1.049s
        user 0m0.000s
        sys  0m1.042s
Link: https://lkml.kernel.org/r/20211108140029.721144-1-lang.yu@amd.com
Signed-off-by: Lang Yu <lang.yu@amd.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
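The cost of the hole can be illustrated with a userspace model that counts loop iterations. This is not the patch itself: struct pfn_range, pfn_is_online() and both walkers are hypothetical, and the model simply assumes the populated ranges are known up front.

```c
#include <assert.h>
#include <stddef.h>

struct pfn_range {
	unsigned long start, end;	/* half-open interval [start, end) */
};

/* Hypothetical stand-in for pfn_to_online_page(): is this pfn populated? */
static int pfn_is_online(unsigned long pfn, const struct pfn_range *r, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (pfn >= r[i].start && pfn < r[i].end)
			return 1;
	return 0;
}

/* Walk every pfn in [start, end), holes included. Returns the number of
 * loop iterations and stores the number of populated pfns in *online. */
static unsigned long walk_span(unsigned long start, unsigned long end,
			       const struct pfn_range *r, size_t n,
			       unsigned long *online)
{
	unsigned long iters = 0;

	assert(online != NULL);
	*online = 0;
	for (unsigned long pfn = start; pfn < end; pfn++) {
		iters++;
		if (!pfn_is_online(pfn, r, n))
			continue;
		(*online)++;
	}
	return iters;
}

/* Walk only the populated ranges, so holes cost nothing. */
static unsigned long walk_ranges(const struct pfn_range *r, size_t n,
				 unsigned long *online)
{
	unsigned long iters = 0;

	assert(online != NULL);
	*online = 0;
	for (size_t i = 0; i < n; i++)
		for (unsigned long pfn = r[i].start; pfn < r[i].end; pfn++) {
			iters++;
			(*online)++;
		}
	return iters;
}
```

Both walkers visit the same populated pfns, but the span walker's iteration count grows with the size of the hole while the range walker's does not.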
Minghao Chi [Fri, 4 Feb 2022 04:49:33 +0000 (20:49 -0800)]
ipc/sem: do not sleep with a spin lock held
We can't call kvfree() with a spin lock held, so defer it.
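A minimal user-space sketch of the deferral pattern, assuming a toy lock and a free routine that refuses to run while the lock is held. All names here (struct toy_lock, demo_sleeping_free(), demo_drop_undo()) are hypothetical stand-ins and do not reproduce the ipc/sem code.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy lock that only records whether it is held; not a real spinlock. */
struct toy_lock {
    int held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
    assert(!l->held);
    l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = 0;
}

/* Hypothetical container protected by the lock. */
struct demo_undo_list {
    struct toy_lock lock;
    int *undo;
};

/*
 * Stand-in for a free routine that may sleep. Returns -1 instead of
 * freeing if it is called while the lock is held.
 */
static int demo_sleeping_free(struct toy_lock *l, void *p)
{
    if (l->held)
        return -1;
    free(p);
    return 0;
}

/*
 * Detach the object while holding the lock, then free it only after
 * the lock has been dropped. Returns 0 on success.
 */
int demo_drop_undo(struct demo_undo_list *ul)
{
    int *victim;

    toy_lock_acquire(&ul->lock);
    victim = ul->undo;      /* remember what has to be freed */
    ul->undo = NULL;        /* unlink it under the lock */
    toy_lock_release(&ul->lock);

    return demo_sleeping_free(&ul->lock, victim);
}
```

The key point is that only the pointer manipulation happens under the lock; the potentially sleeping call runs afterwards.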
Link: https://lkml.kernel.org/r/20211223031207.556189-1-chi.minghao@zte.com.cn
Fixes: fc37a3b8b438 ("[PATCH] ipc sem: use kvmalloc for sem_undo allocation")
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Yang Guang <cgel.zte@gmail.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Fri, 4 Feb 2022 04:49:29 +0000 (20:49 -0800)]
mm/pgtable: define pte_index so that preprocessor could recognize it
Since commit 974b9b2c68f3 ("mm: consolidate pte_index() and
pte_offset_*() definitions"), pte_index is a static inline and there is
no define for it that can be recognized by the preprocessor. As a
result, vm_insert_pages() uses a slower loop over vm_insert_page() instead
of insert_pages(), which amortizes the cost of spinlock operations when
inserting multiple pages.
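The preprocessor issue can be reproduced outside the kernel. In this sketch, demo_index_a() and demo_index_b() are hypothetical helpers (the shift and mask values are illustrative only): the first is a plain static inline and is invisible to #ifdef, while the second is followed by a self-referential #define so that #ifdef can detect it.

```c
#include <assert.h>

/* A plain static inline: the preprocessor cannot see it. */
static inline unsigned long demo_index_a(unsigned long addr)
{
    return (addr >> 12) & 511;
}

/* Same helper, followed by a define with the same name. */
static inline unsigned long demo_index_b(unsigned long addr)
{
    return (addr >> 12) & 511;
}
#define demo_index_b demo_index_b

#ifdef demo_index_a
#define DEMO_A_VISIBLE 1
#else
#define DEMO_A_VISIBLE 0
#endif

#ifdef demo_index_b
#define DEMO_B_VISIBLE 1
#else
#define DEMO_B_VISIBLE 0
#endif

/* Compile-time proof of which helper the preprocessor can detect. */
static_assert(DEMO_A_VISIBLE == 0, "static inline alone is not visible");
static_assert(DEMO_B_VISIBLE == 1, "self-referential define is visible");

/* Both helpers still behave identically when called. */
unsigned long demo_index_sum(unsigned long addr)
{
    return demo_index_a(addr) + demo_index_b(addr);
}
```

The self-referential define does not change what a call expands to, because a macro is not re-expanded inside its own replacement.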
Link: https://lkml.kernel.org/r/20220111145457.20748-1-rppt@kernel.org
Fixes: 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*() definitions")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: Christian Dietrich <stettberger@dokucode.de>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pasha Tatashin [Fri, 4 Feb 2022 04:49:24 +0000 (20:49 -0800)]
mm/page_table_check: check entries at pmd levels
syzbot detected a case where the page table counters were not properly
updated.
syzkaller login: ------------[ cut here ]------------
kernel BUG at mm/page_table_check.c:162!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 3099 Comm: pasha Not tainted 5.16.0+ #48
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIO4
RIP: 0010:__page_table_check_zero+0x159/0x1a0
Call Trace:
free_pcp_prepare+0x3be/0xaa0
free_unref_page+0x1c/0x650
free_compound_page+0xec/0x130
free_transhuge_page+0x1be/0x260
__put_compound_page+0x90/0xd0
release_pages+0x54c/0x1060
__pagevec_release+0x7c/0x110
shmem_undo_range+0x85e/0x1250
...
The repro involved having a huge page that is split due to uprobe event
temporarily replacing one of the pages in the huge page. Later the huge
page was combined again, but the counters were off, as the PTE level was
not properly updated.
Make sure that when PMD is cleared and prior to freeing the level the
PTEs are updated.
Link: https://lkml.kernel.org/r/20220131203249.2832273-5-pasha.tatashin@soleen.com
Fixes: df4e817b7108 ("mm: page table check")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Paul Turner <pjt@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pasha Tatashin [Fri, 4 Feb 2022 04:49:20 +0000 (20:49 -0800)]
mm/khugepaged: unify collapse pmd clear, flush and free
Unify the code that flushes, clears pmd entry, and frees the PTE table
level into a new function collapse_and_free_pmd().
This cleanup is useful as in the next patch we will add another call to
this function to iterate through PTE prior to freeing the level for page
table check.
Link: https://lkml.kernel.org/r/20220131203249.2832273-4-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Paul Turner <pjt@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pasha Tatashin [Fri, 4 Feb 2022 04:49:15 +0000 (20:49 -0800)]
mm/page_table_check: use unsigned long for page counters and cleanup
For consistency, use "unsigned long" for all page counters.
Also, reduce code duplication by calling __page_table_check_*_clear()
from __page_table_check_*_set() functions.
Link: https://lkml.kernel.org/r/20220131203249.2832273-3-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Wei Xu <weixugc@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Paul Turner <pjt@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pasha Tatashin [Fri, 4 Feb 2022 04:49:10 +0000 (20:49 -0800)]
mm/debug_vm_pgtable: remove pte entry from the page table
Patch series "page table check fixes and cleanups", v5.
This patch (of 4):
The pte entry that is used in pte_advanced_tests() is never removed from
the page table at the end of the test.
The issue is detected by page_table_check. To reproduce, compile the kernel
with the following configs:
CONFIG_DEBUG_VM_PGTABLE=y
CONFIG_PAGE_TABLE_CHECK=y
CONFIG_PAGE_TABLE_CHECK_ENFORCED=y
During the boot the following BUG is printed:
debug_vm_pgtable: [debug_vm_pgtable ]: Validating architecture page table helpers
------------[ cut here ]------------
kernel BUG at mm/page_table_check.c:162!
invalid opcode: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-11413-g2c271fe77d52 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
...
The entry should be properly removed from the page table before the page
is released to the free list.
Link: https://lkml.kernel.org/r/20220131203249.2832273-1-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20220131203249.2832273-2-pasha.tatashin@soleen.com
Fixes: a5c3b9ffb0f4 ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Tested-by: Zi Yan <ziy@nvidia.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org> [5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chen Wandun [Fri, 4 Feb 2022 04:49:06 +0000 (20:49 -0800)]
Revert "mm/page_isolation: unset migratetype directly for non Buddy page"
This reverts commit 721fb891ad0b3956d5c168b2931e3e5e4fb7ca40.
Commit 721fb891ad0b ("mm/page_isolation: unset migratetype directly for
non Buddy page") causes memory that should be in buddy to disappear by
mistake. move_freepages_block() moves all pages in the pageblock instead of
only the pages indicated by the input parameter, so if the input pages are
not in buddy but other pages in the pageblock are in buddy, this results in
pages out of control.
Link: https://lkml.kernel.org/r/20220126024436.13921-1-chenwandun@huawei.com
Fixes: 721fb891ad0b ("mm/page_isolation: unset migratetype directly for non Buddy page")
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Dong Aisheng <aisheng.dong@nxp.com>
Tested-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joerg Roedel [Fri, 4 Feb 2022 11:55:37 +0000 (12:55 +0100)]
iommu/amd: Fix loop timeout issue in iommu_ga_log_enable()
The polling loop for the register change in iommu_ga_log_enable() needs
to have a udelay() in it. Otherwise the CPU might be faster than the
IOMMU hardware and wrongly trigger the WARN_ON() further down the code
stream. Use 10us for udelay(), as there is some hardware where
activation of the GA log can take more than 100ms.
A future optimization should move the activation check of the GA log
to the point where it gets used for the first time. But that is a
bigger change and not suitable for a fix.
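The polling pattern can be sketched in user space as follows. The device model, the status bit, and every name (struct demo_dev, demo_read_status(), demo_udelay(), demo_wait_running()) are hypothetical; the real register layout and loop limit in iommu_ga_log_enable() are not shown in this message and are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_STATUS_RUNNING (UINT32_C(1) << 0)  /* hypothetical status bit */

/* Hypothetical device model: the bit becomes set after ready_after polls. */
struct demo_dev {
    unsigned long polls;        /* status reads performed so far */
    unsigned long ready_after;  /* polls needed before the bit is set */
    unsigned long delayed_us;   /* total simulated delay in microseconds */
};

static uint32_t demo_read_status(struct demo_dev *d)
{
    return (d->polls++ >= d->ready_after) ? DEMO_STATUS_RUNNING : 0;
}

/* Stand-in for udelay(): only accounts for the time that would pass. */
static void demo_udelay(struct demo_dev *d, unsigned long us)
{
    d->delayed_us += us;
}

/*
 * Poll until the running bit is set, delaying between reads so that a
 * fast CPU does not exhaust the loop budget before slow hardware reacts.
 * Returns 0 on success, -1 if the budget runs out.
 */
int demo_wait_running(struct demo_dev *d, unsigned long max_polls,
                      unsigned long delay_us)
{
    assert(max_polls > 0);
    for (unsigned long i = 0; i < max_polls; i++) {
        if (demo_read_status(d) & DEMO_STATUS_RUNNING)
            return 0;
        demo_udelay(d, delay_us);
    }
    return -1;
}
```

With a delay in the loop, the total time budget is roughly max_polls times delay_us, independent of how fast the CPU executes the loop body.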
Fixes: 8bda0cfbdc1a ("iommu/amd: Detect and initialize guest vAPIC log")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20220204115537.3894-1-joro@8bytes.org
Thomas Gleixner [Mon, 31 Jan 2022 21:02:46 +0000 (22:02 +0100)]
PCI/MSI: Remove bogus warning in pci_irq_get_affinity()
The recent overhaul of pci_irq_get_affinity() introduced a regression when
pci_irq_get_affinity() is called for an MSI-X interrupt which was not
allocated with affinity descriptor information.
The original code just returned a NULL pointer in that case, but the rework
added a WARN_ON() under the assumption that the corresponding WARN_ON() in
the MSI case can be applied to MSI-X as well.
In fact the MSI warning in the original code does not make sense either
because it's legitimate to invoke pci_irq_get_affinity() for a MSI
interrupt which was not allocated with affinity descriptor information.
Remove it and just return NULL as the original code did.
Fixes: f48235900182 ("PCI/MSI: Simplify pci_irq_get_affinity()")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/87ee4n38sm.ffs@tglx
Sean Christopherson [Wed, 2 Feb 2022 00:51:57 +0000 (00:51 +0000)]
KVM: x86: Use ERR_PTR_USR() to return -EFAULT as a __user pointer
Use ERR_PTR_USR() when returning -EFAULT from kvm_get_attr_addr(), as sparse
complains about implicitly casting the kernel pointer from ERR_PTR() into
a __user pointer.
>> arch/x86/kvm/x86.c:4342:31: sparse: sparse: incorrect type in return expression
(different address spaces) @@ expected void [noderef] __user * @@ got void * @@
arch/x86/kvm/x86.c:4342:31: sparse: expected void [noderef] __user *
arch/x86/kvm/x86.c:4342:31: sparse: got void *
No functional change intended.
Fixes: 56f289a8d23a ("KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220202005157.2545816-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Jim Mattson [Fri, 4 Feb 2022 00:13:48 +0000 (16:13 -0800)]
KVM: x86: Report deprecated x87 features in supported CPUID
CPUID.(EAX=7,ECX=0):EBX.FDP_EXCPTN_ONLY[bit 6] and
CPUID.(EAX=7,ECX=0):EBX.ZERO_FCS_FDS[bit 13] are "defeature"
bits. Unlike most of the other CPUID feature bits, these bits are
clear if the features are present and set if the features are not
present. These bits should be reported in KVM_GET_SUPPORTED_CPUID,
because if these bits are set on hardware, they cannot be cleared in
the guest CPUID. Doing so would claim guest support for a feature that
the hardware doesn't support and that can't be efficiently emulated.
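The inverted sense of these bits can be made concrete with a small check. This is only a sketch under the assumption that a guest value is acceptable when every defeature bit set on the host is also set for the guest; the macro and function names are hypothetical and this is not KVM's code. The bit positions (6 and 13) are the ones stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the text: CPUID.(EAX=7,ECX=0):EBX bits 6 and 13. */
#define DEMO_FDP_EXCPTN_ONLY (UINT32_C(1) << 6)
#define DEMO_ZERO_FCS_FDS    (UINT32_C(1) << 13)
#define DEMO_DEFEATURE_MASK  (DEMO_FDP_EXCPTN_ONLY | DEMO_ZERO_FCS_FDS)

static_assert(DEMO_DEFEATURE_MASK == 0x2040, "bits 6 and 13");

/*
 * A set defeature bit means "feature absent". A guest may therefore set
 * such a bit that the host has clear (claiming less), but it must not
 * clear one that the host has set (claiming a feature the hardware lacks).
 * Returns 1 if guest_ebx is consistent with host_ebx in that sense.
 */
int demo_defeature_bits_ok(uint32_t host_ebx, uint32_t guest_ebx)
{
    uint32_t must_be_set = host_ebx & DEMO_DEFEATURE_MASK;

    return (guest_ebx & must_be_set) == must_be_set;
}
```

This is the opposite of the usual rule for feature bits, where the guest value must be a subset of the host value.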
Of course, any software (e.g., WIN87EM.DLL) expecting these features to
be present likely predates these CPUID feature bits and therefore
doesn't know to check for them anyway.
Aaron Lewis added the corresponding X86_FEATURE macros in commit
cbb99c0f5887 ("x86/cpufeatures: Add FDP_EXCPTN_ONLY and
ZERO_FCS_FDS"), with the intention of reporting these bits in
KVM_GET_SUPPORTED_CPUID, but I was unable to find a proposed patch on
the kvm list.
Opportunistically reordered the CPUID_7_0_EBX capability bits from
least to most significant.
Cc: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220204001348.2844660-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Anton Lundin [Thu, 3 Feb 2022 09:41:35 +0000 (10:41 +0100)]
ata: libata-core: Introduce ATA_HORKAGE_NO_LOG_DIR horkage
Commit 06f6c4c6c3e8 ("ata: libata: add missing ata_identify_page_supported() calls")
introduced additional calls to ata_identify_page_supported(), thus also
indirectly adding accesses to the device log directory log page through
ata_log_supported(). Reading this log page causes SATADOM-ML 3ME devices
to lock up.
Introduce the horkage flag ATA_HORKAGE_NO_LOG_DIR to prevent accesses to
the log directory in ata_log_supported() and add a blacklist entry
with this flag for "SATADOM-ML 3ME" devices.
Fixes: 636f6e2af4fb ("libata: add horkage for missing Identify Device log")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Anton Lundin <glance@acc.umu.se>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Sergey Shtylyov [Thu, 3 Feb 2022 19:47:09 +0000 (22:47 +0300)]
MAINTAINERS: add myself as Renesas R-Car SATA driver reviewer
Add myself as a reviewer for the Renesas R-Car SATA driver -- I don't have
the hardware anymore (Geert Uytterhoeven does have a lot of hardware!) but
I do have the manuals still! :-)
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Dave Airlie [Fri, 4 Feb 2022 05:48:26 +0000 (15:48 +1000)]
Merge tag 'drm-intel-fixes-2022-02-03' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
Fix GitLab issue #4698: DP monitor through Type-C dock (Dell DA310) doesn't work.
Fixes for inconsistent engine busyness value and read timeout with GuC.
Fix to use ALLOW_FAIL for error capture buffer allocation. Don't use
interruptible lock on error path. Smatch fix to reject zero sized overlays.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YfuiG8SKMKP5V/Dm@jlahtine-mobl.ger.corp.intel.com
Dave Airlie [Fri, 4 Feb 2022 04:43:28 +0000 (14:43 +1000)]
Merge tag 'drm-misc-fixes-2022-02-03' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
* dma-buf/heaps: Fix potential spectre v1 gadget
* drm/kmb: Fix potential out-of-bounds access
* drm/mxsfb: Fix NULL-pointer dereference
* drm/nouveau: Fix potential out-of-bounds access in BIOS decoding
* fbdev: Re-add support for fbcon hardware acceleration
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/Yfu8mTZQUNt1RwZd@linux-uq9g
Gao Xiang [Thu, 3 Feb 2022 19:02:03 +0000 (03:02 +0800)]
erofs: fix small compressed files inlining
Prior to the ztailpacking feature, it was enough for each lcluster to have
two pclusters at most, and the last pcluster should be turned into
an uncompressed pcluster when necessary. For example,
_________________________________________________
|_ pcluster n-2 _|_ pcluster n-1 _|____ EOFed ____|
which should be converted into:
_________________________________________________
|_ pcluster n-2 _|_ pcluster n-1 (uncompressed)' _|
That is fine since either pcluster n-1 or (uncompressed)' takes one
physical block.
However, now that ztailpacking is supported, the game has changed since
the last pcluster can be inlined. The case above is quite
common when inlining small files. Therefore, in order to inline more
effectively, special EOF lclusters are now supported which can have
three parts at most, as illustrated below:
_________________________________________________
|_ pcluster n-2 _|_ pcluster n-1 _|____ EOFed ____|
^ i_size
Actually similar code exists in Yue Hu's original patchset [1], but I
removed this part on purpose. After evaluating more real cases with
small files, I've changed my mind.
[1] https://lore.kernel.org/r/20211215094449.15162-1-huyue2@yulong.com
Link: https://lore.kernel.org/r/20220203190203.30794-1-xiang@kernel.org
Fixes: ab92184ff8f1 ("erofs: add on-disk compressed tail-packing inline support")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Martin K. Petersen [Fri, 4 Feb 2022 03:42:09 +0000 (22:42 -0500)]
block: bio-integrity: Advance seed correctly for larger interval sizes
Commit 309a62fa3a9e ("bio-integrity: bio_integrity_advance must update
integrity seed") added code to update the integrity seed value when
advancing a bio. However, it failed to take into account that the
integrity interval might be larger than the 512-byte block layer
sector size. This broke bio splitting on PI devices with 4KB logical
blocks.
The seed value should be advanced by bio_integrity_intervals() and not
the number of sectors.
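A worked example of the arithmetic, as a sketch: the helper names below are hypothetical and the real bio_integrity_advance() is not reproduced. It assumes the integrity interval size is given as a power-of-two exponent (9 for 512 bytes, 12 for 4KB) and that the seed counts intervals.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a count of 512-byte sectors into integrity intervals. */
uint64_t demo_integrity_intervals(unsigned int interval_exp, uint64_t sectors)
{
    assert(interval_exp >= 9);  /* intervals are at least one sector */
    return sectors >> (interval_exp - 9);
}

/* Advance the seed by the number of intervals covered by bytes_done. */
uint64_t demo_advance_seed(uint64_t seed, unsigned int interval_exp,
                           unsigned int bytes_done)
{
    uint64_t sectors = bytes_done >> 9;

    return seed + demo_integrity_intervals(interval_exp, sectors);
}

/* The buggy variant for comparison: advances by sectors, not intervals. */
uint64_t demo_advance_seed_by_sectors(uint64_t seed, unsigned int bytes_done)
{
    return seed + (bytes_done >> 9);
}
```

For 8192 bytes on a device with 4KB intervals, the seed must advance by 2, whereas advancing by sectors would add 16. With 512-byte intervals both variants agree, which is why the problem only shows up with larger interval sizes.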
Cc: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: stable@vger.kernel.org
Fixes: 309a62fa3a9e ("bio-integrity: bio_integrity_advance must update integrity seed")
Tested-by: Dmitry Ivanov <dmitry.ivanov2@hpe.com>
Reported-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220204034209.4193-1-martin.petersen@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dave Airlie [Fri, 4 Feb 2022 03:18:55 +0000 (13:18 +1000)]
Merge tag 'amd-drm-fixes-5.17-2022-02-02' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.17-2022-02-02:
amdgpu:
- mGPU fan boost fix for beige goby
- S0ix fixes
- Cyan skillfish hang fix
- DCN fixes for DCN 3.1
- DCN fixes for DCN 3.01
- Apple retina panel fix
- ttm logic inversion fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220203035224.5801-1-alexander.deucher@amd.com
Kees Cook [Thu, 3 Feb 2022 20:17:54 +0000 (12:17 -0800)]
gcc-plugins/stackleak: Use noinstr in favor of notrace
While the stackleak plugin was already using notrace, objtool is now a
bit more picky. Update the notrace uses to noinstr. Silences the
following objtool warnings when building with:
CONFIG_DEBUG_ENTRY=y
CONFIG_STACK_VALIDATION=y
CONFIG_VMLINUX_VALIDATION=y
CONFIG_GCC_PLUGIN_STACKLEAK=y
vmlinux.o: warning: objtool: do_syscall_64()+0x9: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: do_int80_syscall_32()+0x9: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: exc_general_protection()+0x22: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: fixup_bad_iret()+0x20: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: do_machine_check()+0x27: call to stackleak_track_stack() leaves .noinstr.text section
vmlinux.o: warning: objtool: .text+0x5346e: call to stackleak_erase() leaves .noinstr.text section
vmlinux.o: warning: objtool: .entry.text+0x143: call to stackleak_erase() leaves .noinstr.text section
vmlinux.o: warning: objtool: .entry.text+0x10eb: call to stackleak_erase() leaves .noinstr.text section
vmlinux.o: warning: objtool: .entry.text+0x17f9: call to stackleak_erase() leaves .noinstr.text section
Note that the plugin's addition of calls to stackleak_track_stack() from
noinstr functions is expected to be safe, as it isn't runtime
instrumentation and is self-contained.
Cc: Alexander Popov <alex.popov@linux.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 4 Feb 2022 00:54:18 +0000 (16:54 -0800)]
Merge tag 'net-5.17-rc3' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, netfilter, and ieee802154.
Current release - regressions:
- Partially revert "net/smc: Add netlink net namespace support", fix
uABI breakage
- netfilter:
- nft_ct: fix use after free when attaching zone template
- nft_byteorder: track register operations
Previous releases - regressions:
- ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback
- phy: qca8081: fix speeds lower than 2.5Gb/s
- sched: fix use-after-free in tc_new_tfilter()
Previous releases - always broken:
- tcp: fix mem under-charging with zerocopy sendmsg()
- tcp: add missing tcp_skb_can_collapse() test in
tcp_shift_skb_data()
- neigh: do not trigger immediate probes on NUD_FAILED from
neigh_managed_work, avoid a deadlock
- bpf: use VM_MAP instead of VM_ALLOC for ringbuf, avoid KASAN
false-positives
- netfilter: nft_reject_bridge: fix for missing reply from prerouting
- smc: forward wakeup to smc socket waitqueue after fallback
- ieee802154:
- return meaningful error codes from the netlink helpers
- mcr20a: fix lifs/sifs periods
- at86rf230, ca8210: stop leaking skbs on error paths
- macsec: add missing un-offload call for NETDEV_UNREGISTER of parent
- ax25: add refcount in ax25_dev to avoid UAF bugs
- eth: mlx5e:
- fix SFP module EEPROM query
- fix broken SKB allocation in HW-GRO
- IPsec offload: fix tunnel mode crypto for non-TCP/UDP flows
- eth: amd-xgbe:
- fix skb data length underflow
- ensure reset of the tx_timer_active flag, avoid Tx timeouts
- eth: stmmac: fix runtime pm use in stmmac_dvr_remove()
- eth: e1000e: handshake with CSME starts from Alder Lake platforms"
* tag 'net-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
ax25: fix reference count leaks of ax25_dev
net: stmmac: ensure PTP time register reads are consistent
net: ipa: request IPA register values be retained
dt-bindings: net: qcom,ipa: add optional qcom,qmp property
tools/resolve_btfids: Do not print any commands when building silently
bpf: Use VM_MAP instead of VM_ALLOC for ringbuf
net, neigh: Do not trigger immediate probes on NUD_FAILED from neigh_managed_work
tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()
net: sparx5: do not refer to skb after passing it on
Partially revert "net/smc: Add netlink net namespace support"
net/mlx5e: Avoid field-overflowing memcpy()
net/mlx5e: Use struct_group() for memcpy() region
net/mlx5e: Avoid implicit modify hdr for decap drop rule
net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic
net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic
net/mlx5e: Don't treat small ceil values as unlimited in HTB offload
net/mlx5: E-Switch, Fix uninitialized variable modact
net/mlx5e: Fix handling of wrong devices during bond netevent
net/mlx5e: Fix broken SKB allocation in HW-GRO
net/mlx5e: Fix wrong calculation of header index in HW_GRO
...
Linus Torvalds [Fri, 4 Feb 2022 00:44:12 +0000 (16:44 -0800)]
Merge tag 'selinux-pr-20220203' of git://git./linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"One small SELinux patch to ensure that a policy structure field is
properly reset after freeing so that we don't inadvertently do a
double-free on certain error conditions"
* tag 'selinux-pr-20220203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix double free of cond_list on error paths
Linus Torvalds [Fri, 4 Feb 2022 00:36:26 +0000 (16:36 -0800)]
Merge tag 'linux-kselftest-fixes-5.17-rc3' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
"Important fixes to several tests and documentation clarification on
running mainline kselftest on stable releases. A few notable fixes:
- fix kselftest run hang due to child processes that haven't been
terminated. The fix signals all child processes
- fix false pass/fail results from vdso_test_abi, openat2, mincore
- build failures when using -j (multiple jobs) option
- exec test build failure due to incorrect build rule for a run-time
created "pipe"
- zram test fixes related to interaction with zram-generator to make
sure zram test to coordinate deleted with zram-generator
- zram test compression ratio calculation fix and skipping
max_comp_streams.
- increasing rtc test timeout
- cpufreq test now writes test results to stdout, which is necessary
on automated test systems"
* tag 'linux-kselftest-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kselftest: Fix vdso_test_abi return status
selftests: skip mincore.check_file_mmap when fs lacks needed support
selftests: openat2: Skip testcases that fail with EOPNOTSUPP
selftests: openat2: Add missing dependency in Makefile
selftests: openat2: Print also errno in failure messages
selftests: futex: Use variable MAKE instead of make
selftests/exec: Remove pipe from TEST_GEN_FILES
selftests/zram: Adapt the situation that /dev/zram0 is being used
selftests/zram01.sh: Fix compression ratio calculation
selftests/zram: Skip max_comp_streams interface on newer kernel
docs/kselftest: clarify running mainline tests on stables
kselftest: signal all child processes
selftests: cpufreq: Write test output to stdout as well
selftests: rtc: Increase test timeout so that all tests run
Duoming Zhou [Thu, 3 Feb 2022 15:08:11 +0000 (23:08 +0800)]
ax25: fix reference count leaks of ax25_dev
The previous commit d01ffb9eee4a ("ax25: add refcount in ax25_dev
to avoid UAF bugs") introduces refcount into ax25_dev, but there
are reference leak paths in ax25_ctl_ioctl(), ax25_fwd_ioctl(),
ax25_rt_add(), ax25_rt_del() and ax25_rt_opt().
This patch uses ax25_dev_put() and adjusts the position of
ax25_addr_ax25dev() to fix reference count leaks of ax25_dev.
Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20220203150811.42256-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yannick Vignon [Thu, 3 Feb 2022 16:00:25 +0000 (17:00 +0100)]
net: stmmac: ensure PTP time register reads are consistent
Even if protected from preemption and interrupts, a small time window
remains when the 2 register reads could return inconsistent values,
each time the "seconds" register changes. This could lead to an error
of about 1 second in the reported time.
Add logic to ensure the "seconds" and "nanoseconds" values are consistent.
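One common way to get a consistent pair is to re-read the "seconds" register until it is stable around the "nanoseconds" read. The sketch below models that in user space; the clock model and all names (struct demo_clock, demo_read_sec(), demo_read_nsec()) are hypothetical, and the driver's actual register accessors are not reproduced.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

/* Hypothetical clock model with a rollover that can hit between reads. */
struct demo_clock {
    uint32_t sec;
    uint32_t nsec;
    int rollover_pending;   /* if set, the next nsec read sees a rollover */
};

static uint32_t demo_read_sec(struct demo_clock *c)
{
    return c->sec;
}

static uint32_t demo_read_nsec(struct demo_clock *c)
{
    if (c->rollover_pending) {
        /* The second boundary passes just before this read. */
        c->sec += 1;
        c->nsec = 10;
        c->rollover_pending = 0;
    }
    return c->nsec;
}

/* Naive read: seconds and nanoseconds may belong to different seconds. */
uint64_t demo_read_time_naive(struct demo_clock *c)
{
    uint32_t sec = demo_read_sec(c);
    uint32_t ns = demo_read_nsec(c);

    return (uint64_t)sec * DEMO_NSEC_PER_SEC + ns;
}

/* Consistent read: retry until "seconds" is unchanged around the ns read. */
uint64_t demo_read_time_consistent(struct demo_clock *c)
{
    uint32_t sec0, sec1, ns;

    sec1 = demo_read_sec(c);
    do {
        sec0 = sec1;
        ns = demo_read_nsec(c);
        sec1 = demo_read_sec(c);
    } while (sec0 != sec1);

    assert(ns < DEMO_NSEC_PER_SEC);
    return (uint64_t)sec1 * DEMO_NSEC_PER_SEC + ns;
}
```

In the model, the naive read pairs the old seconds value with the new nanoseconds value and ends up about one second behind, while the retry loop returns a matching pair.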
Fixes: 92ba6888510c ("stmmac: add the support for PTP hw clock driver")
Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20220203160025.750632-1-yannick.vignon@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 3 Feb 2022 21:42:38 +0000 (13:42 -0800)]
Merge https://git./linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2022-02-03
We've added 6 non-merge commits during the last 10 day(s) which contain
a total of 7 files changed, 11 insertions(+), 236 deletions(-).
The main changes are:
1) Fix BPF ringbuf to allocate its area with VM_MAP instead of VM_ALLOC
flag which otherwise trips over KASAN, from Hou Tao.
2) Fix unresolved symbol warning in resolve_btfids due to LSM callback
rename, from Alexei Starovoitov.
3) Fix a possible race in inc_misses_counter() when IRQ would trigger
during counter update, from He Fengqing.
4) Fix tooling infra for cross-building with clang upon probing whether
gcc provides the standard libraries, from Jean-Philippe Brucker.
5) Fix silent mode build for resolve_btfids, from Nathan Chancellor.
6) Drop unneeded and outdated lirc.h header copy from tooling infra as
BPF does not require it anymore, from Sean Young.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
tools/resolve_btfids: Do not print any commands when building silently
bpf: Use VM_MAP instead of VM_ALLOC for ringbuf
tools: Ignore errors from `which' when searching a GCC toolchain
tools headers UAPI: remove stale lirc.h
bpf: Fix possible race in inc_misses_counter
bpf: Fix renaming task_getsecid_subj->current_getsecid_subj.
====================
Link: https://lore.kernel.org/r/20220203155815.25689-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jens Axboe [Thu, 3 Feb 2022 19:37:02 +0000 (12:37 -0700)]
Merge tag 'nvme-5.17-2022-02-03' of git://git.infradead.org/nvme into block-5.17
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 5.17
- fix a use-after-free in rdma and tcp controller reset (Sagi Grimberg)
- fix the state check in nvmf_ctlr_matches_baseopts (Uday Shankar)"
* tag 'nvme-5.17-2022-02-03' of git://git.infradead.org/nvme:
nvme-fabrics: fix state check in nvmf_ctlr_matches_baseopts()
nvme-rdma: fix possible use-after-free in transport error_recovery work
nvme-tcp: fix possible use-after-free in transport error_recovery work
nvme: fix a possible use-after-free in controller reset during load
Mickaël Salaün [Thu, 3 Feb 2022 14:50:29 +0000 (15:50 +0100)]
printk: Fix incorrect __user type in proc_dointvec_minmax_sysadmin()
The move of proc_dointvec_minmax_sysadmin() from kernel/sysctl.c to
kernel/printk/sysctl.c introduced an incorrect __user attribute to the
buffer argument. I spotted this change in [1] as well as the kernel
test robot. Revert this change to please sparse:
kernel/printk/sysctl.c:20:51: warning: incorrect type in argument 3 (different address spaces)
kernel/printk/sysctl.c:20:51: expected void *
kernel/printk/sysctl.c:20:51: got void [noderef] __user *buffer
Fixes: faaa357a55e0 ("printk: move printk sysctl to printk/sysctl.c")
Link: https://lore.kernel.org/r/20220104155024.48023-2-mic@digikod.net
Reported-by: kernel test robot <lkp@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20220203145029.272640-1-mic@digikod.net
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Igor Pylypiv [Thu, 27 Jan 2022 23:39:53 +0000 (15:39 -0800)]
Revert "module, async: async_synchronize_full() on module init iff async is used"
This reverts commit 774a1221e862b343388347bac9b318767336b20b.
We need to finish all async code before the module init sequence is
done. In the reverted commit the PF_USED_ASYNC flag was added to mark a
thread that called async_schedule(). Then the PF_USED_ASYNC flag was
used to determine whether or not async_synchronize_full() needs to be
invoked. This works when the modprobe thread itself calls
async_schedule(), but it does not work if the module dispatches init code
to a worker thread which then calls async_schedule().
For example, PCI driver probing is invoked from a worker thread chosen
based on the node where the device is attached:
	if (cpu < nr_cpu_ids)
		error = work_on_cpu(cpu, local_pci_probe, &ddi);
	else
		error = local_pci_probe(&ddi);
We end up in a situation where a worker thread gets the PF_USED_ASYNC
flag set instead of the modprobe thread. As a result,
async_synchronize_full() is not invoked and modprobe completes without
waiting for the async code to finish.
The issue was discovered while loading the pm80xx driver:
(scsi_mod.scan=async)
modprobe pm80xx                      worker
...
  do_init_module()
  ...
    pci_call_probe()
      work_on_cpu(local_pci_probe)
                                     local_pci_probe()
                                       pm8001_pci_probe()
                                         scsi_scan_host()
                                           async_schedule()
                                           worker->flags |= PF_USED_ASYNC;
                                     ...
      < return from worker >
  ...
  if (current->flags & PF_USED_ASYNC) <--- false
	async_synchronize_full();
Commit 21c3c5d28007 ("block: don't request module during elevator init")
fixed the deadlock issue which the reverted commit 774a1221e862 ("module,
async: async_synchronize_full() on module init iff async is used") tried
to fix.
Since commit 0fdff3ec6d87 ("async, kmod: warn on synchronous
request_module() from async workers") synchronous module loading from
async is not allowed.
Given that the original deadlock issue is fixed and it is no longer
allowed to call synchronous request_module() from async we can remove
PF_USED_ASYNC flag to make module init consistently invoke
async_synchronize_full() unless async module probe is requested.
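The failure mode described above can be sketched as a small user-space model. This is not kernel code: every toy_* name is hypothetical, the "pending" counter merely stands in for the async domain, and the check_flag parameter selects between the reverted behaviour (test the flag on the modprobe task) and the restored behaviour (always synchronize).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code. All toy_* names are hypothetical. */
#define TOY_PF_USED_ASYNC 0x1u

struct toy_task {
	unsigned int flags;
};

/* Number of async jobs still outstanding. */
static int toy_pending_async;

/* Models async_schedule(): the flag lands on whichever task calls it. */
static void toy_async_schedule(struct toy_task *caller)
{
	caller->flags |= TOY_PF_USED_ASYNC;
	toy_pending_async++;
}

/* Models async_synchronize_full(): nothing is pending afterwards. */
static void toy_async_synchronize_full(void)
{
	toy_pending_async = 0;
}

/*
 * Models the tail of module init. The probe runs on `prober`, which is
 * either the modprobe task itself or a separate worker task.
 * Returns how many async jobs are still pending when init "completes".
 */
static int toy_do_init_module(struct toy_task *modprobe,
			      struct toy_task *prober, bool check_flag)
{
	assert(modprobe && prober);
	toy_pending_async = 0;
	modprobe->flags = 0;
	prober->flags = 0;

	toy_async_schedule(prober);

	/* check_flag == true mimics the reverted PF_USED_ASYNC test. */
	if (!check_flag || (modprobe->flags & TOY_PF_USED_ASYNC))
		toy_async_synchronize_full();

	return toy_pending_async;
}
```

With the flag test in place, a probe dispatched to a worker leaves one job pending at the end of init; without the test, init always waits.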
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Reviewed-by: Changyuan Lyu <changyuanl@google.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jens Axboe [Thu, 3 Feb 2022 18:54:53 +0000 (11:54 -0700)]
Merge branch 'md-fixes' of https://git./linux/kernel/git/song/md into block-5.17
Pull MD fix from Song:
"Please consider pulling the following fix on top of your block-5.17
branch. It fixes a NULL ptr deref case with nowait."
* 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md: fix NULL pointer deref with nowait but no mddev->queue
Linus Torvalds [Thu, 3 Feb 2022 16:15:13 +0000 (08:15 -0800)]
Merge branch 'for-5.17-fixes' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- Eric's fix for a long standing cgroup1 permission issue where it only
checks for uid 0 instead of CAP which inadvertently allows
unprivileged userns roots to modify release_agent userhelper
- Fixes for the fallout from Waiman's recent cpuset work
* 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning
cgroup-v1: Require capabilities to set release_agent
cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
cgroup/cpuset: Make child cpusets restrict parents on v1 hierarchy
Jakub Kicinski [Thu, 3 Feb 2022 16:04:15 +0000 (08:04 -0800)]
Merge branch 'net-ipa-enable-register-retention'
Alex Elder says:
====================
net: ipa: enable register retention
With runtime power management in place, we sometimes need to issue
a command to enable retention of IPA register values before power
collapse. This requires a new Device Tree property, whose presence
will also be used to signal that the command is required.
====================
Link: https://lore.kernel.org/r/20220201150205.468403-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 1 Feb 2022 15:02:05 +0000 (09:02 -0600)]
net: ipa: request IPA register values be retained
In some cases, the IPA hardware needs to request the always-on
subsystem (AOSS) to coordinate with the IPA microcontroller to
retain IPA register values at power collapse. This is done by
issuing a QMP request to the AOSS microcontroller. A similar
request undoes that request.
We must get and hold the "QMP" handle early, because we might get
back EPROBE_DEFER for that. But the actual request should be sent
while we know the IPA clock is active, and when we know the
microcontroller is operational.
Fixes: 1aac309d3207 ("net: ipa: use autosuspend")
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Tue, 1 Feb 2022 15:02:04 +0000 (09:02 -0600)]
dt-bindings: net: qcom,ipa: add optional qcom,qmp property
For some systems, the IPA driver must make a request to ensure that
its registers are retained across power collapse of the IPA hardware.
On such systems, we'll use the existence of the "qcom,qmp" property
as a signal that this request is required.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Waiman Long [Thu, 3 Feb 2022 03:31:03 +0000 (22:31 -0500)]
cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning
It was found that a "suspicious RCU usage" lockdep warning was issued
with the rcu_read_lock() call in update_sibling_cpumasks(). It is
because the update_cpumasks_hier() function may sleep. So we have
to release the RCU lock, call update_cpumasks_hier() and reacquire
it afterward.
Also add a percpu_rwsem_assert_held() in update_sibling_cpumasks()
instead of stating that in the comment.
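The unlock, call, relock pattern described above can be sketched with a user-space model. This is only an illustration under stated assumptions: the toy_* helpers are hypothetical stand-ins that count lock depth instead of using real RCU, and a real implementation must also keep the object being iterated alive while the lock is dropped, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code. All toy_* names are hypothetical. */
static int toy_rcu_depth;    /* nesting depth of the read-side section */
static int toy_sleep_in_rcu; /* sleeping calls made inside the section */

static void toy_rcu_read_lock(void)
{
	toy_rcu_depth++;
}

static void toy_rcu_read_unlock(void)
{
	assert(toy_rcu_depth > 0);
	toy_rcu_depth--;
}

/* Stands in for a function that may sleep, like update_cpumasks_hier(). */
static void toy_may_sleep(void)
{
	if (toy_rcu_depth > 0)
		toy_sleep_in_rcu++; /* what lockdep would warn about */
}

/*
 * Walks nr_siblings entries under the read lock. When drop_lock is true,
 * the lock is released around the sleeping call and reacquired afterward.
 * Returns the number of sleeping calls made with the lock held.
 */
static int toy_update_siblings(int nr_siblings, bool drop_lock)
{
	toy_sleep_in_rcu = 0;
	toy_rcu_read_lock();
	for (int i = 0; i < nr_siblings; i++) {
		if (drop_lock)
			toy_rcu_read_unlock();
		toy_may_sleep();
		if (drop_lock)
			toy_rcu_read_lock();
	}
	toy_rcu_read_unlock();
	return toy_sleep_in_rcu;
}
```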
Fixes: 4716909cc5c5 ("cpuset: Track cpusets that use parent's effective_cpus")
Signed-off-by: Waiman Long <longman@redhat.com>
Tested-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Nathan Chancellor [Tue, 1 Feb 2022 21:25:04 +0000 (14:25 -0700)]
tools/resolve_btfids: Do not print any commands when building silently
When building with 'make -s', there is some output from resolve_btfids:
$ make -sj"$(nproc)" oldconfig prepare
MKDIR .../tools/bpf/resolve_btfids/libbpf/
MKDIR .../tools/bpf/resolve_btfids//libsubcmd
LINK resolve_btfids
Silent mode means that no information should be emitted about what is
currently being done. Use the $(silent) variable from Makefile.include
to avoid defining the msg macro so that there is no information printed.
Fixes: fbbb68de80a4 ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220201212503.731732-1-nathan@kernel.org
John Hubbard [Wed, 2 Feb 2022 03:23:17 +0000 (19:23 -0800)]
Revert "mm/gup: small refactoring: simplify try_grab_page()"
This reverts commit 54d516b1d62ff8f17cee2da06e5e4706a0d00b8a.
That commit did a refactoring that effectively combined fast and slow
gup paths (again). And that was again incorrect, for two reasons:
a) Fast gup and slow gup get reference counts on pages in different
ways and with different goals: see Linus' writeup in commit
cd1adf1b63a1 ("Revert "mm/gup: remove try_get_page(), call
try_get_compound_head() directly""), and
b) try_grab_compound_head() also has a specific check for
"FOLL_LONGTERM && !is_pinned(page)", that assumes that the caller
can fall back to slow gup. This resulted in new failures, as
recently reported by Will McVicker [1].
But (a) has problems too, even though they may not have been reported
yet. So just revert this.
Link: https://lore.kernel.org/r/20220131203504.3458775-1-willmcvicker@google.com
Fixes: 54d516b1d62f ("mm/gup: small refactoring: simplify try_grab_page()")
Reported-and-tested-by: Will McVicker <willmcvicker@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Minchan Kim <minchan@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: stable@vger.kernel.org # 5.15
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 3 Feb 2022 14:45:34 +0000 (06:45 -0800)]
Merge tag 'mips-fixes-5.17_2' of git://git./linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- fix missed change for PTR->PTR_WD conversion
- kernel-doc fixes
* tag 'mips-fixes-5.17_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: KVM: fix vz.c kernel-doc notation
MIPS: octeon: Fix missed PTR->PTR_WD conversion
James Morse [Thu, 27 Jan 2022 12:20:52 +0000 (12:20 +0000)]
KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata
Cortex-A510's erratum #2077057 causes SPSR_EL2 to be corrupted when
single-stepping authenticated ERET instructions. A single step is
expected, but a pointer authentication trap is taken instead. The
erratum causes SPSR_EL1 to be copied to SPSR_EL2, which could allow
EL1 to cause a return to EL2 with a guest controlled ELR_EL2.
Because the conditions require an ERET into active-not-pending state,
this is only a problem for EL2 when EL2 is stepping EL1. In this case
the previous SPSR_EL2 value is preserved in struct kvm_vcpu, and can be
restored.
Cc: stable@vger.kernel.org # 53960faf2b73: arm64: Add Cortex-A510 CPU part definition
Cc: stable@vger.kernel.org
Signed-off-by: James Morse <james.morse@arm.com>
[maz: fixup cpucaps ordering]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127122052.1584324-5-james.morse@arm.com
James Morse [Thu, 27 Jan 2022 12:20:51 +0000 (12:20 +0000)]
KVM: arm64: Stop handle_exit() from handling HVC twice when an SError occurs
Prior to commit defe21f49bc9 ("KVM: arm64: Move PC rollback on SError to
HYP"), when an SError is synchronised due to another exception, KVM
handles the SError first. If the guest survives, the instruction that
triggered the original exception is re-executed to handle the first
exception. HVC is treated as a special case as the instruction wouldn't
normally be re-executed, as it's not a trap.
Commit defe21f49bc9 didn't preserve the behaviour of the 'return 1'
that skips the rest of handle_exit().
Since commit defe21f49bc9, KVM will try to handle the SError and the
original exception at the same time. When the exception was an HVC,
fixup_guest_exit() has already rolled back ELR_EL2, meaning if the
guest has virtual SError masked, it will execute and handle the HVC
twice.
Restore the original behaviour.
Fixes: defe21f49bc9 ("KVM: arm64: Move PC rollback on SError to HYP")
Cc: stable@vger.kernel.org
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127122052.1584324-4-james.morse@arm.com
James Morse [Thu, 27 Jan 2022 12:20:50 +0000 (12:20 +0000)]
KVM: arm64: Avoid consuming a stale esr value when SError occur
When any exception other than an IRQ occurs, the CPU updates the ESR_EL2
register with the exception syndrome. An SError may also become pending,
and will be synchronised by KVM. KVM notes the exception type, and whether
an SError was synchronised in exit_code.
When an exception other than an IRQ occurs, fixup_guest_exit() updates
vcpu->arch.fault.esr_el2 from the hardware register. When an SError was
synchronised, the vcpu esr value is used to determine if the exception
was due to an HVC. If so, ELR_EL2 is moved back one instruction. This
is so that KVM can process the SError first, and re-execute the HVC if
the guest survives the SError.
But if an IRQ synchronises an SError, the vcpu's esr value is stale.
If the previous non-IRQ exception was an HVC, KVM will corrupt ELR_EL2,
causing an unrelated guest instruction to be executed twice.
Check ARM_EXCEPTION_CODE() before messing with ELR_EL2, IRQs don't
update this register so don't need to check.
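The stale-syndrome problem described above can be sketched with a user-space model. This is not the KVM code: the toy_* types are hypothetical, the saved syndrome is reduced to a single "was it an HVC" class, and the 4-byte step assumes fixed-width A64 instructions. The check_code parameter selects between consulting only the saved syndrome and first checking the exit code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not kernel code. All toy_* names are hypothetical. */
enum toy_exit { TOY_EXIT_IRQ, TOY_EXIT_TRAP };
enum toy_ec { TOY_EC_OTHER, TOY_EC_HVC };

struct toy_vcpu {
	enum toy_ec esr_ec; /* saved syndrome class; refreshed on non-IRQ exits only */
	uint64_t elr;       /* guest return address */
};

/*
 * Models the exit fixup. hw_ec is what the hardware syndrome register
 * would hold; it is only copied to the vcpu for non-IRQ exits. If an
 * SError was synchronised and the exit was an HVC, the return address
 * is moved back one instruction (4 bytes, assuming A64).
 */
static void toy_fixup_guest_exit(struct toy_vcpu *v, enum toy_exit code,
				 enum toy_ec hw_ec, bool serror,
				 bool check_code)
{
	assert(v);
	if (code != TOY_EXIT_IRQ)
		v->esr_ec = hw_ec;

	if (!serror)
		return;

	/* The fix: an IRQ exit means the saved syndrome may be stale. */
	if (check_code && code != TOY_EXIT_TRAP)
		return;

	if (v->esr_ec == TOY_EC_HVC)
		v->elr -= 4;
}
```

In this model, an HVC exit followed by an IRQ exit that synchronises an SError rolls the return address back only when the exit-code check is skipped.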
Fixes: defe21f49bc9 ("KVM: arm64: Move PC rollback on SError to HYP")
Cc: stable@vger.kernel.org
Reported-by: Steven Price <steven.price@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220127122052.1584324-3-james.morse@arm.com
Alexander Stein [Wed, 2 Feb 2022 08:17:55 +0000 (09:17 +0100)]
drm: mxsfb: Fix NULL pointer dereference
mxsfb should not ever dereference the NULL pointer which
drm_atomic_get_new_bridge_state is allowed to return.
Assume a fixed format instead.
Fixes: b776b0f00f24 ("drm: mxsfb: Use bus_format from the nearest bridge if present")
Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20220202081755.145716-3-alexander.stein@ew.tq-group.com
Jan Beulich [Tue, 1 Feb 2022 10:57:16 +0000 (11:57 +0100)]
x86/Xen: streamline (and fix) PV CPU enumeration
This started out with me noticing that "dom0_max_vcpus=<N>" with <N>
larger than the number of physical CPUs reported through ACPI tables
would not bring up the "excess" vCPU-s. Addressing this is the primary
purpose of the change; CPU maps handling is being tidied only as far as
is necessary for the change here (with the effect of also avoiding the
setting up of too much per-CPU infrastructure, i.e. for CPUs which can
never come online).
Noticing that xen_fill_possible_map() is called way too early, whereas
xen_filter_cpu_maps() is called too late (after per-CPU areas were
already set up), and further observing that each of the functions serves
only one of Dom0 or DomU, it looked like it was better to simplify this.
Use the .get_smp_config hook instead, uniformly for Dom0 and DomU.
xen_fill_possible_map() can be dropped altogether, while
xen_filter_cpu_maps() is re-purposed but not otherwise changed.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/2dbd5f0a-9859-ca2d-085e-a02f7166c610@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Randy Dunlap [Mon, 31 Jan 2022 16:19:59 +0000 (08:19 -0800)]
xen: update missing ioctl magic numers documentation
Add missing ioctl "magic numbers" for various Xen interfaces
(xenbus_dev.h, gntalloc.h, gntdev.h, and privcmd.h).
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lore.kernel.org/r/20220131161959.16509-1-rdunlap@infradead.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Demi Marie Obenour [Mon, 31 Jan 2022 17:23:07 +0000 (12:23 -0500)]
Improve docs for IOCTL_GNTDEV_MAP_GRANT_REF
The current implementation of gntdev guarantees that the first call to
IOCTL_GNTDEV_MAP_GRANT_REF will set @index to 0. This is required to
use gntdev for Wayland, which is a future desire of Qubes OS.
Additionally, requesting zero grants results in an error, but this was
not documented either. Document both of these.
Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/f66c5a4e-2034-00b5-a635-6983bd999c07@gmail.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Randy Dunlap [Sun, 30 Jan 2022 19:17:05 +0000 (11:17 -0800)]
xen: xenbus_dev.h: delete incorrect file name
It is better/preferred not to include file names in source files
because (a) they are not needed and (b) they can be incorrect,
so just delete this incorrect file name.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: xen-devel@lists.xenproject.org
Link: https://lore.kernel.org/r/20220130191705.24971-1-rdunlap@infradead.org
Signed-off-by: Juergen Gross <jgross@suse.com>
Hou Tao [Wed, 2 Feb 2022 06:01:58 +0000 (14:01 +0800)]
bpf: Use VM_MAP instead of VM_ALLOC for ringbuf
After commit 2fd3fb0be1d1 ("kasan, vmalloc: unpoison VM_ALLOC pages
after mapping"), non-VM_ALLOC mappings will be marked as accessible
in __get_vm_area_node() when KASAN is enabled. But now the flag for
ringbuf area is VM_ALLOC, so KASAN will complain about out-of-bounds
access after vmap() returns. Because the ringbuf area is created by
mapping allocated pages, use VM_MAP instead.
After the change, info in /proc/vmallocinfo also changes from
[start]-[end] 24576 ringbuf_map_alloc+0x171/0x290 vmalloc user
to
[start]-[end] 24576 ringbuf_map_alloc+0x171/0x290 vmap user
Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: syzbot+5ad567a418794b9b5983@syzkaller.appspotmail.com
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220202060158.6260-1-houtao1@huawei.com
Uday Shankar [Thu, 20 Jan 2022 20:17:37 +0000 (12:17 -0800)]
nvme-fabrics: fix state check in nvmf_ctlr_matches_baseopts()
Controller deletion/reset, immediately followed by or concurrent with
a reconnect, is hard failing the connect attempt resulting in a
complete loss of connectivity to the controller.
In the connect request, fabrics looks for an existing controller with
the same address components and aborts the connect if a controller
already exists and the duplicate connect option isn't set. The match
routine filters out controllers that are dead or dying, so they don't
interfere with the new connect request.
When NVME_CTRL_DELETING_NOIO was added, it missed updating the state
filters in the nvmf_ctlr_matches_baseopts() routine. Thus, when in this
new state, it's seen as a live controller and fails the connect request.
Correct by adding the DELETING_NOIO state to the match checks.
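The effect of a missing state in the match filter can be sketched as follows. This is a user-space model, not the nvme-fabrics code: the toy_* enum and helpers are hypothetical, and every entry in the array is assumed to share the address of the new connect request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code. All toy_* names are hypothetical. */
enum toy_ctrl_state {
	TOY_CTRL_LIVE,
	TOY_CTRL_RESETTING,
	TOY_CTRL_DELETING,
	TOY_CTRL_DELETING_NOIO,
	TOY_CTRL_DEAD,
};

/* Dead or dying controllers must not count as a match. */
static bool toy_ctrl_can_match(enum toy_ctrl_state s)
{
	switch (s) {
	case TOY_CTRL_DELETING:
	case TOY_CTRL_DELETING_NOIO: /* the state the filter had missed */
	case TOY_CTRL_DEAD:
		return false;
	default:
		return true;
	}
}

/*
 * Decides whether a new connect may proceed, given the states of the
 * existing controllers with the same address components.
 */
static bool toy_connect_allowed(const enum toy_ctrl_state *existing,
				size_t n, bool duplicate_connect)
{
	assert(existing || n == 0);
	if (duplicate_connect)
		return true;
	for (size_t i = 0; i < n; i++)
		if (toy_ctrl_can_match(existing[i]))
			return false;
	return true;
}
```

Removing the TOY_CTRL_DELETING_NOIO case from the switch reproduces the reported symptom in this model: a controller on its way out blocks the reconnect.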
Fixes: ecca390e8056 ("nvme: fix deadlock in disconnect during scan_work and/or ana_work")
Cc: <stable@vger.kernel.org> # v5.7+
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Ryan Bair [Wed, 22 Dec 2021 16:04:05 +0000 (11:04 -0500)]
cifs: fix workstation_name for multiuser mounts
Set workstation_name from the master_tcon for multiuser mounts.
Just in case, protect size_of_ntlmssp_blob against a NULL workstation_name.
Fixes: 49bd49f983b5 ("cifs: send workstation name during ntlmssp session setup")
Cc: stable@vger.kernel.org # 5.16
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Ryan Bair <ryandbair@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Rohith Surabattula [Tue, 1 Feb 2022 07:22:02 +0000 (07:22 +0000)]
Invalidate fscache cookie only when inode attributes are changed.
For example if mtime or size has changed.
Signed-off-by: Rohith Surabattula <rohiths@microsoft.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Daniel Borkmann [Tue, 1 Feb 2022 19:39:42 +0000 (20:39 +0100)]
net, neigh: Do not trigger immediate probes on NUD_FAILED from neigh_managed_work
syzkaller was able to trigger a deadlock for NTF_MANAGED entries [0]:
kworker/0:16/14617 is trying to acquire lock:
ffffffff8d4dd370 (&tbl->lock){++-.}-{2:2}, at: ___neigh_create+0x9e1/0x2990 net/core/neighbour.c:652
[...]
but task is already holding lock:
ffffffff8d4dd370 (&tbl->lock){++-.}-{2:2}, at: neigh_managed_work+0x35/0x250 net/core/neighbour.c:1572
The neighbor entry turned to NUD_FAILED state, where __neigh_event_send()
triggered an immediate probe as per commit cd28ca0a3dd1 ("neigh: reduce
arp latency") via neigh_probe() given table lock was held.
One option to fix this situation is to defer the neigh_probe() back to
the neigh_timer_handler() similarly as pre cd28ca0a3dd1. For the case
of NTF_MANAGED, this deferral is acceptable given this only happens on
actual failure state and regular / expected state is NUD_VALID with the
entry already present.
The fix adds a parameter to __neigh_event_send() in order to communicate
whether immediate probe is allowed or disallowed. Existing call-sites
of neigh_event_send() default as-is to immediate probe. However, the
neigh_managed_work() disables it via use of neigh_event_send_probe().
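The idea of passing an "immediate probe allowed" parameter can be sketched with a user-space model. This is not the neighbour code: the toy_* names are hypothetical, the table lock is reduced to a boolean, and a probe attempted while that boolean is set is counted as the self-deadlock reported by lockdep.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code. All toy_* names are hypothetical. */
struct toy_tbl {
	bool locked;        /* stands in for the table lock being held */
	int self_deadlocks; /* probes attempted while the lock was held */
	int deferred;       /* probes left for the timer handler */
};

/* Models a probe whose transmit path needs the table lock again. */
static void toy_probe(struct toy_tbl *t)
{
	if (t->locked)
		t->self_deadlocks++;
}

/* Models the event path with the added immediate_ok parameter. */
static void toy_event_send(struct toy_tbl *t, bool nud_failed,
			   bool immediate_ok)
{
	assert(t);
	if (!nud_failed)
		return;
	if (immediate_ok)
		toy_probe(t);
	else
		t->deferred++; /* the timer handler probes later */
}

/* Models the managed work item, which runs with the table lock held. */
static void toy_managed_work(struct toy_tbl *t, bool nud_failed,
			     bool immediate_ok)
{
	t->locked = true;
	toy_event_send(t, nud_failed, immediate_ok);
	t->locked = false;
}
```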
[0] <TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_deadlock_bug kernel/locking/lockdep.c:2956 [inline]
check_deadlock kernel/locking/lockdep.c:2999 [inline]
validate_chain kernel/locking/lockdep.c:3788 [inline]
__lock_acquire.cold+0x149/0x3ab kernel/locking/lockdep.c:5027
lock_acquire kernel/locking/lockdep.c:5639 [inline]
lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604
__raw_write_lock_bh include/linux/rwlock_api_smp.h:202 [inline]
_raw_write_lock_bh+0x2f/0x40 kernel/locking/spinlock.c:334
___neigh_create+0x9e1/0x2990 net/core/neighbour.c:652
ip6_finish_output2+0x1070/0x14f0 net/ipv6/ip6_output.c:123
__ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
__ip6_finish_output+0x61e/0xe90 net/ipv6/ip6_output.c:170
ip6_finish_output+0x32/0x200 net/ipv6/ip6_output.c:201
NF_HOOK_COND include/linux/netfilter.h:296 [inline]
ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:224
dst_output include/net/dst.h:451 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
ndisc_send_skb+0xa99/0x17f0 net/ipv6/ndisc.c:508
ndisc_send_ns+0x3a9/0x840 net/ipv6/ndisc.c:650
ndisc_solicit+0x2cd/0x4f0 net/ipv6/ndisc.c:742
neigh_probe+0xc2/0x110 net/core/neighbour.c:1040
__neigh_event_send+0x37d/0x1570 net/core/neighbour.c:1201
neigh_event_send include/net/neighbour.h:470 [inline]
neigh_managed_work+0x162/0x250 net/core/neighbour.c:1574
process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
worker_thread+0x657/0x1110 kernel/workqueue.c:2454
kthread+0x2e9/0x3a0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
Fixes: 7482e3841d52 ("net, neigh: Add NTF_MANAGED flag for managed neighbor entries")
Reported-by: syzbot+5239d0e1778a500d477a@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Roopa Prabhu <roopa@nvidia.com>
Tested-by: syzbot+5239d0e1778a500d477a@syzkaller.appspotmail.com
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220201193942.5055-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 1 Feb 2022 18:46:40 +0000 (10:46 -0800)]
tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()
tcp_shift_skb_data() might collapse three packets into a larger one.
P_A, P_B, P_C -> P_ABC
Historically, it used a single tcp_skb_can_collapse_to(P_A) call,
because it was enough.
In commit 85712484110d ("tcp: coalesce/collapse must respect MPTCP extensions"),
this call was replaced by a call to tcp_skb_can_collapse(P_A, P_B)
But the now needed test over P_C has been missed.
This probably broke MPTCP.
Then later, commit 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs")
added an extra condition to tcp_skb_can_collapse(), but the missing call
from tcp_shift_skb_data() is also breaking TCP zerocopy, because P_A and P_C
might have different skb_zcopy_pure() status.
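The missing test over P_C can be sketched with a user-space model. This is not the TCP code: struct toy_skb and both helpers are hypothetical, and compatibility is reduced to two boolean properties standing in for the MPTCP-extension and pure-zerocopy conditions mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code. All toy_* names are hypothetical. */
struct toy_skb {
	bool mptcp_ext;  /* stands in for the MPTCP extension condition */
	bool zcopy_pure; /* stands in for the pure zerocopy status */
};

/* Pairwise compatibility check between two packets. */
static bool toy_can_collapse(const struct toy_skb *to,
			     const struct toy_skb *from)
{
	assert(to && from);
	return to->mptcp_ext == from->mptcp_ext &&
	       to->zcopy_pure == from->zcopy_pure;
}

/*
 * Decides whether P_A, P_B, P_C may become P_ABC. With test_c == false
 * only the (P_A, P_B) pair is checked, which models the missed test.
 */
static bool toy_shift_allowed(const struct toy_skb *a,
			      const struct toy_skb *b,
			      const struct toy_skb *c, bool test_c)
{
	if (!toy_can_collapse(a, b))
		return false;
	if (test_c && !toy_can_collapse(a, c))
		return false;
	return true;
}
```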
Fixes: 85712484110d ("tcp: coalesce/collapse must respect MPTCP extensions")
Fixes: 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: Talal Ahmad <talalahmad@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220201184640.756716-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sergey Shtylyov [Wed, 2 Feb 2022 21:30:38 +0000 (00:30 +0300)]
MAINTAINERS: add myself as PATA drivers reviewer
Add myself as a reviewer for the libata PATA drivers -- there is
activity in this area still... 8-)
Having been hacking on ATA from the early 90s, I think I deserved this
highly responsible position, at last! :-)
Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Christian König [Fri, 28 Jan 2022 12:21:10 +0000 (13:21 +0100)]
drm/amdgpu: fix logic inversion in check
We probably never trigger this, but the logic inside the check is
inverted.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Wed, 26 Jan 2022 03:46:58 +0000 (21:46 -0600)]
drm/amd: avoid suspend on dGPUs w/ s2idle support when runtime PM enabled
dGPUs connected to Intel systems configured for suspend to idle
will not have the power rails cut at suspend and resetting the GPU
may lead to problematic behaviors.
Fixes: e25443d2765f4 ("drm/amdgpu: add a dev_pm_ops prepare callback (v2)")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1879
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aun-Ali Zaidi [Sat, 29 Jan 2022 05:49:55 +0000 (05:49 +0000)]
drm/amd/display: Force link_rate as LINK_RATE_RBR2 for 2018 15" Apple Retina panels
The eDP link rate reported by the DP_MAX_LINK_RATE dpcd register (0xa) is
contradictory to the highest rate supported reported by
EDID (0xc = LINK_RATE_RBR2). The effects of this compounded with commit
'4a8ca46bae8a ("drm/amd/display: Default max bpc to 16 for eDP")' results
in no display modes being found and a dark panel.
For now, simply force the maximum supported link rate for the eDP attached
2018 15" Apple Retina panels.
Additionally, we must also check the firmware revision since the device ID
reported by the DPCD is identical to that of the more capable 16,1,
incorrectly quirking it. We also use said firmware check to quirk the
refreshed 15,1 models with Vega graphics as they use a slightly newer
firmware version.
Tested-by: Aun-Ali Zaidi <admin@kodeit.net>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Aun-Ali Zaidi <admin@kodeit.net>
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Zhan Liu [Fri, 28 Jan 2022 14:03:59 +0000 (22:03 +0800)]
drm/amd/display: revert "Reset fifo after enable otg"
[Why]
This change causes a regression that prevents some systems
from lighting up internal displays.
[How]
Revert this patch until a new solution is ready.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Signed-off-by: Zhan Liu <Zhan.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Paul Hsieh [Fri, 28 Jan 2022 14:03:57 +0000 (22:03 +0800)]
drm/amd/display: watermark latencies is not enough on DCN31
[Why]
The original latencies were causing underflow in some modes.
Resolution: 2880x1620@60p when HDR is enabled
[How]
1. Replace with the up-to-date watermark values based on new measurements
2. Correct the ddr_wm_table name to DDR5 on DCN31
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Stylon Wang <stylon.wang@amd.com>
Signed-off-by: Paul Hsieh <paul.hsieh@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Agustin Gutierrez [Fri, 28 Jan 2022 22:51:53 +0000 (17:51 -0500)]
drm/amd/display: Update watermark values for DCN301
[Why]
There is underflow / visual corruption on DCN301, for high
bandwidth MST DSC configurations such as 2x1440p144 or 2x4k60.
[How]
Use up-to-date watermark values for DCN301.
Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Signed-off-by: Agustin Gutierrez <agustin.gutierrez@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Lang Yu [Fri, 28 Jan 2022 10:24:53 +0000 (18:24 +0800)]
drm/amdgpu: fix a potential GPU hang on cyan skillfish
We observed a GPU hang when querying GMC CG state (i.e.,
cat amdgpu_pm_info) on cyan skillfish. Actually, cyan
skillfish doesn't support any CG features.
Just prevent it from accessing GMC CG registers.
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Mario Limonciello [Wed, 26 Jan 2022 03:37:57 +0000 (21:37 -0600)]
drm/amd: Only run s3 or s0ix if system is configured properly
This will cause misconfigured systems to not run the GPU suspend
routines.
* In APUs that are properly configured system will go into s2idle.
* In APUs that are intended to be S3 but user selects
s2idle the GPU will stay fully powered for the suspend.
* In APUs that are intended to be s2idle and system misconfigured
the GPU will stay fully powered for the suspend.
* In systems that are intended to be s2idle, but AMD dGPU is also
present, the dGPU will go through S3
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mario Limonciello [Wed, 26 Jan 2022 03:35:09 +0000 (21:35 -0600)]
drm/amd: add support to check whether the system is set to s3
This will be used to help make decisions on what to do in
misconfigured systems.
v2: squash in semicolon fix from Stephen Rothwell
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Linus Torvalds [Wed, 2 Feb 2022 18:14:31 +0000 (10:14 -0800)]
Merge tag 'nfsd-5.17-1' of git://git./linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
"Notable bug fixes:
- Ensure SM_NOTIFY doesn't crash the NFS server host
- Ensure NLM locks are cleaned up after client reboot
- Fix a leak of internal NFSv4 lease information"
* tag 'nfsd-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: nfsd4_setclientid_confirm mistakenly expires confirmed client.
lockd: fix failure to cleanup client locks
lockd: fix server crash on reboot of client holding lock
Song Liu [Wed, 2 Feb 2022 17:24:10 +0000 (09:24 -0800)]
md: fix NULL pointer deref with nowait but no mddev->queue
Leon reported NULL pointer deref with nowait support:
[ 15.123761] device-mapper: raid: Loading target version 1.15.1
[ 15.124185] device-mapper: raid: Ignoring chunk size parameter for RAID 1
[ 15.124192] device-mapper: raid: Choosing default region size of 4MiB
[ 15.129524] BUG: kernel NULL pointer dereference, address:
0000000000000060
[ 15.129530] #PF: supervisor write access in kernel mode
[ 15.129533] #PF: error_code(0x0002) - not-present page
[ 15.129535] PGD 0 P4D 0
[ 15.129538] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 15.129541] CPU: 5 PID: 494 Comm: ldmtool Not tainted 5.17.0-rc2-1-mainline #1
9fe89d43dfcb215d2731e6f8851740520778615e
[ 15.129546] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F36e 10/14/2021
[ 15.129549] RIP: 0010:blk_queue_flag_set+0x7/0x20
[ 15.129555] Code: 00 00 00 0f 1f 44 00 00 48 8b 35 e4 e0 04 02 48 8d 57 28 bf 40 01 \
00 00 e9 16 c1 be ff 66 0f 1f 44 00 00 0f 1f 44 00 00 89 ff <f0> 48 0f ab 7e 60 \
31 f6 89 f7 c3 66 66 2e 0f 1f 84 00 00 00 00 00
[ 15.129559] RSP: 0018:ffff966b81987a88 EFLAGS: 00010202
[ 15.129562] RAX: ffff8b11c363a0d0 RBX: ffff8b11e294b070 RCX: 0000000000000000
[ 15.129564] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001d
[ 15.129566] RBP: ffff8b11e294b058 R08: 0000000000000000 R09: 0000000000000000
[ 15.129568] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b11e294b070
[ 15.129570] R13: 0000000000000000 R14: ffff8b11e294b000 R15: 0000000000000001
[ 15.129572] FS: 00007fa96e826780(0000) GS:ffff8b18deb40000(0000) knlGS:0000000000000000
[ 15.129575] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 15.129577] CR2: 0000000000000060 CR3: 000000010b8ce000 CR4: 00000000003506e0
[ 15.129580] Call Trace:
[ 15.129582] <TASK>
[ 15.129584] md_run+0x67c/0xc70 [md_mod 1e470c1b6bcf1114198109f42682f5a2740e9531]
[ 15.129597] raid_ctr+0x134a/0x28ea [dm_raid 6a645dd7519e72834bd7e98c23497eeade14cd63]
[ 15.129604] ? dm_split_args+0x63/0x150 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e]
[ 15.129615] dm_table_add_target+0x188/0x380 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e]
[ 15.129625] table_load+0x13b/0x370 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e]
[ 15.129635] ? dev_suspend+0x2d0/0x2d0 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e]
[ 15.129644] ctl_ioctl+0x1bd/0x460 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e]
[ 15.129655] dm_ctl_ioctl+0xa/0x20 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e]
[ 15.129663] __x64_sys_ioctl+0x8e/0xd0
[ 15.129667] do_syscall_64+0x5c/0x90
[ 15.129672] ? syscall_exit_to_user_mode+0x23/0x50
[ 15.129675] ? do_syscall_64+0x69/0x90
[ 15.129677] ? do_syscall_64+0x69/0x90
[ 15.129679] ? syscall_exit_to_user_mode+0x23/0x50
[ 15.129682] ? do_syscall_64+0x69/0x90
[ 15.129684] ? do_syscall_64+0x69/0x90
[ 15.129686] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 15.129689] RIP: 0033:0x7fa96ecd559b
[ 15.129692] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c \
c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff \
ff 73 01 c3 48 8b 0d a5 a8 0c 00 f7 d8 64 89 01 48
[ 15.129696] RSP: 002b:00007ffcaf85c258 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[ 15.129699] RAX: ffffffffffffffda RBX: 00007fa96f1b48f0 RCX: 00007fa96ecd559b
[ 15.129701] RDX: 00007fa97017e610 RSI: 00000000c138fd09 RDI: 0000000000000003
[ 15.129702] RBP: 00007fa96ebab583 R08: 00007fa97017c9e0 R09: 00007ffcaf85bf27
[ 15.129704] R10: 0000000000000001 R11: 0000000000000206 R12: 00007fa97017e610
[ 15.129706] R13: 00007fa97017e640 R14: 00007fa97017e6c0 R15: 00007fa97017e530
[ 15.129709] </TASK>
This is caused by a missing mddev->queue check when setting QUEUE_FLAG_NOWAIT.
Fix this by moving the QUEUE_FLAG_NOWAIT logic under the mddev->queue check.
Fixes: f51d46d0e7cb ("md: add support for REQ_NOWAIT")
Reported-by: Leon Möller <jkhsjdhjs@totally.rip>
Tested-by: Leon Möller <jkhsjdhjs@totally.rip>
Cc: Vishal Verma <vverma@digitalocean.com>
Signed-off-by: Song Liu <song@kernel.org>
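The ordering problem described in the md fix above can be sketched in userspace C. This is only an illustrative model, not the kernel patch: every name prefixed with model_ or MODEL_ is hypothetical, and the structures are simplified stand-ins for the real mddev and request queue. In the trace, the queue pointer passed to blk_queue_flag_set() is NULL (RSI is 0 and the faulting address is 0x60), which matches a flag write done outside the queue check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's real structures. */
struct model_queue {
	unsigned long flags;
};

struct model_mddev {
	struct model_queue *queue;  /* may be NULL, as in the dm-raid path in the trace */
	bool all_members_nowait;    /* assumed result of checking member devices */
};

#define MODEL_FLAG_NOWAIT 29U

/* Sets one flag bit; dereferences q, so q must not be NULL. */
static void model_flag_set(unsigned int flag, struct model_queue *q)
{
	q->flags |= 1UL << flag;
}

/*
 * Fixed ordering: queue flags are only touched inside the queue check,
 * so a missing queue is skipped instead of dereferenced.
 * Returns true if a queue was present and configured.
 */
static bool model_md_run(struct model_mddev *mddev)
{
	if (mddev->queue) {
		if (mddev->all_members_nowait)
			model_flag_set(MODEL_FLAG_NOWAIT, mddev->queue);
		return true;
	}
	return false;
}
```

Calling model_md_run() on a model_mddev whose queue is NULL returns false without touching any flags, while the same call with a valid queue sets the nowait bit.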
Linus Torvalds [Wed, 2 Feb 2022 18:08:52 +0000 (10:08 -0800)]
Merge tag 'fsnotify_for_v5.17-rc3' of git://git./linux/kernel/git/jack/linux-fs
Pull fanotify fix from Jan Kara:
"Fix stale file descriptor in copy_event_to_user"
* tag 'fsnotify_for_v5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: Fix stale file descriptor in copy_event_to_user()
Linus Torvalds [Wed, 2 Feb 2022 18:00:08 +0000 (10:00 -0800)]
Merge tag 'linux-kselftest-kunit-fixes-5.17-rc3' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull KUnit fixes from Shuah Khan:
"A single fix to an error seen on qemu due to a missing import"
* tag 'linux-kselftest-kunit-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: tool: Import missing importlib.abc
Ilya Dryomov [Thu, 30 Dec 2021 14:13:32 +0000 (15:13 +0100)]
libceph: optionally use bounce buffer on recv path in crc mode
Both msgr1 and msgr2 in crc mode are zero copy in the sense that
message data is read from the socket directly into the destination
buffer. We assume that the destination buffer is stable (i.e. remains
unchanged while it is being read to) though. Otherwise, CRC errors
ensue:
libceph: read_partial_message 0000000048edf8ad data crc 1063286393 != exp. 228122706
libceph: osd1 (1)192.168.122.1:6843 bad crc/signature
libceph: bad data crc, calculated 57958023, expected 1805382778
libceph: osd2 (2)192.168.122.1:6876 integrity error, bad crc
Introduce rxbounce option to enable use of a bounce buffer when
receiving message data. In particular this is needed if a mapped
image is a Windows VM disk, passed to QEMU. Windows has a system-wide
"dummy" page that may be mapped into the destination buffer (potentially
more than once into the same buffer) by the Windows Memory Manager in
an effort to generate a single large I/O [1][2]. QEMU makes a point of
preserving overlap relationships when cloning I/O vectors, so krbd gets
exposed to this behaviour.
[1] "What Is Really in That MDL?"
https://docs.microsoft.com/en-us/previous-versions/windows/hardware/design/dn614012(v=vs.85)
[2] https://blogs.msmvps.com/kernelmustard/2005/05/04/dummy-pages/
URL: https://bugzilla.redhat.com/show_bug.cgi?id=1973317
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
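Why an unstable destination buffer breaks the CRC check, and why a bounce buffer avoids it, can be sketched in userspace C. This is a simplified model under stated assumptions, not the libceph implementation: all model_ names are hypothetical, the "wire" array stands in for socket data, the scribble hook stands in for another party modifying the destination during the receive, and a bitwise CRC-32C is used as the checksum.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32C (reflected polynomial 0x82F63B78); slow but dependency-free. */
static uint32_t model_crc32c(const uint8_t *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= p[i];
		for (int b = 0; b < 8; b++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Hypothetical hook simulating a concurrent write to the destination. */
typedef void (*model_scribble_fn)(uint8_t *dst, size_t len);

static void model_flip_first_byte(uint8_t *dst, size_t len)
{
	if (len > 0)
		dst[0] ^= 0xFF;
}

/* Zero copy: data lands directly in dst and the CRC is computed over dst. */
static uint32_t model_recv_zero_copy(const uint8_t *wire, size_t len,
				     uint8_t *dst, model_scribble_fn scribble)
{
	memcpy(dst, wire, len);
	if (scribble)
		scribble(dst, len);  /* dst changes before the CRC is taken */
	return model_crc32c(dst, len);
}

/* Bounce: data lands in a private buffer, is checksummed there, then copied out. */
static uint32_t model_recv_bounce(const uint8_t *wire, size_t len, uint8_t *dst,
				  uint8_t *bounce, model_scribble_fn scribble)
{
	uint32_t crc;

	memcpy(bounce, wire, len);
	if (scribble)
		scribble(dst, len);  /* writes to dst cannot reach the bounce buffer */
	crc = model_crc32c(bounce, len);
	memcpy(dst, bounce, len);
	return crc;
}
```

With model_flip_first_byte as the hook, the zero-copy path returns a CRC that differs from the CRC of the wire data, while the bounce path returns the matching CRC at the cost of one extra copy per receive.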