Krzysztof Kozlowski [Sat, 10 Sep 2022 09:14:28 +0000 (11:14 +0200)]
dt-bindings: soc: qcom: apr: add missing properties
The APR bindings were not describing all properties already used in DTS:
1. Add qcom,glink-channels, qcom,smd-channels and qcom,intents (widely
used).
2. Add power-domains for MSM8996.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220910091428.50418-16-krzysztof.kozlowski@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Krzysztof Kozlowski [Sat, 10 Sep 2022 09:14:27 +0000 (11:14 +0200)]
ASoC: dt-bindings: qcom,q6apm-dai: adjust indentation in example
Clean up the example DTS by fixing indentation to 4 spaces and adding
blank lines for readability.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220910091428.50418-15-krzysztof.kozlowski@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Krzysztof Kozlowski [Sat, 10 Sep 2022 09:14:26 +0000 (11:14 +0200)]
ASoC: dt-bindings: qcom,q6dsp-lpass-clocks: cleanup example
Clean up the example DTS by adding APR and service compatibles, adding
typical properties, using proper device node names for services and
fixing indentation to 4 spaces.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220910091428.50418-14-krzysztof.kozlowski@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Krzysztof Kozlowski [Sat, 10 Sep 2022 09:14:25 +0000 (11:14 +0200)]
ASoC: dt-bindings: qcom,q6dsp-lpass-ports: cleanup example
Clean up the example DTS by adding APR and service compatibles, adding
typical properties, using proper device node names for services and
fixing indentation to 4 spaces.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220910091428.50418-13-krzysztof.kozlowski@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Krzysztof Kozlowski [Sat, 10 Sep 2022 09:14:24 +0000 (11:14 +0200)]
ASoC: dt-bindings: qcom,q6adm: convert to dtschema
Convert Qualcomm Audio Device Manager (Q6ADM) bindings to DT schema.
The original bindings documented:
1. APR service node with compatibles: "qcom,q6adm" and
"qcom,q6adm-v<MAJOR-NUMBER>.<MINOR-NUMBER>",
2. Routing child node with compatible "qcom,q6adm-routing".
The conversion entirely drops (1) because the compatible is already
documented in bindings/soc/qcom/qcom,apr.yaml. The
"qcom,q6adm-v<MAJOR-NUMBER>.<MINOR-NUMBER>" on the other hand is not
used at all - neither in existing DTS, nor in downstream sources - so
the versions seem to be fully auto-detectable.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220910091428.50418-12-krzysztof.kozlowski@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Krzysztof Kozlowski [Sat, 10 Sep 2022 09:14:23 +0000 (11:14 +0200)]
ASoC: dt-bindings: qcom,q6asm: convert to dtschema
Convert Qualcomm Audio Stream Manager (Q6ASM) bindings to DT schema.
The original bindings documented:
1. APR service node with compatibles: "qcom,q6asm" and
"qcom,q6asm-v<MAJOR-NUMBER>.<MINOR-NUMBER>",
2. actual DAIs child node with compatible "qcom,q6asm-dais".
The conversion entirely drops (1) because the compatible is already
documented in bindings/soc/qcom/qcom,apr.yaml. The
"qcom,q6asm-v<MAJOR-NUMBER>.<MINOR-NUMBER>" on the other hand is not
used at all - neither in existing DTS, nor in downstream sources - so
the versions seem to be fully auto-detectable.
Another change done in conversion is adding "iommus" property, which is
already used in DTS and Linux driver.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220910091428.50418-11-krzysztof.kozlowski@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Krzysztof Kozlowski [Sat, 10 Sep 2022 09:14:22 +0000 (11:14 +0200)]
dt-bindings: soc: qcom: apr: correct service children
The APR bindings did not properly describe the children nodes for DAIs.
None of the DTSes use unit addresses for the children, so correct the
nodes and reference their schema: clock-controller, dais and routing.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220910091428.50418-10-krzysztof.kozlowski@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Linus Torvalds [Sun, 11 Sep 2022 20:22:01 +0000 (16:22 -0400)]
Linux 6.0-rc5
Linus Torvalds [Sun, 11 Sep 2022 19:16:47 +0000 (15:16 -0400)]
Merge tag 'kbuild-fixes-v6.0-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Remove unused scripts/gcc-ld script
- Add zstd support to scripts/extract-ikconfig
- Check 'make headers' for UML
- Fix scripts/mksysmap to ignore local symbols
* tag 'kbuild-fixes-v6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
mksysmap: Fix the mismatch of 'L0' symbols in System.map
kbuild: disable header exports for UML in a straightforward way
scripts/extract-ikconfig: add zstd compression support
scripts: remove obsolete gcc-ld script
Linus Torvalds [Sun, 11 Sep 2022 11:48:21 +0000 (07:48 -0400)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Three small arm64 fixes, all related to optional architecture
extensions: BTI, SME and 52-bit virtual addressing:
- Disable in-kernel BTI when compiling with GCC, as it makes invalid
assumptions about the distance between functions which has led to
crashes when calling modules on a CPU with BTI support
- Remove bogus TIF_SME flag management if memory allocation fails in
the ptrace code
- Fix the resume path when configured for 52-bit virtual addressing"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: fix resume for 52-bit enabled builds
arm64/ptrace: Don't clear calling process' TIF_SME on OOM
arm64/bti: Disable in kernel BTI when cross section thunks are broken
Linus Torvalds [Sun, 11 Sep 2022 11:39:03 +0000 (07:39 -0400)]
Merge tag 'i2c-for-6.0-rc5' of git://git./linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Only documentation and DT binding fixes and improvements"
* tag 'i2c-for-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
dt-bindings: i2c: renesas,riic: Fix 'unevaluatedProperties' warnings
docs: i2c: piix4: Fix typos, add markup, drop link
docs: i2c: i2c-topology: reorder sections more logically
docs: i2c: i2c-topology: fix incorrect heading
docs: i2c: i2c-topology: fix typo
Linus Torvalds [Sun, 11 Sep 2022 11:32:26 +0000 (07:32 -0400)]
Merge tag 'iommu-fixes-v6.0-rc4' of git://git./linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
- Intel VT-d fixes from Lu Baolu:
- Boot kdump kernels with VT-d scalable mode on
- Calculate the right page table levels
- Fix two recursive locking issues
- Fix a lockdep splat issue
- AMD IOMMU fixes:
- Fix for completion-wait command to use full 64 bits of data
- Fix PASID related issue where GPU sound devices failed to
initialize
- Fix for Virtio-IOMMU to report correct caching behavior, needed for
use with VFIO
* tag 'iommu-fixes-v6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Fix false ownership failure on AMD systems with PASID activated
iommu/vt-d: Fix possible recursive locking in intel_iommu_init()
iommu/virtio: Fix interaction with VFIO
iommu/vt-d: Fix lockdep splat due to klist iteration in atomic context
iommu/vt-d: Fix recursive lock issue in iommu_flush_dev_iotlb()
iommu/vt-d: Correctly calculate sagaw value of IOMMU
iommu/vt-d: Fix kdump kernels boot failure with scalable mode
iommu/amd: use full 64-bit value in build_completion_wait()
Linus Torvalds [Sun, 11 Sep 2022 11:21:56 +0000 (07:21 -0400)]
Merge tag 'mips-fixes_6.0_1' of git://git./linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- fix for loongson32 startup hang
- fix for octeon irq setup problem
- fix compiler warning for new CONFIG option
- switch to SPARSEMEM_EXTREME for all platforms selecting SPARSEMEM
* tag 'mips-fixes_6.0_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
mips: Select SPARSEMEM_EXTREME
MIPS: OCTEON: irq: Fix octeon_irq_force_ciu_mapping()
MIPS: octeon: Get rid of preprocessor directives around RESERVE32
MIPS: loongson32: ls1c: Fix hang during startup
Jason Gunthorpe [Fri, 9 Sep 2022 19:46:31 +0000 (16:46 -0300)]
iommu: Fix false ownership failure on AMD systems with PASID activated
The AMD IOMMU driver cannot activate PASID mode on a RID without the RID's
translation being set to IDENTITY. Further it requires changing the RID's
page table layout from the normal v1 IOMMU_DOMAIN_IDENTITY layout to a
different v2 layout.
It does this by creating a new iommu_domain, configuring that domain for
v2 identity operation and then attaching it to the group, from within the
driver. This logic assumes the group is already set to the IDENTITY domain
and is being used by the DMA API.
However, since the ownership logic is based on the group's domain pointer
equaling the default domain to detect DMA API ownership, this causes it to
look like the group is not attached to the DMA API any more. This blocks
attaching drivers to any other devices in the group.
In a real system this manifests itself as the HD-audio devices on some AMD
platforms losing their device drivers.
Work around this unique behavior of the AMD driver by checking for
equality of IDENTITY domains based on their type, not their pointer
value. This allows the AMD driver to have two IDENTITY domains for
internal purposes without breaking the check.
Have the AMD driver properly declare that the special domain it created is
actually an IDENTITY domain.
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: stable@vger.kernel.org
Fixes: 512881eacfa7 ("bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management")
Reported-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/0-v1-ea566e16b06b+811-amd_owner_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
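The type-based comparison described in the commit above can be illustrated with a small userspace sketch. This is not the kernel code: the struct layouts, the enum and both function names are hypothetical stand-ins, chosen only to show why a pointer-equality check misreports ownership once a driver attaches a second IDENTITY domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's types. */
enum domain_type { DOMAIN_IDENTITY, DOMAIN_DMA };

struct domain {
    enum domain_type type;
};

struct group {
    struct domain *default_domain; /* domain set up for DMA API use */
    struct domain *domain;         /* domain currently attached */
};

/* Pointer-only check: fails when a driver swaps in its own IDENTITY domain. */
static bool owned_by_dma_api_ptr(const struct group *g)
{
    assert(g != NULL);
    return g->domain == g->default_domain;
}

/* Type-aware check: two distinct IDENTITY domains count as equivalent. */
static bool owned_by_dma_api_type(const struct group *g)
{
    assert(g != NULL);
    if (g->domain == g->default_domain)
        return true;
    return g->domain && g->default_domain &&
           g->domain->type == DOMAIN_IDENTITY &&
           g->default_domain->type == DOMAIN_IDENTITY;
}
```

With a group whose default domain and attached domain are two separate IDENTITY objects, the first check reports "not owned by the DMA API" while the second still reports ownership, which is the behaviour the commit message describes.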
Lu Baolu [Sun, 11 Sep 2022 03:18:45 +0000 (11:18 +0800)]
iommu/vt-d: Fix possible recursive locking in intel_iommu_init()
The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac932
("iommu/vt-d: Introduce a rwsem to protect global data structures"). It
is used to protect DMAR related global data from DMAR hotplug operations.
The dmar_global_lock used in intel_iommu_init() might cause a recursive
locking issue. For example, intel_iommu_get_resv_regions() takes the
dmar_global_lock from within a section where intel_iommu_init() already
holds it via probe_acpi_namespace_devices().
Using dmar_global_lock in intel_iommu_init() could be relaxed since it is
unlikely that any IO board must be hot added before the IOMMU subsystem is
initialized. This eliminates the possible recursive locking issue by moving
down DMAR hotplug support after the IOMMU is initialized and removing the
uses of dmar_global_lock in intel_iommu_init().
Fixes: d5692d4af08cd ("iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()")
Reported-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/894db0ccae854b35c73814485569b634237b5538.1657034828.git.robin.murphy@arm.com
Link: https://lore.kernel.org/r/20220718235325.3952426-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Linus Torvalds [Sat, 10 Sep 2022 17:19:31 +0000 (13:19 -0400)]
Merge tag 's390-6.0-4' of git://git./linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix absolute zero lowcore corruption on kdump when CPU0 is offline
- Fix lowcore protection setup for offline CPU restart
* tag 's390-6.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/smp: enforce lowcore protection on CPU restart
s390/boot: fix absolute zero lowcore corruption on boot
Linus Torvalds [Sat, 10 Sep 2022 17:02:10 +0000 (13:02 -0400)]
Merge tag 'hwmon-for-v6.0-rc5' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fix severe regression in asus-ec-sensors driver
which resulted in EC driver failures
- Fix various bugs in mr75203 driver
- Fix byte order bug in tps23861 driver
* tag 'hwmon-for-v6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (asus-ec-sensors) autoload module via DMI data
hwmon: (mr75203) enable polling for all VM channels
hwmon: (mr75203) fix multi-channel voltage reading
hwmon: (mr75203) fix voltage equation for negative source input
hwmon: (mr75203) update pvt->v_num and vm_num to the actual number of used sensors
hwmon: (mr75203) fix VM sensor allocation when "intel,vm-map" not defined
dt-bindings: hwmon: (mr75203) fix "intel,vm-map" property to be optional
hwmon: (tps23861) fix byte order in resistance register
Linus Torvalds [Sat, 10 Sep 2022 16:18:19 +0000 (12:18 -0400)]
Merge tag 'dma-mapping-6.0-2022-09-10' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- revert a panic on swiotlb initialization failure (Yu Zhao)
- fix the lookup for partial syncs in dma-debug (Robin Murphy)
- fix a shift overflow in swiotlb (Chao Gao)
- fix a comment typo in swiotlb (Chao Gao)
- mark a function static now that all abusers are gone (Christoph
Hellwig)
* tag 'dma-mapping-6.0-2022-09-10' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: mark dma_supported static
swiotlb: fix a typo
swiotlb: avoid potential left shift overflow
dma-debug: improve search for partial syncs
Revert "swiotlb: panic if nslabs is too small"
Joey Gouly [Fri, 9 Sep 2022 12:43:11 +0000 (13:43 +0100)]
arm64: mm: fix resume for 52-bit enabled builds
__cpu_setup() was changed to take the actual number of VA bits in x0,
however the resume path was not updated at the same time.
Load `vabits_actual` in the resume path, to ensure that the correct
number of VA bits is used.
This fixes booting v6.0-rc kernels on my Juno.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Fixes: 0aaa68532e9d ("arm64: mm: fix booting with 52-bit address space")
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220909124311.38489-1-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Linus Torvalds [Fri, 9 Sep 2022 21:40:28 +0000 (17:40 -0400)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Eight patches which looks like quite a large core change, but most of
the diffstat is reverting the attempt to rejig reference counting
introduced in the last merge window which caused issues with device
and module removal.
Of the remaining four patches, only the use-after-free fix is
substantial"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: mpt3sas: Fix use-after-free warning
scsi: core: Fix a use-after-free
scsi: core: Revert "Make sure that targets outlive devices"
scsi: core: Revert "Make sure that hosts outlive targets"
scsi: core: Revert "Simplify LLD module reference counting"
scsi: core: Revert "Call blk_mq_free_tag_set() earlier"
scsi: lpfc: Add missing destroy_workqueue() in error path
scsi: lpfc: Return DID_TRANSPORT_DISRUPTED instead of DID_REQUEUE
Youling Tang [Thu, 1 Sep 2022 11:10:59 +0000 (19:10 +0800)]
mksysmap: Fix the mismatch of 'L0' symbols in System.map
When System.map is generated, the kernel build uses mksysmap to filter the
kernel symbols. On the LoongArch architecture the "L0" symbols need to be
filtered out as well.
$ cat System.map | grep L0
9000000000221540 t L0
The L0 symbol exists in System.map, but not in .tmp_System.map. As a
result, "cmp -s System.map .tmp_System.map" fails and the link-vmlinux.sh
script shows the "Inconsistent kallsyms data" error message.
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
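As a rough illustration of the filtering idea in the commit above, the sketch below classifies System.map-style lines in C. The real mksysmap is a shell script, so this is only a model of the rule, not the actual change; the function name is hypothetical and it assumes each line has the form "<address> <type> <name>".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if a System.map-style line ("<address> <type> <name>")
 * names the symbol "L0". Simplified: only this one name is matched,
 * and an optional trailing newline is tolerated.
 */
static bool is_l0_symbol(const char *line)
{
    const char *name;

    assert(line != NULL);
    name = strrchr(line, ' ');
    if (!name)
        return false;
    name++; /* skip the space in front of the symbol name */
    return strncmp(name, "L0", 2) == 0 &&
           (name[2] == '\0' || name[2] == '\n');
}
```

A filter built on this predicate would drop the "9000000000221540 t L0" line shown in the commit message while keeping ordinary symbols, so both map files end up with the same contents.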
Linus Torvalds [Fri, 9 Sep 2022 19:08:40 +0000 (15:08 -0400)]
Merge tag 'driver-core-6.0-rc5' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are some small driver core and debugfs fixes for 6.0-rc5.
Included in here are:
- multiple attempts to get the arch_topology code to work properly on
non-cluster SMT systems. First attempt caused build breakages in
linux-next and 0-day, second try worked.
- debugfs fixes for a long-suffering memory leak. The pattern of
debugfs_remove(debugfs_lookup(...)) turns out to leak dentries, so
add debugfs_lookup_and_remove() to fix this problem. Also fix up
the scheduler debug code that highlighted this problem. Fixes for
other subsystems will be trickling in over the next few months for
this same issue once the debugfs function is merged.
All of these have been in linux-next since Wednesday with no reported
problems"
* tag 'driver-core-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
arch_topology: Make cluster topology span at least SMT CPUs
sched/debug: fix dentry leak in update_sched_domain_debugfs
debugfs: add debugfs_lookup_and_remove()
driver core: fix driver_set_override() issue with empty strings
Revert "arch_topology: Make cluster topology span at least SMT CPUs"
arch_topology: Make cluster topology span at least SMT CPUs
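The dentry leak mentioned in the driver core pull above comes down to reference counting: a lookup takes a reference that a plain remove does not drop. The following is a toy userspace model of that pattern, not the debugfs implementation; the struct and all function names are hypothetical, and the assumption that removal drops exactly the tree's own reference is a simplification.

```c
#include <assert.h>

/* Toy stand-in for a dentry (hypothetical, not the kernel structure). */
struct entry {
    int refs;     /* reference count; the tree itself holds one */
    int unlinked; /* set once the entry is removed from the tree */
};

/* Lookup hands back the entry with an extra reference held. */
static struct entry *entry_lookup(struct entry *e)
{
    e->refs++;
    return e;
}

/* Drop one reference. */
static void entry_put(struct entry *e)
{
    assert(e->refs > 0);
    e->refs--;
}

/* Remove unlinks the entry and drops only the tree's own reference. */
static void entry_remove(struct entry *e)
{
    e->unlinked = 1;
    entry_put(e);
}

/* Combined helper: also drops the reference taken by the lookup. */
static void entry_lookup_and_remove(struct entry *e)
{
    struct entry *found = entry_lookup(e);

    entry_remove(found);
    entry_put(found);
}
```

In this model, `entry_remove(entry_lookup(&e))` leaves one reference outstanding, whereas `entry_lookup_and_remove(&e)` brings the count back to zero, which mirrors why a combined helper fixes the leak.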
Linus Torvalds [Fri, 9 Sep 2022 19:03:08 +0000 (15:03 -0400)]
Merge tag 'block-6.0-2022-09-09' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull via Christoph:
- fix a use after free in nvmet (Bart Van Assche)
- fix a use after free when detecting digest errors
(Sagi Grimberg)
- fix regression that causes sporadic TCP requests to time out
(Sagi Grimberg)
- fix two off-by-one errors in the nvmet ZNS support
(Dennis Maisenbacher)
- requeue aen after firmware activation (Keith Busch)
- Fix missing request flags in debugfs code (me)
- Partition scan fix (Ming)
* tag 'block-6.0-2022-09-09' of git://git.kernel.dk/linux-block:
block: add missing request flags to debugfs code
nvme: requeue aen after firmware activation
nvmet: fix mar and mor off-by-one errors
nvme-tcp: fix regression that causes sporadic requests to time out
nvme-tcp: fix UAF when detecting digest errors
nvmet: fix a use-after-free
block: don't add partitions if GD_SUPPRESS_PART_SCAN is set
Linus Torvalds [Fri, 9 Sep 2022 18:57:18 +0000 (14:57 -0400)]
Merge tag 'io_uring-6.0-2022-09-09' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- Removed function that became unused after last week's merge (Jiapeng)
- Two small fixes for kbuf recycling (Pavel)
- Include address copy for zc send for POLLFIRST (Pavel)
- Fix for short IO handling in the normal read/write path (Pavel)
* tag 'io_uring-6.0-2022-09-09' of git://git.kernel.dk/linux-block:
io_uring/rw: fix short rw error handling
io_uring/net: copy addr for zc on POLL_FIRST
io_uring: recycle kbuf recycle on tw requeue
io_uring/kbuf: fix not advancing READV kbuf ring
io_uring/notif: Remove the unused function io_notif_complete()
Linus Torvalds [Fri, 9 Sep 2022 18:46:44 +0000 (14:46 -0400)]
Merge tag 'for-linus' of git://git./linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Many bug fixes in several drivers:
- Fix misuse of the DMA API in rtrs
- Several irdma issues: hung task due to SQ flushing, incorrect
capability reporting to userspace, improper error handling for MW
corners, touching an uninitialized SGL during invalidation.
- hns was using the wrong page size limits for the HW, an incorrect
calculation of wqe_shift causing WQE corruption, and miscomputed a
timer id.
- Fix a crash in SRP triggered by blktests
- Fix compiler errors by calling virt_to_page() with the proper type
in siw
- Userspace triggerable deadlock in ODP
- mlx5 could use the wrong profile due to some driver loading races,
counters were not working in some device configurations, and a
crash on error unwind"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/irdma: Report RNR NAK generation in device caps
RDMA/irdma: Use s/g array in post send only when its valid
RDMA/irdma: Return correct WC error for bind operation failure
RDMA/irdma: Return error on MR deregister CQP failure
RDMA/irdma: Report the correct max cqes from query device
MAINTAINERS: Update maintainers of HiSilicon RoCE
RDMA/mlx5: Fix UMR cleanup on error flow of driver init
RDMA/mlx5: Set local port to one when accessing counters
RDMA/mlx5: Rely on RoCE fw cap instead of devlink when setting profile
IB/core: Fix a nested dead lock as part of ODP flow
RDMA/siw: Pass a pointer to virt_to_page()
RDMA/srp: Set scmnd->result only when scmnd is not NULL
RDMA/hns: Remove the num_qpc_timer variable
RDMA/hns: Fix wrong fixed value of qp->rq.wqe_shift
RDMA/hns: Fix supported page size
RDMA/cma: Fix arguments order in net device validation
RDMA/irdma: Fix drain SQ hang with no completion
RDMA/rtrs-srv: Pass the correct number of entries for dma mapped SGL
RDMA/rtrs-clt: Use the right sg_cnt after ib_dma_map_sg
Linus Torvalds [Fri, 9 Sep 2022 18:35:22 +0000 (14:35 -0400)]
Merge tag 'drm-fixes-2022-09-10' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"From a train in the Irish countryside, regular drm fixes for 6.0-rc5.
This is mostly amdgpu/amdkfd and i915 fixes, then one panfrost, one
ttm and one edid fix. Nothing too major going on. Hopefully a quiet
week next week for LPC.
edid:
- Fix EDID 1.4 range-descriptor parsing
ttm:
- Fix ghost-object bulk moves
i915:
- Fix MIPI sequence block copy from BIOS' table
- Fix PCODE min freq setup when GuC's SLPC is in use
- Implement Workaround for eDP
- Fix has_flat_ccs selection for DG1
amdgpu:
- Firmware header fix
- SMU 13.x fix
- Debugfs memory leak fix
- NBIO 7.7 fix
- Firmware memory leak fix
amdkfd:
- Debug output fix
panfrost:
- Fix devfreq OPP"
* tag 'drm-fixes-2022-09-10' of git://anongit.freedesktop.org/drm/drm:
drm/panfrost: devfreq: set opp to the recommended one to configure regulator
drm/ttm: cleanup the resource of ghost objects after locking them
drm/amdgpu: prevent toc firmware memory leak
drm/amdgpu: correct doorbell range/size value for CSDMA_DOORBELL_RANGE
drm/amdkfd: print address in hex format rather than decimal
drm/amd/display: fix memory leak when using debugfs_lookup()
drm/amd/pm: add missing SetMGpuFanBoostLimitRpm mapping for SMU 13.0.7
drm/amd/amdgpu: add rlc_firmware_header_v2_4 to amdgpu_firmware_header
drm/i915: consider HAS_FLAT_CCS() in needs_ccs_pages
drm/i915: Implement WaEdpLinkRateDataReload
drm/i915/slpc: Let's fix the PCODE min freq table setup for SLPC
drm/i915/bios: Copy the whole MIPI sequence block
drm/ttm: update bulk move object of ghost BO
drm/edid: Handle EDID 1.4 range descriptor h/vfreq offsets
Linus Torvalds [Fri, 9 Sep 2022 18:13:36 +0000 (14:13 -0400)]
Merge tag 'linux-kselftest-kunit-fixes-6.0-rc5' of git://git./linux/kernel/git/shuah/linux-kselftest
Pull KUnit fixes from Shuah Khan:
"Two fixes to test build and a fix for incorrect taint reason reporting"
* tag 'linux-kselftest-kunit-fixes-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
tools: Add new "test" taint to kernel-chktaint
kunit: fix Kconfig for build-in tests USB4 and Nitro Enclaves
kunit: fix assert_type for comparison macros
Linus Torvalds [Fri, 9 Sep 2022 18:06:10 +0000 (14:06 -0400)]
Merge tag 'riscv-for-linus-6.0-rc5' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A pair of device tree fixes for the Polarfire SOC
- A fix to avoid overflowing the PMU counter array when firmware
incorrectly reports the number of supported counters, which manifests
on OpenSBI versions prior to 1.1
* tag 'riscv-for-linus-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
perf: RISC-V: fix access beyond allocated array
riscv: dts: microchip: use an mpfs specific l2 compatible
dt-bindings: riscv: sifive-l2: add a PolarFire SoC compatible
Linus Torvalds [Fri, 9 Sep 2022 18:00:45 +0000 (14:00 -0400)]
Merge tag 'powerpc-6.0-5' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
- Fix crashes on bare metal due to the new plpks driver trying to probe
and call the hypervisor on non-pseries machines.
Thanks to Nathan Chancellor and Dan Horák.
* tag 'powerpc-6.0-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries: Fix plpks crash on non-pseries
Eugene Shalygin [Fri, 9 Sep 2022 15:56:53 +0000 (17:56 +0200)]
hwmon: (asus-ec-sensors) autoload module via DMI data
Replace autoloading data based on the ACPI EC device with the DMI
records for motherboard models. The ACPI method caused a bug: when
this driver returns an error from the probe function because of an
unsupported motherboard model, the ACPI subsystem concludes
that the EC device does not work properly.
Fixes: 5cd29012028d ("hwmon: (asus-ec-sensors) introduce ec_board_info struct for board data")
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=216412
Bug: https://bugzilla.redhat.com/show_bug.cgi?id=2121844
Signed-off-by: Eugene Shalygin <eugene.shalygin@gmail.com>
Link: https://lore.kernel.org/r/20220909155654.123398-2-eugene.shalygin@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Dave Airlie [Fri, 9 Sep 2022 15:37:01 +0000 (01:37 +1000)]
Merge tag 'drm-intel-fixes-2022-09-08' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix MIPI sequence block copy from BIOS' table. (Ville)
- Fix PCODE min freq setup when GuC's SLPC is in use. (Rodrigo)
- Implement Workaround for eDP. (Ville)
- Fix has_flat_ccs selection for DG1. (Matt)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Yxn1WpmUJnJpqq23@intel.com
Alexander Sverdlin [Fri, 9 Sep 2022 09:30:42 +0000 (11:30 +0200)]
mips: Select SPARSEMEM_EXTREME
Commit c46173183657 ("MIPS: Add NUMA support for Loongson-3") has increased
the .bss size of the Octeon kernel from 16k to 16M. Providing the conditions
for SPARSEMEM_EXTREME avoids the waste of memory.
Thomas has tested the loongson64 kernel, where .bss is being reduced by
this patch from 16.5M to 515k.
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
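The memory saving described in the commit above follows from how the section table is laid out: a fully static table reserves space for every possible section, while a two-level layout statically reserves only an array of root pointers. The sketch below compares the two footprints under made-up parameters. The section count, page size, struct fields and function names are all assumptions for illustration, not values taken from any MIPS configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameters, chosen only for illustration. */
#define NR_SECTIONS     (1UL << 20)
#define PAGE_SIZE_BYTES 4096UL

/* Simplified stand-in for a per-section descriptor. */
struct mem_section {
    unsigned long section_mem_map;
    void *usage;
};

/* Static layout: one descriptor reserved for every possible section. */
static size_t static_table_bytes(void)
{
    return NR_SECTIONS * sizeof(struct mem_section);
}

/*
 * Two-level layout: only the array of root pointers is reserved up front;
 * each root (one page of descriptors) is allocated when first needed.
 */
static size_t two_level_root_bytes(void)
{
    size_t per_root;
    size_t roots;

    assert(sizeof(struct mem_section) <= PAGE_SIZE_BYTES);
    per_root = PAGE_SIZE_BYTES / sizeof(struct mem_section);
    roots = (NR_SECTIONS + per_root - 1) / per_root;
    return roots * sizeof(struct mem_section *);
}
```

Under these assumed numbers the statically reserved part shrinks by several orders of magnitude, at the cost of allocating the populated roots at runtime.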
Dave Airlie [Fri, 9 Sep 2022 15:30:02 +0000 (01:30 +1000)]
Merge tag 'drm-misc-fixes-2022-09-08' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Short summary of fixes pull:
* edid: Fix EDID 1.4 range-descriptor parsing
* panfrost: Fix devfreq OPP
* ttm: Fix ghost-object bulk moves
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/YxniKN4rK4qPp+J9@linux-uq9g
Pavel Begunkov [Fri, 9 Sep 2022 11:11:49 +0000 (12:11 +0100)]
io_uring/rw: fix short rw error handling
We have a couple of problems, first reports of unexpected link breakage
for reads when cqe->res indicates that the IO was done in full. The
reason here is partial IO with retries.
TL;DR; we compare the result in __io_complete_rw_common() against
req->cqe.res, but req->cqe.res doesn't store the full length but rather
the length left to be done. So, when we pass the full corrected result
via kiocb_done() -> __io_complete_rw_common(), it fails.
The second problem is that we don't try to correct res in
io_complete_rw(), which, for instance, might be a problem for O_DIRECT
when a prefix of the data was cached in the page cache. We also
definitely don't want to pass a corrected result into io_rw_done().
The fix here is to leave __io_complete_rw_common() alone, always pass
an uncorrected result into it and fix it up as the last step just before
actually finishing the I/O.
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://github.com/axboe/liburing/issues/643
Reported-by: Beld Zhang <beldzhang@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
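The ordering described in the commit above (check the uncorrected result first, fold in earlier progress last) can be sketched in plain C. This is a simplified model, not the io_uring code: the struct, its fields and both function names are hypothetical, and the handling of a negative result after partial progress is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request state after one or more partial attempts. */
struct rw_req {
    long bytes_done; /* completed by earlier partial attempts */
    long left;       /* length still expected from the final attempt */
};

/* Short-IO check: compare the uncorrected result with what was left. */
static bool rw_is_short(const struct rw_req *req, long res)
{
    return res != req->left;
}

/* Last step: fold earlier progress into the value reported to the user. */
static long rw_fixup_res(const struct rw_req *req, long res)
{
    assert(req->bytes_done >= 0);
    if (req->bytes_done > 0)
        return res < 0 ? req->bytes_done : res + req->bytes_done;
    return res;
}
```

For a request that already transferred 4096 bytes and has 4096 left, a final result of 4096 passes the check and is reported as 8192. Passing the already corrected 8192 into the check instead would flag the request as short, which is the kind of unexpected link breakage the commit message reports.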
Jens Axboe [Thu, 8 Sep 2022 23:26:59 +0000 (17:26 -0600)]
block: add missing request flags to debugfs code
We're missing TIMED_OUT and RESV. Particularly the former is handy
for debugging, let's get them added.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Fri, 9 Sep 2022 11:54:19 +0000 (07:54 -0400)]
Merge tag 'for-6.0-rc4-tag' of git://git./linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes to zoned mode and one regression fix for chunk limit:
- Zoned mode fixes:
- fix how wait/wake up is done when finishing zone
- fix zone append limit in emulated mode
- fix mount on devices with conventional zones
- fix regression, user settable data chunk limit got accidentally
lowered and causes allocation problems on some profiles (raid0,
raid1)"
* tag 'for-6.0-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix the max chunk size and stripe length calculation
btrfs: zoned: fix mounting with conventional zones
btrfs: zoned: set pseudo max append zone limit in zone emulation mode
btrfs: zoned: fix API misuse of zone finish waiting
Linus Torvalds [Fri, 9 Sep 2022 11:44:33 +0000 (07:44 -0400)]
Merge tag 'vfio-v6.0-rc5' of https://github.com/awilliam/linux-vfio
Pull VFIO fix from Alex Williamson:
- Fix zero page refcount leak (Alex Williamson)
* tag 'vfio-v6.0-rc5' of https://github.com/awilliam/linux-vfio:
vfio/type1: Unpin zero pages
Linus Torvalds [Fri, 9 Sep 2022 11:36:10 +0000 (07:36 -0400)]
Merge tag 'sound-6.0-rc5' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Lots of small fixes for various drivers at this time, hopefully it
will be the last big bump before 6.0 release.
The significant changes are regression fixes for (yet again) HD-audio
memory allocations and USB-audio PCM parameter handling, while there
are many small ASoC device-specific fixes as well as a few
out-of-bounds and race issues spotted by fuzzers"
* tag 'sound-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (29 commits)
ALSA: usb-audio: Clear fixed clock rate at closing EP
ALSA: emu10k1: Fix out of bounds access in snd_emu10k1_pcm_channel_alloc()
ALSA: hda: Once again fix regression of page allocations with IOMMU
ALSA: usb-audio: Fix an out-of-bounds bug in __snd_usb_parse_audio_interface()
ALSA: hda/tegra: Align BDL entry to 4KB boundary
ALSA: hda/sigmatel: Fix unused variable warning for beep power change
ALSA: pcm: oss: Fix race at SNDCTL_DSP_SYNC
ALSA: hda/sigmatel: Keep power up while beep is enabled
ALSA: aloop: Fix random zeros in capture data when using jiffies timer
ALSA: usb-audio: Split endpoint setups for hw_params and prepare
ALSA: usb-audio: Register card again for iface over delayed_register option
ALSA: usb-audio: Inform the delayed registration more properly
ASoC: fsl_aud2htx: Add error handler for pm_runtime_enable
ASoC: fsl_aud2htx: register platform component before registering cpu dai
ASoC: SOF: ipc4-topology: fix alh_group_ida max value
ASoC: mchp-spdiftx: Fix clang -Wbitfield-constant-conversion
ASoC: SOF: Kconfig: Make IPC_MESSAGE_INJECTOR depend on SND_SOC_SOF
ASoC: SOF: Kconfig: Make IPC_FLOOD_TEST depend on SND_SOC_SOF
ASoC: fsl_mqs: Fix supported clock DAI format
ASoC: nau8540: Implement hw constraint for rates
...
Linus Torvalds [Fri, 9 Sep 2022 11:31:17 +0000 (07:31 -0400)]
Merge tag 'perf-tools-fixes-for-v6.0-2022-09-08' of git://git./linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix per-thread mmaps for multi-threaded targets, noticed with
'perf top --pid' with multithreaded targets
- Fix synthesis failure warnings in 'perf record'
- Fix L2 Topdown metrics disappearance for raw events in 'perf stat'
- Fix out of bound access in some CPU masks
- Fix segfault if there is no CPU PMU table and a metric is sought,
noticed when building with NO_JEVENTS=1
- Skip dummy event attr check in 'perf script' fixing nonsensical
warning about UREGS attribute not set, as 'dummy' events have no
samples
- Fix 'iregs' field handling with dummy events on hybrid systems in
'perf script'
- Prevent potential memory leak in c2c_he_zalloc() in 'perf c2c'
- Don't install data files with x permissions
- Fix types for print format in dlfilter-show-cycles
- Switch deprecated openssl MD5_* functions to new EVP API in 'genelf'
- Remove redundant word 'contention' in 'perf lock' help message
* tag 'perf-tools-fixes-for-v6.0-2022-09-08' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf record: Fix synthesis failure warnings
perf tools: Don't install data files with x permissions
perf script: Fix Cannot print 'iregs' field for hybrid systems
perf lock: Remove redundant word 'contention' in help message
perf dlfilter dlfilter-show-cycles: Fix types for print format
libperf evlist: Fix per-thread mmaps for multi-threaded targets
perf c2c: Prevent potential memory leak in c2c_he_zalloc()
perf genelf: Switch deprecated openssl MD5_* functions to new EVP API
tools/perf: Fix out of bound access to cpu mask array
perf affinity: Fix out of bound access to "sched_cpus" mask
perf stat: Fix L2 Topdown metrics disappear for raw events
perf script: Skip dummy event attr check
perf metric: Return early if no CPU PMU table exists
Linus Torvalds [Fri, 9 Sep 2022 11:27:44 +0000 (07:27 -0400)]
Merge tag 'trace-v6.0-rc4' of git://git./linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
- Do not stop trace events in modules if TAINT_TEST is set
- Do not clobber mount options when tracefs is mounted a second time
- Prevent crash of kprobes in gate area
- Add static annotation to some non global functions
- Add some entries into the MAINTAINERS file
- Fix check of event_mutex held when accessing trigger list
- Add some __init/__exit annotations
- Fix reporting of what called hardirq_{enable,disable}_ip function
* tag 'trace-v6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracefs: Only clobber mode/uid/gid on remount if asked
kprobes: Prohibit probes in gate area
rv/reactor: add __init/__exit annotations to module init/exit funcs
tracing: Fix to check event_mutex is held while accessing trigger list
tracing: hold caller_addr to hardirq_{enable,disable}_ip
tracepoint: Allow trace events in modules with TAINT_TEST
MAINTAINERS: add scripts/tracing/ to TRACING
MAINTAINERS: Add Runtime Verification (RV) entry
rv/monitors: Make monitor's automata definition static
Linus Torvalds [Fri, 9 Sep 2022 11:23:29 +0000 (07:23 -0400)]
Merge tag 'asm-generic-fixes-6.0-rc4' of git://git./linux/kernel/git/arnd/asm-generic
Pull SOFTIRQ_ON_OWN_STACK rework from Arnd Bergmann:
"Just one fixup patch, reworking the softirq_on_own_stack logic for
preempt-rt kernels as discussed in
https://lore.kernel.org/all/CAHk-=wgZSD3W2y6yczad2Am=EfHYyiPzTn3CfXxrriJf9i5W5w@mail.gmail.com/"
* tag 'asm-generic-fixes-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: Conditionally enable do_softirq_own_stack() via Kconfig.
Brian Norris [Sat, 27 Aug 2022 00:44:17 +0000 (17:44 -0700)]
tracefs: Only clobber mode/uid/gid on remount if asked
Users may have explicitly configured their tracefs permissions; we
shouldn't overwrite those just because a second mount appeared.
Only clobber if the options were provided at mount time.
Note: the previous behavior was especially surprising in the presence of
automounted /sys/kernel/debug/tracing/.
Existing behavior:
## Pre-existing status: tracefs is 0755.
# stat -c '%A' /sys/kernel/tracing/
drwxr-xr-x
## (Re)trigger the automount.
# umount /sys/kernel/debug/tracing
# stat -c '%A' /sys/kernel/debug/tracing/.
drwx------
## Unexpected: the automount changed mode for other mount instances.
# stat -c '%A' /sys/kernel/tracing/
drwx------
New behavior (after this change):
## Pre-existing status: tracefs is 0755.
# stat -c '%A' /sys/kernel/tracing/
drwxr-xr-x
## (Re)trigger the automount.
# umount /sys/kernel/debug/tracing
# stat -c '%A' /sys/kernel/debug/tracing/.
drwxr-xr-x
## Expected: the automount does not change other mount instances.
# stat -c '%A' /sys/kernel/tracing/
drwxr-xr-x
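The "only clobber if asked" rule can be sketched in user-space C as follows. This is a minimal illustration, not the kernel patch: struct mount_opts, struct root_inode, the OPT_* flags and apply_opts() are hypothetical names. The idea is to record which options were actually given at mount time and to leave everything else untouched.

```c
#include <assert.h>

/* Hypothetical flags recording which options appeared at mount time. */
enum { OPT_UID = 1 << 0, OPT_GID = 1 << 1, OPT_MODE = 1 << 2 };

struct mount_opts {
    unsigned int given;          /* bitmask of OPT_* explicitly provided */
    unsigned int uid, gid, mode; /* values, meaningful only if flagged   */
};

struct root_inode {
    unsigned int uid, gid, mode;
};

/* Apply only the options the user asked for; the rest keep their value. */
void apply_opts(struct root_inode *root, const struct mount_opts *opts)
{
    assert(root && opts);
    if (opts->given & OPT_MODE)
        root->mode = opts->mode;
    if (opts->given & OPT_UID)
        root->uid = opts->uid;
    if (opts->given & OPT_GID)
        root->gid = opts->gid;
}
```

With an empty "given" mask (the automount case), a root inode that is 0755 stays 0755; passing OPT_MODE with mode 0700 changes it.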
Link: https://lkml.kernel.org/r/20220826174353.2.Iab6e5ea57963d6deca5311b27fb7226790d44406@changeid
Cc: stable@vger.kernel.org
Fixes: 4282d60689d4f ("tracefs: Add new tracefs file system")
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Christian A. Ehrhardt [Wed, 7 Sep 2022 20:09:17 +0000 (22:09 +0200)]
kprobes: Prohibit probes in gate area
The system call gate area counts as kernel text, but trying
to install a kprobe in this area fails with an Oops later on.
To fix this, explicitly disallow the gate area for kprobes.
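The kind of check involved can be sketched in user-space C. This is an illustration under stated assumptions, not the kernel code: is_kernel_text() is a hypothetical stand-in for the real text check, and the gate page is assumed to be the 4 KiB x86-64 vsyscall page at 0xffffffffff600000, the address used by the syzkaller reproducer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed location and size of the gate (vsyscall) page on x86-64. */
#define GATE_START 0xffffffffff600000ULL
#define GATE_SIZE  4096ULL

static bool in_gate_area(uint64_t addr)
{
    return addr >= GATE_START && addr - GATE_START < GATE_SIZE;
}

/* Hypothetical "is this kernel text?" test: anything in the top 2 GiB. */
static bool is_kernel_text(uint64_t addr)
{
    return addr >= 0xffffffff80000000ULL;
}

/* A probe address is acceptable only if it is text AND not in the gate area. */
bool kprobe_addr_allowed(uint64_t addr)
{
    return is_kernel_text(addr) && !in_gate_area(addr);
}
```

Under these assumptions the reproducer's address passes the text check but is rejected by the gate-area check, so the probe is refused up front instead of faulting in the instruction decoder.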
Found by syzkaller with the following reproducer:
perf_event_open$cgroup(&(0x7f00000001c0)={0x6, 0x80, 0x0, 0x0, 0x0, 0x0, 0x80ffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, @perf_config_ext={0x0, 0xffffffffff600000}}, 0xffffffffffffffff, 0x0, 0xffffffffffffffff, 0x0)
Sample report:
BUG: unable to handle page fault for address: fffffbfff3ac6000
PGD 6dfcb067 P4D 6dfcb067 PUD 6df8f067 PMD 6de4d067 PTE 0
Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 0 PID: 21978 Comm: syz-executor.2 Not tainted 6.0.0-rc3-00363-g7726d4c3e60b-dirty #6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:__insn_get_emulate_prefix arch/x86/lib/insn.c:91 [inline]
RIP: 0010:insn_get_emulate_prefix arch/x86/lib/insn.c:106 [inline]
RIP: 0010:insn_get_prefixes.part.0+0xa8/0x1110 arch/x86/lib/insn.c:134
Code: 49 be 00 00 00 00 00 fc ff df 48 8b 40 60 48 89 44 24 08 e9 81 00 00 00 e8 e5 4b 39 ff 4c 89 fa 4c 89 f9 48 c1 ea 03 83 e1 07 <42> 0f b6 14 32 38 ca 7f 08 84 d2 0f 85 06 10 00 00 48 89 d8 48 89
RSP: 0018:ffffc900088bf860 EFLAGS: 00010246
RAX: 0000000000040000 RBX: ffffffff9b9bebc0 RCX: 0000000000000000
RDX: 1ffffffff3ac6000 RSI: ffffc90002d82000 RDI: ffffc900088bf9e8
RBP: ffffffff9d630001 R08: 0000000000000000 R09: ffffc900088bf9e8
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
R13: ffffffff9d630000 R14: dffffc0000000000 R15: ffffffff9d630000
FS: 00007f63eef63640(0000) GS:ffff88806d000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffbfff3ac6000 CR3: 0000000029d90005 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
<TASK>
insn_get_prefixes arch/x86/lib/insn.c:131 [inline]
insn_get_opcode arch/x86/lib/insn.c:272 [inline]
insn_get_modrm+0x64a/0x7b0 arch/x86/lib/insn.c:343
insn_get_sib+0x29a/0x330 arch/x86/lib/insn.c:421
insn_get_displacement+0x350/0x6b0 arch/x86/lib/insn.c:464
insn_get_immediate arch/x86/lib/insn.c:632 [inline]
insn_get_length arch/x86/lib/insn.c:707 [inline]
insn_decode+0x43a/0x490 arch/x86/lib/insn.c:747
can_probe+0xfc/0x1d0 arch/x86/kernel/kprobes/core.c:282
arch_prepare_kprobe+0x79/0x1c0 arch/x86/kernel/kprobes/core.c:739
prepare_kprobe kernel/kprobes.c:1160 [inline]
register_kprobe kernel/kprobes.c:1641 [inline]
register_kprobe+0xb6e/0x1690 kernel/kprobes.c:1603
__register_trace_kprobe kernel/trace/trace_kprobe.c:509 [inline]
__register_trace_kprobe+0x26a/0x2d0 kernel/trace/trace_kprobe.c:477
create_local_trace_kprobe+0x1f7/0x350 kernel/trace/trace_kprobe.c:1833
perf_kprobe_init+0x18c/0x280 kernel/trace/trace_event_perf.c:271
perf_kprobe_event_init+0xf8/0x1c0 kernel/events/core.c:9888
perf_try_init_event+0x12d/0x570 kernel/events/core.c:11261
perf_init_event kernel/events/core.c:11325 [inline]
perf_event_alloc.part.0+0xf7f/0x36a0 kernel/events/core.c:11619
perf_event_alloc kernel/events/core.c:12059 [inline]
__do_sys_perf_event_open+0x4a8/0x2a00 kernel/events/core.c:12157
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f63ef7efaed
Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f63eef63028 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
RAX: ffffffffffffffda RBX: 00007f63ef90ff80 RCX: 00007f63ef7efaed
RDX: 0000000000000000 RSI: ffffffffffffffff RDI: 00000000200001c0
RBP: 00007f63ef86019c R08: 0000000000000000 R09: 0000000000000000
R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000002 R14: 00007f63ef90ff80 R15: 00007f63eef43000
</TASK>
Modules linked in:
CR2: fffffbfff3ac6000
---[ end trace 0000000000000000 ]---
RIP: 0010:__insn_get_emulate_prefix arch/x86/lib/insn.c:91 [inline]
RIP: 0010:insn_get_emulate_prefix arch/x86/lib/insn.c:106 [inline]
RIP: 0010:insn_get_prefixes.part.0+0xa8/0x1110 arch/x86/lib/insn.c:134
Code: 49 be 00 00 00 00 00 fc ff df 48 8b 40 60 48 89 44 24 08 e9 81 00 00 00 e8 e5 4b 39 ff 4c 89 fa 4c 89 f9 48 c1 ea 03 83 e1 07 <42> 0f b6 14 32 38 ca 7f 08 84 d2 0f 85 06 10 00 00 48 89 d8 48 89
RSP: 0018:ffffc900088bf860 EFLAGS: 00010246
RAX: 0000000000040000 RBX: ffffffff9b9bebc0 RCX: 0000000000000000
RDX: 1ffffffff3ac6000 RSI: ffffc90002d82000 RDI: ffffc900088bf9e8
RBP: ffffffff9d630001 R08: 0000000000000000 R09: ffffc900088bf9e8
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
R13: ffffffff9d630000 R14: dffffc0000000000 R15: ffffffff9d630000
FS: 00007f63eef63640(0000) GS:ffff88806d000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffbfff3ac6000 CR3: 0000000029d90005 CR4: 0000000000770ef0
PKRU: 55555554
==================================================================
Link: https://lkml.kernel.org/r/20220907200917.654103-1-lk@c--e.de
cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
cc: "David S. Miller" <davem@davemloft.net>
Cc: stable@vger.kernel.org
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Christian A. Ehrhardt <lk@c--e.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Sergey Matyukevich [Tue, 30 Aug 2022 15:53:04 +0000 (18:53 +0300)]
perf: RISC-V: fix access beyond allocated array
SBI firmware should report the total number of firmware and hardware counters,
including unused or special ones. In that case the kernel doesn't need
to make any assumptions about gaps in the reported counters, e.g. an excluded
timer counter. That was fixed in OpenSBI v1.1 by commit 3f66465fb6bf ("lib: pmu:
allow to use the highest available counter"). This kernel patch has no effect
if the SBI firmware behaves correctly. However, it eliminates access beyond the
allocated pmu_ctr_list if the kernel is used with OpenSBI older than v1.1.
Fixes: e9991434596f ("RISC-V: Add perf platform driver based on SBI PMU extension")
Signed-off-by: Sergey Matyukevich <sergey.matyukevich@syntacore.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220830155306.301714-2-geomatsi@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Adrian Hunter [Wed, 7 Sep 2022 16:24:58 +0000 (19:24 +0300)]
perf record: Fix synthesis failure warnings
Some calls to synthesis functions set err < 0 but only warn about the
failure and continue. However they do not set err back to zero, relying
on subsequent code to do that.
That changed with the introduction of option --synth. With --synth=no,
the subsequent functions that set err back to zero are not called.
Fix by setting err = 0 in those cases.
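The pattern can be sketched in plain C. This is a simplified model, not perf's code: synthesize_bpf_events() and synthesize_threads() are hypothetical stand-ins for an optional step that fails and a later step that happens to reset err.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical optional synthesis step that fails with a negative code. */
static int synthesize_bpf_events(void) { return -1; }

/* Hypothetical later step that succeeds and thereby overwrites err with 0. */
static int synthesize_threads(void) { return 0; }

/* Before: the failure is only warned about, but err stays negative. */
int record_buggy(bool synth)
{
    int err = synthesize_bpf_events();

    if (err < 0)
        fprintf(stderr, "Couldn't synthesize bpf events.\n");
    if (synth)                      /* skipped with --synth=no */
        err = synthesize_threads();
    return err;
}

/* After: a tolerated failure resets err explicitly. */
int record_fixed(bool synth)
{
    int err = synthesize_bpf_events();

    if (err < 0) {
        fprintf(stderr, "Couldn't synthesize bpf events.\n");
        err = 0;
    }
    if (synth)
        err = synthesize_threads();
    return err;
}
```

In this model record_buggy(false) returns a negative value although the failure was meant to be non-fatal, while record_fixed(false) returns 0.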
Example:
Before:
$ perf record --no-bpf-event --synth=all -o /tmp/huh uname
Couldn't synthesize bpf events.
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB /tmp/huh (7 samples) ]
$ perf record --no-bpf-event --synth=no -o /tmp/huh uname
Couldn't synthesize bpf events.
After:
$ perf record --no-bpf-event --synth=no -o /tmp/huh uname
Couldn't synthesize bpf events.
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB /tmp/huh (7 samples) ]
Fixes: 41b740b6e8a994e5 ("perf record: Add --synth option")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220907162458.72817-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Eliav Farber [Thu, 8 Sep 2022 15:24:34 +0000 (15:24 +0000)]
hwmon: (mr75203) enable polling for all VM channels
Configure ip-polling register to enable polling for all voltage monitor
channels.
This enables reading the voltage values for all inputs other than just
input 0.
Fixes: 9d823351a337 ("hwmon: Add hardware monitoring driver for Moortec MR75203 PVT controller")
Signed-off-by: Eliav Farber <farbere@amazon.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20220908152449.35457-7-farbere@amazon.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Eliav Farber [Thu, 8 Sep 2022 15:24:33 +0000 (15:24 +0000)]
hwmon: (mr75203) fix multi-channel voltage reading
Fix voltage allocation and reading to support all channels in all VMs.
Prior to this change allocation and reading were done only for the first
channel in each VM.
This change counts the total number of channels for allocation, and takes
into account the channel offset when reading the sample data register.
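As a rough sketch of the indexing this implies (assuming every VM exposes the same number of channels; total_channels() and locate_channel() are hypothetical helpers, not driver functions), a flat hwmon input index can be mapped to a VM and a channel offset like this:

```c
#include <assert.h>
#include <stddef.h>

/* Number of voltage inputs to allocate: every channel of every VM. */
size_t total_channels(size_t vm_num, size_t ch_num)
{
    return vm_num * ch_num;
}

/* Split a flat input index into (VM index, channel offset within that VM). */
void locate_channel(size_t idx, size_t ch_num, size_t *vm, size_t *ch)
{
    assert(ch_num > 0 && vm && ch);
    *vm = idx / ch_num;
    *ch = idx % ch_num;
}
```

For example, with 4 channels per VM, input 5 maps to VM 1, channel 1, and 3 VMs need 12 allocated inputs rather than 3.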
Fixes: 9d823351a337 ("hwmon: Add hardware monitoring driver for Moortec MR75203 PVT controller")
Signed-off-by: Eliav Farber <farbere@amazon.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20220908152449.35457-6-farbere@amazon.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Jiri Slaby [Thu, 8 Sep 2022 06:04:26 +0000 (08:04 +0200)]
perf tools: Don't install data files with x permissions
install(1), by default, installs with rwxr-xr-x permissions. Modify
perf's Makefile to pass '-m 644' when installing:
* Documentation/tips.txt
* examples/bpf/*
* perf-completion.sh
* perf_dlfilter.h header
* scripts/perl/Perf-Trace-Util/lib/Perf/Trace/*
* scripts/perl/*.pl
* tests/attr/*
* tests/attr.py
* tests/shell/lib/*.sh
* trace/strace/groups/*
All those are supposed to be non-executable. Either they are not scripts
at all, or they don't have shebang.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220908060426.9619-1-jslaby@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Eliav Farber [Thu, 8 Sep 2022 15:24:32 +0000 (15:24 +0000)]
hwmon: (mr75203) fix voltage equation for negative source input
According to the Moortec Embedded Voltage Monitor (MEVM) series 3 data
sheet, the minimum input signal is -100 mV and the maximum input signal
is +1000 mV.
The equation used to convert the digital word to voltage mixes
types (*val is signed and n is unsigned), and on 64-bit machines the types
also differ in size, since sizeof(u32) = 4 and sizeof(long) = 8.
So when measuring a negative input, n is small enough that
PVT_N_CONST * n < PVT_R_CONST, and the result of
(PVT_N_CONST * n - PVT_R_CONST) wraps around to a very large positive
32-bit number. When that result is stored in *val it keeps the same
value, just in 64 bits (instead of representing a negative number, which
is what happens when sizeof(long) = 4).
When -1023 <= (PVT_N_CONST * n - PVT_R_CONST) <= -1,
dividing the number by 1024 should result in 0, but because ">> 10"
is used, and the sign bit is used to fill the vacated bit positions, it
results in -1 (0xf...fffff), which is wrong.
This change fixes the sign problem and supports negative values by
casting n to long and replacing the right shift with a division.
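Both effects can be reproduced in a small user-space sketch. The constants below are assumed values chosen for illustration (check the driver source for the real PVT_N_CONST and PVT_R_CONST), int64_t stands in for a 64-bit long, int is assumed to be 32 bits wide, and convert_buggy()/convert_fixed() are hypothetical names.

```c
#include <stdint.h>

/* Assumed values, for illustration only. */
#define N_CONST   90
#define R_CONST   245805
#define CONV_BITS 10

/* Mixed-type version: int * uint32_t is evaluated as unsigned 32-bit,
 * so a negative difference wraps to a huge positive number. */
int64_t convert_buggy(uint32_t n)
{
    return (N_CONST * n - R_CONST) >> CONV_BITS;
}

/* Widen to a signed 64-bit type first, then divide so that small
 * negative values truncate toward zero instead of rounding to -1. */
int64_t convert_fixed(uint32_t n)
{
    return (N_CONST * (int64_t)n - R_CONST) / (1 << CONV_BITS);
}
```

With these assumed constants, n = 2700 gives a difference of -2805: convert_buggy() returns 4194301, while convert_fixed() returns -2. For n = 2731 the difference is -15 and convert_fixed() returns 0, where an arithmetic right shift by 10 would have produced -1.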
Fixes: 9d823351a337 ("hwmon: Add hardware monitoring driver for Moortec MR75203 PVT controller")
Signed-off-by: Eliav Farber <farbere@amazon.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20220908152449.35457-5-farbere@amazon.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Eliav Farber [Thu, 8 Sep 2022 15:24:31 +0000 (15:24 +0000)]
hwmon: (mr75203) update pvt->v_num and vm_num to the actual number of used sensors
This issue is relevant when "intel,vm-map" is set in device-tree, and
defines a lower number of VMs than actually supported.
This change is needed for all places that use pvt->v_num or vm_num
later on in the code.
Fixes: 9d823351a337 ("hwmon: Add hardware monitoring driver for Moortec MR75203 PVT controller")
Signed-off-by: Eliav Farber <farbere@amazon.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20220908152449.35457-4-farbere@amazon.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Eliav Farber [Thu, 8 Sep 2022 15:24:30 +0000 (15:24 +0000)]
hwmon: (mr75203) fix VM sensor allocation when "intel,vm-map" not defined
Bug: when "intel,vm-map" is missing from the device tree, 'num' is set
to 0 and no voltage channel infos are allocated.
The reason num is set to 0 when "intel,vm-map" is missing is to set the
entire pvt->vm_idx[] with incremental channel numbers, but it didn't
take into consideration that same num is used later in devm_kcalloc().
If "intel,vm-map" does exist there is no need to set the unspecified
channels with incremental numbers, because the unspecified channels
can't be accessed in pvt_read_in() which is the only other place besides
the probe functions that uses pvt->vm_idx[].
This change fixes the bug by moving the incremental channel numbers
setting to be done only if the "intel,vm-map" property is not defined
(starting the loop from 0), and removing 'num = 0'.
Fixes: 9d823351a337 ("hwmon: Add hardware monitoring driver for Moortec MR75203 PVT controller")
Signed-off-by: Eliav Farber <farbere@amazon.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20220908152449.35457-3-farbere@amazon.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Eliav Farber [Thu, 8 Sep 2022 15:24:29 +0000 (15:24 +0000)]
dt-bindings: hwmon: (mr75203) fix "intel,vm-map" property to be optional
Change "intel,vm-map" property to be optional instead of required.
The driver implementation indicates it is not mandatory to have
"intel,vm-map" in the device tree:
- probe doesn't fail in case it is absent.
- explicit comment in code - "Incase intel,vm-map property is not
defined, we assume incremental channel numbers".
Fixes: 748022ef093f ("hwmon: Add DT bindings schema for PVT controller")
Signed-off-by: Eliav Farber <farbere@amazon.com>
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220908152449.35457-2-farbere@amazon.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Zhengjun Xing [Thu, 8 Sep 2022 07:00:30 +0000 (15:00 +0800)]
perf script: Fix Cannot print 'iregs' field for hybrid systems
Commit b91e5492f9d7ca89 ("perf record: Add a dummy event on hybrid
systems to collect metadata records") adds a dummy event on hybrid
systems to fix the symbol "unknown" issue when the workload is created
in a P-core but runs on an E-core. The added dummy event causes
"perf script -F iregs" to fail. Dummy events do not have the "iregs"
attribute set, so the "iregs" attribute check in evsel__check_attr()
fails.
The following commit [1] fixed a similar issue by skipping the attr
check for the dummy event because it does not have any samples anyway. It
works in normal mode, but the issue still occurs when running
the test in pipe mode. In pipe mode, it calls process_attr(), which
still checks the attr for the dummy event. This commit fixes the issue by
skipping the attr check for the dummy event inside evsel__check_attr() itself;
otherwise, every caller of evsel__check_attr() would have to be patched.
Before:
# ./perf record -o - --intr-regs=di,r8,dx,cx -e br_inst_retired.near_call:p -c 1000 --per-thread true 2>/dev/null|./perf script -F iregs |head -5
Samples for 'dummy:HG' event do not have IREGS attribute set. Cannot print 'iregs' field.
0x120 [0x90]: failed to process type: 64
#
After:
# ./perf record -o - --intr-regs=di,r8,dx,cx -e br_inst_retired.near_call:p -c 1000 --per-thread true 2>/dev/null|./perf script -F iregs |head -5
ABI:2 CX:0x55b8efa87000 DX:0x55b8efa7e000 DI:0xffffba5e625efbb0 R8:0xffff90e51f8ae100
ABI:2 CX:0x7f1dae1e4000 DX:0xd0 DI:0xffff90e18c675ac0 R8:0x71
ABI:2 CX:0xcc0 DX:0x1 DI:0xffff90e199880240 R8:0x0
ABI:2 CX:0xffff90e180dd7500 DX:0xffff90e180dd7500 DI:0xffff90e180043500 R8:0x1
ABI:2 CX:0x50 DX:0xffff90e18c583bd0 DI:0xffff90e1998803c0 R8:0x58
#
[1] https://lore.kernel.org/lkml/20220831124041.219925-1-jolsa@kernel.org/
Fixes: b91e5492f9d7ca89 ("perf record: Add a dummy event on hybrid systems to collect metadata records")
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220908070030.3455164-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Yang Jihong [Thu, 8 Sep 2022 01:48:54 +0000 (09:48 +0800)]
perf lock: Remove redundant word 'contention' in help message
Before:
# perf lock -h
Usage: perf lock [<options>] {record|report|script|info|contention|contention}
-D, --dump-raw-trace dump raw trace in ASCII
-f, --force don't complain, do it
-i, --input <file> input file name
-v, --verbose be more verbose (show symbol address, etc)
--kallsyms <file>
kallsyms pathname
--vmlinux <file> vmlinux pathname
After:
# perf lock -h
Usage: perf lock [<options>] {record|report|script|info|contention}
-D, --dump-raw-trace dump raw trace in ASCII
-f, --force don't complain, do it
-i, --input <file> input file name
-v, --verbose be more verbose (show symbol address, etc)
--kallsyms <file>
kallsyms pathname
--vmlinux <file> vmlinux pathname
Fixes: 528b9cab3b813a3b ("perf lock: Add 'contention' subcommand")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220908014854.151203-1-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Thu, 8 Sep 2022 17:13:47 +0000 (13:13 -0400)]
Merge tag 'spi-fix-v6.0-rc4' of git://git./linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"Several fixes that came in since the merge window, the major one being
a fix for the spi-mux driver which was broken by the performance
optimisations due to it peering inside the core's data structures more
than it should"
* tag 'spi-fix-v6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: spi: Fix queue hang if previous transfer failed
spi: mux: Fix mux interaction with fast path optimisations
spi: cadence-quadspi: Disable irqs during indirect reads
spi: bitbang: Fix lsb-first Rx
Linus Torvalds [Thu, 8 Sep 2022 16:56:20 +0000 (12:56 -0400)]
Merge tag 'regulator-fix-v6.0-rc4' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"One core fix here improving the error handling on enable failure, plus
smaller fixes for the pfuze100 driver and the SPMI DT bindings"
* tag 'regulator-fix-v6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: Fix qcom,spmi-regulator schema
regulator: pfuze100: Fix the global-out-of-bounds access in pfuze100_regulator_probe()
regulator: core: Clean up on enable failure
Linus Torvalds [Thu, 8 Sep 2022 16:51:58 +0000 (12:51 -0400)]
Merge tag 'regmap-fix-v6.0-rc4' of git://git./linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
"A fix for how we handle controller constraints on SPI message sizes,
only impacting systems with SPI controllers with very low limits like
the AMD controller used in the Steam Deck"
* tag 'regmap-fix-v6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: spi: Reserve space for register address/padding
Jens Axboe [Thu, 8 Sep 2022 16:20:18 +0000 (10:20 -0600)]
Merge tag 'nvme-6.0-2022-09-08' of git://git.infradead.org/nvme into block-6.0
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 6.1
- fix a use after free in nvmet (Bart Van Assche)
- fix a use after free when detecting digest errors (Sagi Grimberg)
- fix regression that causes sporadic TCP requests to time out
(Sagi Grimberg)
- fix two off-by-one errors in the nvmet ZNS support
(Dennis Maisenbacher)
- requeue aen after firmware activation (Keith Busch)"
* tag 'nvme-6.0-2022-09-08' of git://git.infradead.org/nvme:
nvme: requeue aen after firmware activation
nvmet: fix mar and mor off-by-one errors
nvme-tcp: fix regression that causes sporadic requests to time out
nvme-tcp: fix UAF when detecting digest errors
nvmet: fix a use-after-free
Adrian Hunter [Mon, 5 Sep 2022 07:47:35 +0000 (10:47 +0300)]
perf dlfilter dlfilter-show-cycles: Fix types for print format
Avoid compiler warning about format %llu that expects long long unsigned
int but argument has type __u64.
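The warning class is easy to reproduce with any 64-bit typedef whose underlying type is not unsigned long long. The log does not say which remedy the patch chose, so the sketch below only shows two common portable options, with uint64_t standing in for __u64 and hypothetical function names.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Option 1: cast so the argument matches %llu exactly. */
int format_cycles_cast(char *buf, size_t len, uint64_t cycles)
{
    return snprintf(buf, len, "%llu", (unsigned long long)cycles);
}

/* Option 2: let <inttypes.h> pick the conversion that fits uint64_t. */
int format_cycles_pri(char *buf, size_t len, uint64_t cycles)
{
    return snprintf(buf, len, "%" PRIu64, cycles);
}
```

Both variants compile cleanly under -Wformat regardless of whether the 64-bit type is defined as unsigned long or unsigned long long on the target.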
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixes: c3afd6e50fce824f ("perf dlfilter: Add dlfilter-show-cycles")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220905074735.4513-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Mon, 5 Sep 2022 11:42:09 +0000 (14:42 +0300)]
libperf evlist: Fix per-thread mmaps for multi-threaded targets
The offending commit removed mmap_per_thread() without considering
the different set-output rules for per-thread mmaps, i.e. in the per-thread
case set-output is used for file descriptors of the same thread, not the
same CPU.
This was not immediately noticed because it only happens with
multi-threaded targets and we do not have a test for that yet.
Reinstate mmap_per_thread() expanding it to cover also system-wide per-cpu
events i.e. to continue to allow the mixing of per-thread and per-cpu
mmaps.
Debug messages (with -vv) show the file descriptors that are opened with
sys_perf_event_open. New debug messages are added (needs -vvv) that show
also which file descriptors are mmapped and which are redirected with
set-output.
In the per-cpu case (cpu != -1) file descriptors for the same CPU are
set-output to the first file descriptor for that CPU.
In the per-thread case (cpu == -1) file descriptors for the same thread are
set-output to the first file descriptor for that thread.
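The two rules can be modelled with a few lines of C. This is a simplified sketch, not libperf's implementation: struct ring_state, ring_state_init() and place_fd() are hypothetical names, and the model handles one mode (per-cpu or per-thread) at a time rather than the mixed case.

```c
#include <assert.h>

#define MAX_IDX 64

/* For each ring-buffer index, the first fd that was mmapped (-1 = none). */
struct ring_state {
    int first_fd[MAX_IDX];
};

void ring_state_init(struct ring_state *s)
{
    for (int i = 0; i < MAX_IDX; i++)
        s->first_fd[i] = -1;
}

/* Returns the fd whose ring buffer 'fd' ends up using: itself if it is
 * the first fd for its index (mmapped), otherwise the fd it is
 * redirected to with set-output. */
int place_fd(struct ring_state *s, int fd, int cpu, int cpu_idx, int thread_idx)
{
    /* Per-thread events (cpu == -1) are grouped by thread, others by CPU. */
    int idx = (cpu == -1) ? thread_idx : cpu_idx;

    assert(idx >= 0 && idx < MAX_IDX);
    if (s->first_fd[idx] < 0) {
        s->first_fd[idx] = fd;      /* first fd for this index: mmap it */
        return fd;
    }
    return s->first_fd[idx];        /* later fd: set-output to the first */
}
```

In this model, fds 5 and 6 opened with cpu -1 for two different threads are each mmapped, whereas fds 5 and 6 opened on CPU 0 for two threads share fd 5's buffer.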
Example (process 17489 has 2 threads):
Before (but with new debug prints):
$ perf record --no-bpf-event -vvv --per-thread -p 17489
<SNIP>
sys_perf_event_open: pid 17489 cpu -1 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 17490 cpu -1 group_fd -1 flags 0x8 = 6
<SNIP>
libperf: idx 0: mmapping fd 5
libperf: idx 0: set output fd 6 -> 5
failed to mmap with 22 (Invalid argument)
After:
$ perf record --no-bpf-event -vvv --per-thread -p 17489
<SNIP>
sys_perf_event_open: pid 17489 cpu -1 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 17490 cpu -1 group_fd -1 flags 0x8 = 6
<SNIP>
libperf: mmap_per_thread: nr cpu values (may include -1) 1 nr threads 2
libperf: idx 0: mmapping fd 5
libperf: idx 1: mmapping fd 6
<SNIP>
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.018 MB perf.data (15 samples) ]
Per-cpu example (process 20341 has 2 threads, same as above):
$ perf record --no-bpf-event -vvv -p 20341
<SNIP>
sys_perf_event_open: pid 20341 cpu 0 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 20342 cpu 0 group_fd -1 flags 0x8 = 6
sys_perf_event_open: pid 20341 cpu 1 group_fd -1 flags 0x8 = 7
sys_perf_event_open: pid 20342 cpu 1 group_fd -1 flags 0x8 = 8
sys_perf_event_open: pid 20341 cpu 2 group_fd -1 flags 0x8 = 9
sys_perf_event_open: pid 20342 cpu 2 group_fd -1 flags 0x8 = 10
sys_perf_event_open: pid 20341 cpu 3 group_fd -1 flags 0x8 = 11
sys_perf_event_open: pid 20342 cpu 3 group_fd -1 flags 0x8 = 12
sys_perf_event_open: pid 20341 cpu 4 group_fd -1 flags 0x8 = 13
sys_perf_event_open: pid 20342 cpu 4 group_fd -1 flags 0x8 = 14
sys_perf_event_open: pid 20341 cpu 5 group_fd -1 flags 0x8 = 15
sys_perf_event_open: pid 20342 cpu 5 group_fd -1 flags 0x8 = 16
sys_perf_event_open: pid 20341 cpu 6 group_fd -1 flags 0x8 = 17
sys_perf_event_open: pid 20342 cpu 6 group_fd -1 flags 0x8 = 18
sys_perf_event_open: pid 20341 cpu 7 group_fd -1 flags 0x8 = 19
sys_perf_event_open: pid 20342 cpu 7 group_fd -1 flags 0x8 = 20
<SNIP>
libperf: mmap_per_cpu: nr cpu values 8 nr threads 2
libperf: idx 0: mmapping fd 5
libperf: idx 0: set output fd 6 -> 5
libperf: idx 1: mmapping fd 7
libperf: idx 1: set output fd 8 -> 7
libperf: idx 2: mmapping fd 9
libperf: idx 2: set output fd 10 -> 9
libperf: idx 3: mmapping fd 11
libperf: idx 3: set output fd 12 -> 11
libperf: idx 4: mmapping fd 13
libperf: idx 4: set output fd 14 -> 13
libperf: idx 5: mmapping fd 15
libperf: idx 5: set output fd 16 -> 15
libperf: idx 6: mmapping fd 17
libperf: idx 6: set output fd 18 -> 17
libperf: idx 7: mmapping fd 19
libperf: idx 7: set output fd 20 -> 19
<SNIP>
[ perf record: Woken up 7 times to write data ]
[ perf record: Captured and wrote 0.020 MB perf.data (17 samples) ]
Fixes: ae4f8ae16a078964 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
Reported-by: Tomáš Trnka <trnka@scm.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216441
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220905114209.8389-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pavel Begunkov [Thu, 8 Sep 2022 13:01:10 +0000 (14:01 +0100)]
io_uring/net: copy addr for zc on POLL_FIRST
Every time we return from an issue handler and expect the request to be
retried, we should also set it up for async exec ourselves. Do that when
we return on IORING_RECVSEND_POLL_FIRST in io_sendzc(); otherwise it'll
re-read the address, which might be a surprise for userspace.
Fixes: 092aeedb750a9 ("io_uring: allow to pass addr into sendzc")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ab1d0657890d6721339c56d2e161a4bba06f85d0.1662642013.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Mark Brown [Fri, 2 Sep 2022 13:28:02 +0000 (14:28 +0100)]
arm64/ptrace: Don't clear calling process' TIF_SME on OOM
If allocating memory for the target SVE state in za_set() fails we clear
TIF_SME for the ptracing task which is obviously not correct. If we are
here we know that the target task already had neither TIF_SVE nor
TIF_SME set since we only need to allocate if either the target had not
used either SVE or SME and had no need to allocate state before or we
just changed the vector length with vec_set_vector_length() which clears
TIF_ for us on allocation failure so just remove the clear entirely.
Reported-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220902132802.39682-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Takashi Iwai [Thu, 8 Sep 2022 12:24:05 +0000 (14:24 +0200)]
Merge tag 'asoc-fix-v6.0-rc4' of https://git./linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.0
Quite a few fixes here, all driver specific and fairly small.
Linus Torvalds [Thu, 8 Sep 2022 12:15:01 +0000 (08:15 -0400)]
Merge tag 'net-6.0-rc5' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from rxrpc, netfilter, wireless and bluetooth
subtrees.
Current release - regressions:
- skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
- bluetooth: fix regression preventing ACL packet transmission
Current release - new code bugs:
- dsa: microchip: fix kernel oops on ksz8 switches
- dsa: qca8k: fix NULL pointer dereference for
of_device_get_match_data
Previous releases - regressions:
- netfilter: clean up hook list when offload flags check fails
- wifi: mt76: fix crash in chip reset fail
- rxrpc: fix ICMP/ICMP6 error handling
- ice: fix DMA mappings leak
- i40e: fix kernel crash during module removal
Previous releases - always broken:
- ipv6: sr: fix out-of-bounds read when setting HMAC data.
- tcp: TX zerocopy should not sense pfmemalloc status
- sch_sfb: don't assume the skb is still around after
enqueueing to child
- netfilter: drop dst references before setting
- wifi: wilc1000: fix DMA on stack objects
- rxrpc: fix an insufficiently large sglist in
rxkad_verify_packet_2()
- fec: use a spinlock to guard `fep->ptp_clk_on`
Misc:
- usb: qmi_wwan: add Quectel RM520N"
* tag 'net-6.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
sch_sfb: Also store skb len before calling child enqueue
net: phy: lan87xx: change interrupt src of link_up to comm_ready
net/smc: Fix possible access to freed memory in link clear
net: ethernet: mtk_eth_soc: check max allowed hash in mtk_ppe_check_skb
net: skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear
net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in vsc9959_sched_speed_set
net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio
net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
net: usb: qmi_wwan: add Quectel RM520N
net: dsa: qca8k: fix NULL pointer dereference for of_device_get_match_data
tcp: fix early ETIMEDOUT after spurious non-SACK RTO
stmmac: intel: Simplify intel_eth_pci_remove()
net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()
ipv6: sr: fix out-of-bounds read when setting HMAC data.
bonding: accept unsolicited NA message
bonding: add all node mcast address when slave up
bonding: use unspecified address if no available link local address
wifi: use struct_group to copy addresses
wifi: mac80211_hwsim: check length for virtio packets
...
Linus Torvalds [Wed, 31 Aug 2022 16:46:12 +0000 (09:46 -0700)]
fs: only do a memory barrier for the first set_buffer_uptodate()
Commit d4252071b97d ("add barriers to buffer_uptodate and
set_buffer_uptodate") added proper memory barriers to the buffer head
BH_Uptodate bit, so that anybody who tests a buffer for being up-to-date
will be guaranteed to actually see initialized state.
However, that commit didn't _just_ add the memory barrier, it also ended
up dropping the "was it already set" logic that the BUFFER_FNS() macro
had.
That's conceptually the right thing for a generic "this is a memory
barrier" operation, but in the case of the buffer contents, we really
only care about the memory barrier for the _first_ time we set the bit,
in that the only memory ordering protection we need is to avoid anybody
seeing uninitialized memory contents.
Any other access ordering wouldn't be about the BH_Uptodate bit anyway,
and would require some other proper lock (typically BH_Lock or the folio
lock). A reader that races with somebody invalidating the buffer head
isn't an issue wrt the memory ordering, it's a serialization issue.
Now, you'd think that the buffer head operations don't matter in this
day and age (and I certainly thought so), but apparently some loads
still end up being heavy users of buffer heads. In particular, the
kernel test robot reported that not having this bit access optimization
in place caused a noticeable direct IO performance regression on ext4:
fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec -26.5% regression
although you presumably need a fast disk and a lot of cores to actually
notice.
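As a rough illustration of the "only the first set needs the barrier" idea, here is a toy Python model. It is not kernel code: ToyBufferHead, its barrier counter and the method names are made up for this sketch, and the counter merely stands in for the memory barrier that the real bit operation would need.

```python
class ToyBufferHead:
    """Toy model of a buffer head with a single 'uptodate' bit."""

    def __init__(self):
        self.uptodate = False
        self.barriers = 0  # counts simulated memory barriers

    def _barrier(self):
        # Stand-in for the ordering a real first-time set would provide.
        self.barriers += 1

    def set_uptodate(self):
        # If the bit is already set, whoever set it did the barrier,
        # so skip both the barrier and the write.
        if self.uptodate:
            return
        self._barrier()
        self.uptodate = True


bh = ToyBufferHead()
for _ in range(1000):
    bh.set_uptodate()
print(bh.uptodate, bh.barriers)
```

In this model, a thousand calls pay for the barrier once, which is the cheap path the commit message says the BUFFER_FNS() macro used to provide.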
Link: https://lore.kernel.org/all/Yw8L7HTZ%2FdE2%2Fo9C@xsang-OptiPlex-9020/
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Fengwei Yin <fengwei.yin@intel.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 8 Sep 2022 11:37:38 +0000 (07:37 -0400)]
Merge tag 'efi-urgent-for-v6.0-1' of git://git./linux/kernel/git/efi/efi
Pull EFI fixes from Ard Biesheuvel:
"A couple of low-priority EFI fixes:
- prevent the randstruct plugin from re-ordering EFI protocol
definitions
- fix a use-after-free in the capsule loader
- drop unused variable"
* tag 'efi-urgent-for-v6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi: capsule-loader: Fix use-after-free in efi_capsule_write
efi/x86: libstub: remove unused variable
efi: libstub: Disable struct randomization
Clément Péron [Tue, 6 Sep 2022 15:30:33 +0000 (17:30 +0200)]
drm/panfrost: devfreq: set opp to the recommended one to configure regulator
Enabling panfrost GPU OPP with a dynamic regulator makes OPP
responsible for enabling and configuring it.
Unfortunately OPP configures and enables the regulator only when an OPP
is asked to be set, which is not the case during
panfrost_devfreq_init().
This leaves the regulator unconfigured, and if no GPU load is
triggered, no OPP is asked to be set. The regulator framework then
switches it off during regulator_late_cleanup() without noticing, and
the board hangs because any access to GPU memory space makes the bus
lock up.
Call dev_pm_opp_set_opp() with the recommended OPP in
panfrost_devfreq_init() to enable the regulator. This properly
configures and enables the regulator and avoids any switch off
by regulator_late_cleanup().
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Clément Péron <peron.clem@gmail.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220906153034.153321-5-peron.clem@gmail.com
Toke Høiland-Jørgensen [Mon, 5 Sep 2022 19:21:36 +0000 (21:21 +0200)]
sch_sfb: Also store skb len before calling child enqueue
Cong Wang noticed that the previous fix for sch_sfb accessing the queued
skb after enqueueing it to a child qdisc was incomplete: the SFB enqueue
function was also calling qdisc_qstats_backlog_inc() after enqueue, which
reads the pkt len from the skb cb field. Fix this by also storing the skb
len, and using the stored value to increment the backlog after enqueueing.
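The pattern described above (read everything you need from the skb before handing it to the child) can be sketched with a toy Python model. ToyPacket, child_enqueue() and the stats dictionary are hypothetical names for this illustration only and do not correspond to the real qdisc API.

```python
class ToyPacket:
    """Toy stand-in for an skb that may be freed by whoever consumes it."""

    def __init__(self, length):
        self.length = length
        self.freed = False

    def pkt_len(self):
        if self.freed:
            raise RuntimeError("use after free")
        return self.length


def child_enqueue(pkt):
    # A child qdisc may consume or free the packet before returning.
    pkt.freed = True
    return True  # enqueue succeeded


def sfb_enqueue(pkt, stats):
    length = pkt.pkt_len()  # read while the packet is still ours
    if child_enqueue(pkt):
        stats["backlog"] += length  # uses the saved value, not pkt
        stats["qlen"] += 1
    return stats


stats = sfb_enqueue(ToyPacket(1500), {"backlog": 0, "qlen": 0})
print(stats)
```

Calling pkt.pkt_len() after child_enqueue() in this model would raise, which mirrors the access the fix avoids.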
Fixes: 9efd23297cca ("sch_sfb: Don't assume the skb is still around after enqueueing to child")
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Acked-by: Cong Wang <cong.wang@bytedance.com>
Link: https://lore.kernel.org/r/20220905192137.965549-1-toke@toke.dk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Arun Ramadoss [Mon, 5 Sep 2022 15:27:50 +0000 (20:57 +0530)]
net: phy: lan87xx: change interrupt src of link_up to comm_ready
Currently phy link up/down interrupt is enabled using the
LAN87xx_INTERRUPT_MASK register. In the lan87xx_read_status function,
phy link is determined using the T1_MODE_STAT_REG register comm_ready bit.
comm_ready bit is set using the loc_rcvr_status & rem_rcvr_status.
Whenever the phy link is up, LAN87xx_INTERRUPT_SOURCE link_up bit is set
first but comm_ready bit takes some time to set based on local and
remote receiver status.
As per the current implementation, interrupt is triggered using link_up
but the comm_ready bit is still cleared in the read_status function. So,
link is always down. Initially tested with the shared interrupt
mechanism with switch and internal phy which is working, but after
implementing interrupt controller it is not working.
It can be fixed either by updating the read_status function to read from
the LAN87XX_INTERRUPT_SOURCE register or by enabling the interrupt mask
for the comm_ready bit. But the validation team recommends the use of
comm_ready for link detection.
This patch fixes it by enabling the comm_ready bit for link_up in the
LAN87XX_INTERRUPT_MASK_2 register (MISC Bank) and link_down in the
LAN87xx_INTERRUPT_MASK register.
Fixes: 8a1b415d70b7 ("net: phy: added ethtool master-slave configuration support")
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220905152750.5079-1-arun.ramadoss@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Christian König [Wed, 7 Sep 2022 09:56:22 +0000 (11:56 +0200)]
drm/ttm: cleanup the resource of ghost objects after locking them
Otherwise lockdep will complain about cleaning up the bulk_move.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220907100051.570641-1-christian.koenig@amd.com
Fixes: d91c411c744b ("drm/ttm: update bulk move object of ghost BO")
Dave Airlie [Thu, 8 Sep 2022 06:09:41 +0000 (16:09 +1000)]
Merge tag 'amd-drm-fixes-6.0-2022-09-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.0-2022-09-07:
amdgpu:
- Firmware header fix
- SMU 13.x fix
- Debugfs memory leak fix
- NBIO 7.7 fix
- Firmware memory leak fix
amdkfd:
- Debug output fix
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220908032332.5880-1-alexander.deucher@amd.com
Guchun Chen [Fri, 2 Sep 2022 06:08:55 +0000 (14:08 +0800)]
drm/amdgpu: prevent toc firmware memory leak
It's missed in psp fini.
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yifan Zhang [Tue, 6 Sep 2022 05:09:20 +0000 (13:09 +0800)]
drm/amdgpu: correct doorbell range/size value for CSDMA_DOORBELL_RANGE
The current function mixes CSDMA_DOORBELL_RANGE and SDMA0_DOORBELL_RANGE
range/size manipulation, while these 2 registers have different size
field masks. Remove range/size manipulation for SDMA0_DOORBELL_RANGE.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Xiaojian Du <Xiaojian.Du@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yifan Zhang [Sun, 4 Sep 2022 07:53:27 +0000 (15:53 +0800)]
drm/amdkfd: print address in hex format rather than decimal
Addresses should be printed in hex format.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Greg Kroah-Hartman [Fri, 2 Sep 2022 13:01:05 +0000 (15:01 +0200)]
drm/amd/display: fix memory leak when using debugfs_lookup()
When calling debugfs_lookup() the result must have dput() called on it,
otherwise the memory will leak over time. Fix this up by properly
calling dput().
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Wayne Lin <Wayne.Lin@amd.com>
Cc: hersen wu <hersenxs.wu@amd.com>
Cc: Wenjing Liu <wenjing.liu@amd.com>
Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Cc: Thelford Williams <tdwilliamsiv@gmail.com>
Cc: Fangzhi Zuo <Jerry.Zuo@amd.com>
Cc: Yongzhi Liu <lyz_cs@pku.edu.cn>
Cc: Mikita Lipski <mikita.lipski@amd.com>
Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Cc: Bhanuprakash Modem <bhanuprakash.modem@intel.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: stable@vger.kernel.org
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Evan Quan [Thu, 1 Sep 2022 05:48:58 +0000 (13:48 +0800)]
drm/amd/pm: add missing SetMGpuFanBoostLimitRpm mapping for SMU 13.0.7
Missing SetMGpuFanBoostLimitRpm mapping leads to loading failure for SMU
13.0.7.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Chengming Gui [Tue, 6 Sep 2022 01:26:37 +0000 (09:26 +0800)]
drm/amd/amdgpu: add rlc_firmware_header_v2_4 to amdgpu_firmware_header
Add missing structure to avoid incorrect size and version check.
Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Michael Ellerman [Tue, 6 Sep 2022 23:37:17 +0000 (09:37 +1000)]
powerpc/pseries: Fix plpks crash on non-pseries
As reported[1] by Nathan, the recently added plpks driver will crash if
it's built into the kernel and booted on a non-pseries machine, eg
powernv:
kernel BUG at arch/powerpc/kernel/syscall.c:39!
Oops: Exception in kernel mode, sig: 5 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
...
NIP system_call_exception+0x90/0x3d0
LR system_call_common+0xec/0x250
Call Trace:
0xc0000000035c3e10 (unreliable)
system_call_common+0xec/0x250
--- interrupt: c00 at plpar_hcall+0x38/0x60
NIP: c0000000000e4300 LR: c00000000202945c CTR: 0000000000000000
REGS: c0000000035c3e80 TRAP: 0c00 Not tainted (6.0.0-rc4)
MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 28000284 XER: 00000000
...
NIP plpar_hcall+0x38/0x60
LR pseries_plpks_init+0x64/0x23c
--- interrupt: c00
On powernv Linux is the hypervisor, so a hypercall just ends up going to
the syscall path, which BUGs if the syscall (hypercall) didn't come from
userspace.
The fix is simply to not probe the plpks driver on non-pseries machines.
[1] https://lore.kernel.org/linuxppc-dev/Yxe06fbq18Wv9y3W@dev-arch.thelio-3990X/
Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Dan Horák <dan@danny.cz>
Reviewed-by: Dan Horák <dan@danny.cz>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20220907065038.1604504-1-mpe@ellerman.id.au
Joe Fradley [Wed, 24 Aug 2022 04:19:33 +0000 (21:19 -0700)]
tools: Add new "test" taint to kernel-chktaint
Commit c272612cb4a2 ("kunit: Taint the kernel when KUnit tests are run")
added a new taint flag for when in-kernel tests run. This commit adds
recognition of this new flag in kernel-chktaint.
With this change the correct reason will be reported if the kernel is
tainted because of a test run.
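For illustration, the check amounts to testing one bit of the taint value. kernel-chktaint itself is a shell script, so the Python sketch below is not its implementation. It assumes the test taint is bit 18, as listed in the kernel's tainted-kernels documentation. The function name and the message text are made up here.

```python
TAINT_TEST_BIT = 18  # assumed bit position of the "test" taint flag


def kunit_taint_reason(taint_value):
    """Return a reason string if the test taint bit is set, else None."""
    if (taint_value >> TAINT_TEST_BIT) & 1:
        # Illustrative wording, not the script's actual output.
        return "an in-kernel test has been run"
    return None


print(kunit_taint_reason(1 << TAINT_TEST_BIT))
print(kunit_taint_reason(0))
```

The taint value would normally come from /proc/sys/kernel/tainted; the sketch takes it as a plain integer so it can be tried anywhere.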
Amended Commit log: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Joe Fradley <joefradley@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Pavel Begunkov [Tue, 6 Sep 2022 16:11:17 +0000 (17:11 +0100)]
io_uring: recycle kbuf recycle on tw requeue
When we queue a request via tw for execution it's not going to be
executed immediately, so when io_queue_async() hits IO_APOLL_READY
and queues a tw but doesn't try to recycle/consume the buffer some other
request may try to use the buffer.
Fixes: c7fb19428d67 ("io_uring: add support for ring mapped supplied buffers")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/a19bc9e211e3184215a58e129b62f440180e9212.1662480490.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Tue, 6 Sep 2022 16:11:16 +0000 (17:11 +0100)]
io_uring/kbuf: fix not advancing READV kbuf ring
When we don't recycle a selected ring buffer we should advance the head
of the ring, so don't just skip io_kbuf_recycle() for IORING_OP_READV
but adjust the ring.
Fixes: 934447a603b22 ("io_uring: do not recycle buffer in READV")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/a6d85e2611471bcb5d5dcd63a8342077ddc2d73d.1662480490.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Hyunwoo Kim [Wed, 7 Sep 2022 16:07:14 +0000 (09:07 -0700)]
efi: capsule-loader: Fix use-after-free in efi_capsule_write
A race condition may occur if the user calls close() on another thread
during a write() operation on the device node of the efi capsule.
This is a race condition that occurs between the efi_capsule_write() and
efi_capsule_flush() functions of efi_capsule_fops, which ultimately
results in UAF.
So, the page freeing process is modified to be done in
efi_capsule_release() instead of efi_capsule_flush().
Cc: <stable@vger.kernel.org> # v4.9+
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Link: https://lore.kernel.org/all/20220907102920.GA88602@ubuntu/
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Yicong Yang [Mon, 5 Sep 2022 12:26:15 +0000 (20:26 +0800)]
arch_topology: Make cluster topology span at least SMT CPUs
Currently cpu_clustergroup_mask() will return the CPU mask if the cluster
spans more or the same CPUs as cpu_coregroup_mask(). This results in a
broken topology on non-cluster SMT machines when building with
CONFIG_SCHED_CLUSTER=y.
Test with:
qemu-system-aarch64 -enable-kvm -machine virt \
-net none \
-cpu host \
-bios ./QEMU_EFI.fd \
-m 2G \
-smp 48,sockets=2,cores=12,threads=2 \
-kernel $Image \
-initrd $Rootfs \
-nographic \
-append "rdinit=init console=ttyAMA0 sched_verbose loglevel=8"
We'll get below error:
[ 3.084568] BUG: arch topology borken
[ 3.084570] the SMT domain not a subset of the CLS domain
Since cluster is a level higher than SMT, fix this by making the cluster
span at least the SMT CPUs.
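A minimal sketch of the intended rule, using Python sets in place of cpumasks. The function name, its parameters and the example masks are hypothetical; this only illustrates "fall back to the SMT siblings when the cluster would span the core group or more" and is not the kernel implementation.

```python
def cluster_mask(smt_siblings, cluster_siblings, core_group):
    """Toy cluster mask selection; every mask is a set of CPU ids."""
    # If the cluster would span the same or more CPUs than the core
    # group, limit it, but never below the SMT siblings.
    if core_group <= cluster_siblings:
        return smt_siblings
    return cluster_siblings


# Hypothetical non-cluster SMT machine: the "cluster" equals the core group.
smt = {0, 1}
package = set(range(24))
result = cluster_mask(smt, package, package)
print(sorted(result), smt <= result)
```

With the result covering both SMT threads, the SMT domain stays a subset of the CLS domain, which is the condition the BUG message above complains about.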
Fixes: bfcc4397435d ("arch_topology: Limit span of cpu_clustergroup_mask()")
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ionela Voinescu <ionela.voinescu@arm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20220905122615.12946-1-yangyicong@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yacan Liu [Tue, 6 Sep 2022 13:01:39 +0000 (21:01 +0800)]
net/smc: Fix possible access to freed memory in link clear
After modifying the QP to the Error state, all RX WR would be completed
with WC in IB_WC_WR_FLUSH_ERR status. The current implementation does not
wait for that to finish, but destroys the QP and frees the link group
directly. So there is a risk of accessing freed memory in tasklet context.
Here is a crash example:
BUG: unable to handle page fault for address: ffffffff8f220860
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD f7300e067 P4D f7300e067 PUD f7300f063 PMD 8c4e45063 PTE 800ffff08c9df060
Oops: 0002 [#1] SMP PTI
CPU: 1 PID: 0 Comm: swapper/1 Kdump: loaded Tainted: G S OE 5.10.0-0607+ #23
Hardware name: Inspur NF5280M4/YZMB-00689-101, BIOS 4.1.20 07/09/2018
RIP: 0010:native_queued_spin_lock_slowpath+0x176/0x1b0
Code: f3 90 48 8b 32 48 85 f6 74 f6 eb d5 c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 05 48 63 f6 48 05 00 c8 02 00 48 03 04 f5 00 09 98 8e <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32
RSP: 0018:ffffb3b6c001ebd8 EFLAGS: 00010086
RAX: ffffffff8f220860 RBX: 0000000000000246 RCX: 0000000000080000
RDX: ffff91db1f86c800 RSI: 000000000000173c RDI: ffff91db62bace00
RBP: ffff91db62bacc00 R08: 0000000000000000 R09: c00000010000028b
R10: 0000000000055198 R11: ffffb3b6c001ea58 R12: ffff91db80e05010
R13: 000000000000000a R14: 0000000000000006 R15: 0000000000000040
FS: 0000000000000000(0000) GS:ffff91db1f840000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff8f220860 CR3: 00000001f9580004 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
_raw_spin_lock_irqsave+0x30/0x40
mlx5_ib_poll_cq+0x4c/0xc50 [mlx5_ib]
smc_wr_rx_tasklet_fn+0x56/0xa0 [smc]
tasklet_action_common.isra.21+0x66/0x100
__do_softirq+0xd5/0x29c
asm_call_irq_on_stack+0x12/0x20
</IRQ>
do_softirq_own_stack+0x37/0x40
irq_exit_rcu+0x9d/0xa0
sysvec_call_function_single+0x34/0x80
asm_sysvec_call_function_single+0x12/0x20
Fixes: bd4ad57718cc ("smc: initialize IB transport incl. PD, MR, QP, CQ, event, WR")
Signed-off-by: Yacan Liu <liuyacan@corp.netease.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Mon, 5 Sep 2022 12:41:28 +0000 (14:41 +0200)]
net: ethernet: mtk_eth_soc: check max allowed hash in mtk_ppe_check_skb
Even if max hash configured in hw in mtk_ppe_hash_entry is
MTK_PPE_ENTRIES - 1, check theoretical OOB accesses in
mtk_ppe_check_skb routine
Fixes: c4f033d9e03e9 ("net: ethernet: mtk_eth_soc: rework hardware flow table management")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Menglong Dong [Mon, 5 Sep 2022 03:50:15 +0000 (11:50 +0800)]
net: skb: export skb drop reaons to user by TRACE_DEFINE_ENUM
As Eric reported, the 'reason' field is not presented when tracing the
kfree_skb event with perf:
$ perf record -e skb:kfree_skb -a sleep 10
$ perf script
ip_defrag 14605 [021] 221.614303: skb:kfree_skb:
skbaddr=0xffff9d2851242700 protocol=34525 location=0xffffffffa39346b1
reason:
The cause seems to be passing kernel address directly to TP_printk(),
which is not right. As the enum 'skb_drop_reason' is not exported to
user space through TRACE_DEFINE_ENUM(), perf can't get the drop reason
string from the 'reason' field, which is a number.
Therefore, we introduce the macro DEFINE_DROP_REASON(), which is used
to define the trace enum by TRACE_DEFINE_ENUM(). With the help of
DEFINE_DROP_REASON(), we can now remove the auto-generation that we
introduced in commit ec43908dd556 ("net: skb: use auto-generation to
convert skb drop reason to string"), and define the string array
'drop_reasons'.
Hmmmm...now we are back to the situation where we have to maintain drop
reasons in both enum skb_drop_reason and DEFINE_DROP_REASON. But they
are both in dropreason.h, which makes it easier.
After this commit, now the format of kfree_skb is like this:
$ cat /tracing/events/skb/kfree_skb/format
name: kfree_skb
ID: 1524
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:void * skbaddr; offset:8; size:8; signed:0;
field:void * location; offset:16; size:8; signed:0;
field:unsigned short protocol; offset:24; size:2; signed:0;
field:enum skb_drop_reason reason; offset:28; size:4; signed:0;
print fmt: "skbaddr=%p protocol=%u location=%p reason: %s", REC->skbaddr, REC->protocol, REC->location, __print_symbolic(REC->reason, { 1, "NOT_SPECIFIED" }, { 2, "NO_SOCKET" } ......
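The __print_symbolic() table in the print fmt above can be read as a number-to-name lookup. The Python sketch below only models that lookup; it contains just the two entries visible in the format output (the rest is elided there), and the hex fallback for unknown values is an assumption of this sketch.

```python
# Only the entries visible in the format output above.
DROP_REASONS = {1: "NOT_SPECIFIED", 2: "NO_SOCKET"}


def print_symbolic(value, table):
    """Map a numeric enum value to its name, as a trace decoder might."""
    # Assumption: values missing from the table are shown in hex.
    return table.get(value, hex(value))


print("reason:", print_symbolic(2, DROP_REASONS))
```

Without such a table exported to user space, a tool only sees the number, which matches the empty "reason:" field in the perf output quoted earlier.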
Fixes: ec43908dd556 ("net: skb: use auto-generation to convert skb drop reason to string")
Link: https://lore.kernel.org/netdev/CANn89i+bx0ybvE55iMYf5GJM48WwV1HNpdm9Q6t-HaEstqpCSA@mail.gmail.com/
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi [Tue, 6 Sep 2022 14:36:32 +0000 (16:36 +0200)]
net: ethernet: mtk_eth_soc: fix typo in __mtk_foe_entry_clear
Set ib1 state to MTK_FOE_STATE_UNBIND in __mtk_foe_entry_clear routine.
Fixes: 33fc42de33278 ("net: ethernet: mtk_eth_soc: support creating mac address based offload entries")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jean-Philippe Brucker [Thu, 25 Aug 2022 15:46:24 +0000 (16:46 +0100)]
iommu/virtio: Fix interaction with VFIO
Commit e8ae0e140c05 ("vfio: Require that devices support DMA cache
coherence") requires IOMMU drivers to advertise
IOMMU_CAP_CACHE_COHERENCY, in order to be used by VFIO. Since VFIO does
not provide to userspace the ability to maintain coherency through cache
invalidations, it requires hardware coherency. Advertise the capability
in order to restore VFIO support.
The meaning of IOMMU_CAP_CACHE_COHERENCY also changed from "IOMMU can
enforce cache coherent DMA transactions" to "IOMMU_CACHE is supported".
While virtio-iommu cannot enforce coherency (of PCIe no-snoop
transactions), it does support IOMMU_CACHE.
We can distinguish different cases of non-coherent DMA:
(1) When accesses from a hardware endpoint are not coherent. The host
would describe such a device using firmware methods ('dma-coherent'
in device-tree, '_CCA' in ACPI), since they are also needed without
a vIOMMU. In this case mappings are created without IOMMU_CACHE.
virtio-iommu doesn't need any additional support. It sends the same
requests as for coherent devices.
(2) When the physical IOMMU supports non-cacheable mappings. Supporting
those would require a new feature in virtio-iommu, new PROBE request
property and MAP flags. Device drivers would use a new API to
discover this since it depends on the architecture and the physical
IOMMU.
(3) When the hardware supports PCIe no-snoop. It is possible for
assigned PCIe devices to issue no-snoop transactions, and the
virtio-iommu specification is lacking any mention of this.
Arm platforms don't necessarily support no-snoop, and those that do
cannot enforce coherency of no-snoop transactions. Device drivers
must be careful about assuming that no-snoop transactions won't end
up cached; see commit e02f5c1bb228 ("drm: disable uncached DMA
optimization for ARM and arm64"). On x86 platforms, the host may or
may not enforce coherency of no-snoop transactions with the physical
IOMMU. But according to the above commit, on x86 a driver which
assumes that no-snoop DMA is compatible with uncached CPU mappings
will also work if the host enforces coherency.
Although these issues are not specific to virtio-iommu, it could be
used to facilitate discovery and configuration of no-snoop. This
would require a new feature bit, PROBE property and ATTACH/MAP
flags.
Cc: stable@vger.kernel.org
Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20220825154622.86759-1-jean-philippe@linaro.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Lu Baolu [Tue, 23 Aug 2022 06:15:57 +0000 (14:15 +0800)]
iommu/vt-d: Fix lockdep splat due to klist iteration in atomic context
With CONFIG_INTEL_IOMMU_DEBUGFS enabled, the lockdep splat below is seen
when an I/O fault occurs on a machine with an Intel IOMMU in it.
DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Write NO_PASID] Request device [00:1a.0] fault addr 0x0
[fault reason 0x05] PTE Write access is not set
DMAR: Dump dmar0 table entries for IOVA 0x0
DMAR: root entry: 0x0000000127f42001
DMAR: context entry: hi 0x0000000000001502, low 0x000000012d8ab001
================================
WARNING: inconsistent lock state
5.20.0-0.rc0.20220812git7ebfc85e2cd7.10.fc38.x86_64 #1 Not tainted
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
rngd/1006 [HC1[1]:SC0[0]:HE0:SE1] takes:
ff177021416f2d78 (&k->k_lock){?.+.}-{2:2}, at: klist_next+0x1b/0x160
{HARDIRQ-ON-W} state was registered at:
lock_acquire+0xce/0x2d0
_raw_spin_lock+0x33/0x80
klist_add_tail+0x46/0x80
bus_add_device+0xee/0x150
device_add+0x39d/0x9a0
add_memory_block+0x108/0x1d0
memory_dev_init+0xe1/0x117
driver_init+0x43/0x4d
kernel_init_freeable+0x1c2/0x2cc
kernel_init+0x16/0x140
ret_from_fork+0x1f/0x30
irq event stamp: 7812
hardirqs last enabled at (7811): [<ffffffff85000e86>] asm_sysvec_apic_timer_interrupt+0x16/0x20
hardirqs last disabled at (7812): [<ffffffff84f16894>] irqentry_enter+0x54/0x60
softirqs last enabled at (7794): [<ffffffff840ff669>] __irq_exit_rcu+0xf9/0x170
softirqs last disabled at (7787): [<ffffffff840ff669>] __irq_exit_rcu+0xf9/0x170
The klist iterator functions use spin_*lock_irq*() but the klist
insertion functions use spin_*lock(). The splat comes from combining
that with the Intel DMAR IOMMU driver iterating over klists from atomic
(hardirq) context, where pci_get_domain_bus_and_slot() calls into
bus_find_device(), which iterates over klists.
As there is currently no plan to fix the klist to make it safe to use in
atomic context, this fixes the lockdep splat by avoiding the call to
pci_get_domain_bus_and_slot() in hardirq context.
Fixes: 8ac0b64b9735 ("iommu/vt-d: Use pci_get_domain_bus_and_slot() in pgtable_walk()")
Reported-by: Lennert Buytenhek <buytenh@wantstofly.org>
Link: https://lore.kernel.org/linux-iommu/Yvo2dfpEh%2FWC+Wrr@wantstofly.org/
Link: https://lore.kernel.org/linux-iommu/YvyBdPwrTuHHbn5X@wantstofly.org/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220819015949.4795-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Lu Baolu [Tue, 23 Aug 2022 06:15:56 +0000 (14:15 +0800)]
iommu/vt-d: Fix recursive lock issue in iommu_flush_dev_iotlb()
The per domain spinlock is acquired in iommu_flush_dev_iotlb(), which
can be called in interrupt context. For example, the
drm-intel's CI system got completely blocked with the error below:
WARNING: inconsistent lock state
6.0.0-rc1-CI_DRM_11990-g6590d43d39b9+ #1 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/6/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
ffff88810440d678 (&domain->lock){+.?.}-{2:2}, at: iommu_flush_dev_iotlb.part.61+0x23/0x80
{SOFTIRQ-ON-W} state was registered at:
lock_acquire+0xd3/0x310
_raw_spin_lock+0x2a/0x40
domain_update_iommu_cap+0x20b/0x2c0
intel_iommu_attach_device+0x5bd/0x860
__iommu_attach_device+0x18/0xe0
bus_iommu_probe+0x1f3/0x2d0
bus_set_iommu+0x82/0xd0
intel_iommu_init+0xe45/0x102a
pci_iommu_init+0x9/0x31
do_one_initcall+0x53/0x2f0
kernel_init_freeable+0x18f/0x1e1
kernel_init+0x11/0x120
ret_from_fork+0x1f/0x30
irq event stamp: 162354
hardirqs last enabled at (162354): [<ffffffff81b59274>] _raw_spin_unlock_irqrestore+0x54/0x70
hardirqs last disabled at (162353): [<ffffffff81b5901b>] _raw_spin_lock_irqsave+0x4b/0x50
softirqs last enabled at (162338): [<ffffffff81e00323>] __do_softirq+0x323/0x48e
softirqs last disabled at (162349): [<ffffffff810c1588>] irq_exit_rcu+0xb8/0xe0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&domain->lock);
<Interrupt>
lock(&domain->lock);
*** DEADLOCK ***
1 lock held by swapper/6/0:
This converts the spin_lock/unlock() calls into their irqsave/irqrestore
variants to fix the recursive locking issue.
Fixes: ffd5869d93530 ("iommu/vt-d: Replace spin_lock_irqsave() with spin_lock()")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20220817025650.3253959-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Lu Baolu [Tue, 23 Aug 2022 06:15:55 +0000 (14:15 +0800)]
iommu/vt-d: Correctly calculate sagaw value of IOMMU
The Intel IOMMU driver possibly selects between the first-level and the
second-level translation tables for DMA address translation. However,
the levels of page-table walks for the 4KB base page size are calculated
from the SAGAW field of the capability register, which is only valid for
the second-level page table. This causes the IOMMU driver to stop working
if the hardware (or the emulated IOMMU) advertises only first-level
translation capability and reports the SAGAW field as 0.
This solves the above problem by considering both the first level and the
second level when calculating the supported page table levels.
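The selection described above can be sketched in user-space C. This is an
illustration only, not the driver's code: struct iommu_caps and
calculate_sagaw() are hypothetical names, and the sketch assumes that
first-level translation always supports 4-level paging and optionally
5-level paging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the capability bits involved. */
struct iommu_caps {
	unsigned int sl_sagaw; /* SAGAW field: second-level widths only */
	bool scalable_mode;    /* scalable mode is supported */
	bool first_level;      /* first-level translation is supported */
	bool second_level;     /* second-level translation is supported */
	bool fl_5level;        /* first-level 5-level paging is supported */
};

#define SAGAW_4LEVEL (1u << 2) /* 48-bit address width */
#define SAGAW_5LEVEL (1u << 3) /* 57-bit address width */

/* Return the bitmap of page-table levels usable by the driver. */
static unsigned int calculate_sagaw(const struct iommu_caps *caps)
{
	unsigned int fl_sagaw;

	assert(caps);

	/* Assumption: first level always offers 4-level paging. */
	fl_sagaw = SAGAW_4LEVEL | (caps->fl_5level ? SAGAW_5LEVEL : 0);

	/* Only second-level translation is usable: trust SAGAW as is. */
	if (!caps->scalable_mode || !caps->first_level)
		return caps->sl_sagaw;

	/* Only first-level translation: the SAGAW field does not apply. */
	if (!caps->second_level)
		return fl_sagaw;

	/* Both levels: keep the widths that both can handle. */
	return fl_sagaw & caps->sl_sagaw;
}
```

With a first-level-only IOMMU that reports SAGAW as 0, the sketch still
returns a non-zero bitmap, which is the case the old calculation missed.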
Fixes: b802d070a52a1 ("iommu/vt-d: Use iova over first level")
Cc: stable@vger.kernel.org
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220817023558.3253263-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Lu Baolu [Tue, 23 Aug 2022 06:15:54 +0000 (14:15 +0800)]
iommu/vt-d: Fix kdump kernels boot failure with scalable mode
The translation table copying code for kdump kernels is currently based
on the extended root/context entry formats of ECS mode defined in older
VT-d v2.5, and doesn't handle the scalable mode formats. This causes
the kexec capture kernel boot failure with DMAR faults if the IOMMU was
enabled in scalable mode by the previous kernel.
The ECS mode has already been deprecated by the VT-d spec since v3.0 and
Intel IOMMU driver doesn't support this mode as there's no real hardware
implementation. Hence this converts ECS checking in copying table code
into scalable mode.
The existing copying code consumes a bit in the context entry as a mark
of copied entry. It needs to work for the old format as well as for the
extended context entries. As it's hard to find such a common bit for both
legacy and scalable mode context entries, this replaces it with a
per-IOMMU bitmap.
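A minimal user-space sketch of such a bitmap follows. It assumes one bit
per 16-bit source ID (bus number in the upper byte, devfn in the lower
byte); struct copied_map and the helper names are hypothetical and only
illustrate the idea, they are not taken from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SOURCE_IDS    (1u << 16) /* 256 buses x 256 devfns */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical per-IOMMU state: one "copied" bit per source ID. */
struct copied_map {
	unsigned long *bits;
};

static bool copied_map_init(struct copied_map *m)
{
	m->bits = calloc(SOURCE_IDS / BITS_PER_WORD, sizeof(unsigned long));
	return m->bits != NULL;
}

static void copied_map_free(struct copied_map *m)
{
	free(m->bits);
	m->bits = NULL;
}

/* Assumed indexing: bus number in bits 15:8, devfn in bits 7:0. */
static unsigned int source_id(uint8_t bus, uint8_t devfn)
{
	return ((unsigned int)bus << 8) | devfn;
}

static void set_context_copied(struct copied_map *m, uint8_t bus, uint8_t devfn)
{
	unsigned int id = source_id(bus, devfn);

	assert(m->bits);
	m->bits[id / BITS_PER_WORD] |= 1UL << (id % BITS_PER_WORD);
}

static void clear_context_copied(struct copied_map *m, uint8_t bus, uint8_t devfn)
{
	unsigned int id = source_id(bus, devfn);

	assert(m->bits);
	m->bits[id / BITS_PER_WORD] &= ~(1UL << (id % BITS_PER_WORD));
}

static bool context_copied(const struct copied_map *m, uint8_t bus, uint8_t devfn)
{
	unsigned int id = source_id(bus, devfn);

	assert(m->bits);
	return (m->bits[id / BITS_PER_WORD] >> (id % BITS_PER_WORD)) & 1UL;
}
```

Keeping the mark outside the context entry means the same bookkeeping
works regardless of the entry format in use.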
Fixes: 7373a8cc38197 ("iommu/vt-d: Setup context and enable RID2PASID support")
Cc: stable@vger.kernel.org
Reported-by: Jerry Snitselaar <jsnitsel@redhat.com>
Tested-by: Wen Jin <wen.jin@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220817011035.3250131-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Alexander Sverdlin [Tue, 6 Sep 2022 09:59:43 +0000 (11:59 +0200)]
MIPS: OCTEON: irq: Fix octeon_irq_force_ciu_mapping()
For irq_domain_associate() to work the virq descriptor has to be
allocated in advance. Otherwise the following happens:
WARNING: CPU: 0 PID: 0 at .../kernel/irq/irqdomain.c:527 irq_domain_associate+0x298/0x2e8
error: virq128 is not allocated
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.78-... #1
...
Call Trace:
[<ffffffff801344c4>] show_stack+0x9c/0x130
[<ffffffff80769550>] dump_stack+0x90/0xd0
[<ffffffff801576d0>] __warn+0x118/0x130
[<ffffffff80157734>] warn_slowpath_fmt+0x4c/0x70
[<ffffffff801b83c0>] irq_domain_associate+0x298/0x2e8
[<ffffffff80a43bb8>] octeon_irq_init_ciu+0x4c8/0x53c
[<ffffffff80a76cbc>] of_irq_init+0x1e0/0x388
[<ffffffff80a452cc>] init_IRQ+0x4c/0xf4
[<ffffffff80a3cc00>] start_kernel+0x404/0x698
Use irq_alloc_desc_at() to avoid the above problem.
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Alexander Sverdlin [Tue, 6 Sep 2022 08:32:39 +0000 (10:32 +0200)]
MIPS: octeon: Get rid of preprocessor directives around RESERVE32
Some of them were pointless because CONFIG_CAVIUM_RESERVE32 is now always
defined, some were not enough (Yu Zhao reported
"Failed to allocate CAVIUM_RESERVE32 memory area" error).
Removing the directives allows for compiler coverage of RESERVE32 code and
replacing one of [always-true] "ifdef" with a compiler conditional fixes
the [cosmetic] error message.
Fixes: 3e3114ac460e ("MIPS: Introduce CAVIUM_RESERVE32 Kconfig option")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
David S. Miller [Wed, 7 Sep 2022 12:44:04 +0000 (13:44 +0100)]
Merge branch 'dsa-felix-fixes'
Vladimir Oltean says:
====================
Fixes for Felix DSA driver calculation of tc-taprio guard bands
This series fixes some bugs which are not quite new, but date from v5.13
when static guard bands were enabled by Michael Walle to prevent
tc-taprio overruns.
The investigation started when Xiaoliang asked privately what is the
expected max SDU for a traffic class when its minimum gate interval is
10 us. The answer, as it turns out, is not an L1 size of 1250 octets,
but 1245 octets, since otherwise, the switch will not consider frames
for egress scheduling, because the static guard band is exactly as large
as the time interval. The switch needs a minimum of 33 ns outside of the
guard band to consider a frame for scheduling, and the reduction of the
max SDU by 5 provides exactly for that.
The fix for that (patch 1/3) is relatively small, but during testing, it
became apparent that cut-through forwarding prevents oversized frame
dropping from working properly. This is solved through the larger patch
2/3. Finally, patch 3/3 fixes one more tc-taprio locking problem found
through code inspection.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 5 Sep 2022 17:01:25 +0000 (20:01 +0300)]
net: dsa: felix: access QSYS_TAG_CONFIG under tas_lock in vsc9959_sched_speed_set
The read-modify-write of QSYS_TAG_CONFIG from vsc9959_sched_speed_set()
runs unlocked with respect to the other functions that access it, which
are vsc9959_tas_guard_bands_update(), vsc9959_qos_port_tas_set() and
vsc9959_tas_clock_adjust(). All the others are under ocelot->tas_lock,
so move the vsc9959_sched_speed_set() access under that lock as well, to
resolve the concurrency.
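As a rough user-space analogy (not the driver's code), the pattern is a
read-modify-write that only happens while holding the lock shared by all
accessors. The sketch uses a pthread mutex in place of ocelot->tas_lock
and a plain integer in place of the register; struct fake_switch and the
function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical model: one shared register guarded by one lock. */
struct fake_switch {
	pthread_mutex_t tas_lock; /* stands in for ocelot->tas_lock */
	uint32_t tag_config;      /* stands in for QSYS_TAG_CONFIG */
};

static int fake_switch_init(struct fake_switch *sw, uint32_t initial)
{
	sw->tag_config = initial;
	return pthread_mutex_init(&sw->tas_lock, NULL);
}

/*
 * Read-modify-write under the lock: other writers that take the same
 * lock cannot interleave between the read and the write-back.
 */
static void tag_config_rmw(struct fake_switch *sw, uint32_t val, uint32_t mask)
{
	assert(sw);
	pthread_mutex_lock(&sw->tas_lock);
	sw->tag_config = (sw->tag_config & ~mask) | (val & mask);
	pthread_mutex_unlock(&sw->tas_lock);
}
```

Without the lock, two concurrent updates to different fields could each
read the old value and one of them would overwrite the other's change.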
Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 5 Sep 2022 17:01:24 +0000 (20:01 +0300)]
net: dsa: felix: disable cut-through forwarding for frames oversized for tc-taprio
Experimentally, it looks like when QSYS_QMAXSDU_CFG_7 is set to 605,
frames even way larger than 601 octets are transmitted even though these
should be considered as oversized, according to the documentation, and
dropped.
Since oversized frame dropping depends on frame size, which is only
known at the EOF stage, and therefore not at SOF when cut-through
forwarding begins, it means that the switch cannot take QSYS_QMAXSDU_CFG_*
into consideration for traffic classes that are cut-through.
Since cut-through forwarding has no UAPI to control it, and the driver
enables it based on the mantra "if we can, then why not", the strategy
is to alter vsc9959_cut_through_fwd() to take into consideration which
tc's have oversize frame dropping enabled, and disable cut-through for
them. Then, from vsc9959_tas_guard_bands_update(), we re-trigger the
cut-through determination process.
There are 2 strategies for vsc9959_cut_through_fwd() to determine
whether a tc has oversized dropping enabled or not. One is to keep a bit
mask of traffic classes per port, and the other is to read back from the
hardware registers (a non-zero value of QSYS_QMAXSDU_CFG_* means the
feature is enabled). We choose reading back from registers, because
struct ocelot_port is shared with drivers (ocelot, seville) that support
neither cut-through nor tc-taprio, and we don't have a felix
specific extension of struct ocelot_port. Furthermore, reading registers
from the Felix hardware is quite cheap, since they are memory-mapped.
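The read-back strategy can be sketched as below. This is a simplified
illustration, not the driver's implementation: the maxsdu[] array stands
in for the values read from QSYS_QMAXSDU_CFG_0..7 of one port, and
cut_through_mask() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TC 8

/*
 * maxsdu[tc] models the value read back from QSYS_QMAXSDU_CFG_<tc>;
 * a non-zero value means oversized frame dropping is enabled for tc.
 * 'candidates' is the bit mask of traffic classes that would otherwise
 * be allowed to use cut-through forwarding.
 */
static uint8_t cut_through_mask(const uint32_t maxsdu[NUM_TC], uint8_t candidates)
{
	uint8_t mask = candidates;
	int tc;

	assert(maxsdu);

	for (tc = 0; tc < NUM_TC; tc++) {
		/* Frame size is unknown at SOF, so dropping needs store-and-forward. */
		if (maxsdu[tc] != 0)
			mask &= (uint8_t)~(1u << tc);
	}

	return mask;
}
```

For example, with only traffic class 7 limited to 605 octets and all
classes as candidates, the resulting mask is 0x7f.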
Fixes: 55a515b1f5a9 ("net: dsa: felix: drop oversized frames with tc-taprio instead of hanging the port")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 5 Sep 2022 17:01:23 +0000 (20:01 +0300)]
net: dsa: felix: tc-taprio intervals smaller than MTU should send at least one packet
The blamed commit broke tc-taprio schedules such as this one:
tc qdisc replace dev $swp1 root taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 \
sched-entry S 0x7f 990000 \
sched-entry S 0x80 10000 \
flags 0x2
because the gate entry for TC 7 (S 0x80 10000 ns) now has a static guard
band added earlier than its 'gate close' event, such that packet
overruns won't occur in the worst case of the largest packet possible.
Since guard bands are statically determined based on the per-tc
QSYS_QMAXSDU_CFG_* with a fallback on the port-based QSYS_PORT_MAX_SDU,
we need to discuss what happens with TC 7 depending on kernel version,
since the driver, prior to commit 55a515b1f5a9 ("net: dsa: felix: drop
oversized frames with tc-taprio instead of hanging the port"), did not
touch QSYS_QMAXSDU_CFG_*, and therefore relied on QSYS_PORT_MAX_SDU.
1 (before vsc9959_tas_guard_bands_update): QSYS_PORT_MAX_SDU defaults to
1518, and at gigabit this introduces a static guard band (independent
of packet sizes) of 12144 ns, plus QSYS::HSCH_MISC_CFG.FRM_ADJ (bit
time of 20 octets => 160 ns). But this is larger than the time window
itself, of 10000 ns. So, the queue system never considers a frame with
TC 7 as eligible for transmission, since the gate practically never
opens, and these frames are forever stuck in the TX queues and hang
the port.
2 (after vsc9959_tas_guard_bands_update): Under the sole goal of
enabling oversized frame dropping, we make an effort to set
QSYS_QMAXSDU_CFG_7 to 1230 bytes. But QSYS_QMAXSDU_CFG_7 plays
one more role, which we did not take into account: per-tc static guard
band, expressed in L2 byte time (auto-adjusted for FCS and L1 overhead).
There is a discrepancy between what the driver thinks (that there is
no guard band, and 100% of min_gate_len[tc] is available for egress
scheduling) and what the hardware actually does (crops the equivalent
of QSYS_QMAXSDU_CFG_7 ns out of min_gate_len[tc]). In practice, this
means that the hardware thinks it has exactly 0 ns for scheduling tc 7.
In both cases, even minimum sized Ethernet frames are stuck on egress
rather than being considered for scheduling on TC 7, even if they would
fit given a proper configuration. Considering the current situation,
with vsc9959_tas_guard_bands_update(), frames between 60 octets and 1230
octets in size are not eligible for oversized dropping (because they are
smaller than QSYS_QMAXSDU_CFG_7), but won't be considered as eligible
for scheduling either, because the min_gate_len[7] (10000 ns) minus the
guard band determined by QSYS_QMAXSDU_CFG_7 (1230 octets * 8 ns per
octet == 9840 ns) minus the guard band auto-added for L1 overhead by
QSYS::HSCH_MISC_CFG.FRM_ADJ (20 octets * 8 ns per octet == 160 ns)
leaves 0 ns for scheduling in the queue system proper.
Investigating the hardware behavior, it becomes apparent that the queue
system needs precisely 33 ns of 'gate open' time in order to consider a
frame as eligible for scheduling to a tc. So the solution to this
problem is to amend vsc9959_tas_guard_bands_update(), by giving the
per-tc guard bands less space by exactly 33 ns, just enough for one
frame to be scheduled in that interval. This allows the queue system to
make forward progress for that port-tc, and prevents it from hanging.
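The arithmetic above can be checked with a small user-space sketch. It
assumes gigabit speed (8 ns per octet), the 20-octet L1 adjustment and
the 33 ns minimum quoted in this message; the function names are
hypothetical and this is not the driver's implementation.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_OCTET_1G    8  /* 8 bit times of 1 ns each at 1 Gbps */
#define L1_OVERHEAD_OCTETS 20 /* FRM_ADJ value quoted above */
#define MIN_OPEN_NS        33 /* minimum 'gate open' time quoted above */

/* Time left for scheduling once the static guard band is subtracted. */
static uint32_t open_time_ns(uint32_t gate_len_ns, uint32_t maxsdu_octets)
{
	uint32_t guard_ns = (maxsdu_octets + L1_OVERHEAD_OCTETS) * NS_PER_OCTET_1G;

	return gate_len_ns > guard_ns ? gate_len_ns - guard_ns : 0;
}

/* Largest max SDU whose guard band still leaves MIN_OPEN_NS of open time. */
static uint32_t max_sdu_for_gate(uint32_t gate_len_ns)
{
	uint32_t l1_octets;

	assert(gate_len_ns >= MIN_OPEN_NS);

	/* Octets (including L1 overhead) that fit outside the reserved time. */
	l1_octets = (gate_len_ns - MIN_OPEN_NS) / NS_PER_OCTET_1G;

	return l1_octets > L1_OVERHEAD_OCTETS ? l1_octets - L1_OVERHEAD_OCTETS : 0;
}
```

For the 10000 ns gate of TC 7, max SDU values of 1518 and 1230 both leave
0 ns, while max_sdu_for_gate(10000) yields 1225 octets (1245 octets at
L1), which leaves 40 ns and therefore satisfies the 33 ns requirement.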
Fixes: 297c4de6f780 ("net: dsa: felix: re-enable TAS guard band mode")
Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Gordeev [Wed, 20 Jul 2022 05:24:03 +0000 (07:24 +0200)]
s390/smp: enforce lowcore protection on CPU restart
As a result of commit 915fea04f932 ("s390/smp: enable DAT before
CPU restart callback is called") the low-address protection bit
gets mistakenly unset in the control register 0 save area of the
absolute zero memory. That area is used when a manual PSW restart
happens to hit an offline CPU. In this case the low-address
protection for that CPU will be dropped.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Fixes: 915fea04f932 ("s390/smp: enable DAT before CPU restart callback is called")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Alexander Gordeev [Sat, 13 Aug 2022 17:45:21 +0000 (19:45 +0200)]
s390/boot: fix absolute zero lowcore corruption on boot
Crash dump always starts on CPU0. In case CPU0 is offline the
prefix page is not installed and the absolute zero lowcore is
used. However, struct lowcore::mcesad is never assigned and
stays zero. As a result, the __machine_kdump() -> save_vx_regs()
call silently stores vector registers to the absolute lowcore
at offset 0x11b0.
Fixes: a62bc0739253 ("s390/kdump: add support for vector extension")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>