Linus Torvalds [Sun, 31 Jul 2022 16:21:13 +0000 (09:21 -0700)]
Merge tag 'locking_urgent_for_v5.19' of git://git./linux/kernel/git/tip/tip
Pull locking fix from Borislav Petkov:
- Avoid rwsem lockups in certain situations when handling the handoff
bit
* tag 'locking_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
Linus Torvalds [Sun, 31 Jul 2022 16:12:58 +0000 (09:12 -0700)]
Merge tag 'edac_urgent_for_v5.19' of git://git./linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
- Relax the condition under which the DIMM label in ghes_edac is set in
order to accomodate an HPE BIOS which sets only the device but not
the bank
- Two forgotten fixes to synopsys_edac when handling error interrupts
* tag 'edac_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/ghes: Set the DIMM label unconditionally
EDAC/synopsys: Re-enable the error interrupts on v3 hw
EDAC/synopsys: Use the correct register to disable the error interrupt on v3 hw
Hongnan Li [Fri, 22 Jul 2022 08:27:32 +0000 (16:27 +0800)]
erofs: update ctx->pos for every emitted dirent
erofs_readdir update ctx->pos after filling a batch of dentries
and it may cause dir/files duplication for NFS readdirplus which
depends on ctx->pos to fill dir correctly. So update ctx->pos for
every emitted dirent in erofs_fill_dentries to fix it.
Also fix the update of ctx->pos when the initial file position has
exceeded nameoff.
Fixes:
3e917cc305c6 ("erofs: make filesystem exportable")
Signed-off-by: Hongnan Li <hongnan.li@linux.alibaba.com>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220722082732.30935-1-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Linus Torvalds [Sun, 31 Jul 2022 00:24:16 +0000 (17:24 -0700)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Last set of ARM fixes for 5.19:
- fix for MAX_DMA_ADDRESS overflow
- fix for find_*_bit performing an out of bounds memory access"
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: findbit: fix overflowing offset
ARM: 9216/1: Fix MAX_DMA_ADDRESS overflow
Waiman Long [Wed, 22 Jun 2022 20:04:19 +0000 (16:04 -0400)]
locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
With commit
d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
consistent"), the writer that sets the handoff bit can be interrupted
out without clearing the bit if the wait queue isn't empty. This disables
reader and writer optimistic lock spinning and stealing.
Now if a non-first writer in the queue is somehow woken up or a new
waiter enters the slowpath, it can't acquire the lock. This is not the
case before commit
d257cc8cb8d5 as the writer that set the handoff bit
will clear it when exiting out via the out_nolock path. This is less
efficient as the busy rwsem stays in an unlock state for a longer time.
In some cases, this new behavior may cause lockups as shown in [1] and
[2].
This patch allows a non-first writer to ignore the handoff bit if it
is not originally set or initiated by the first waiter. This patch is
shown to be effective in fixing the lockup problem reported in [1].
[1] https://lore.kernel.org/lkml/
20220617134325.GC30825@techsingularity.net/
[2] https://lore.kernel.org/lkml/
3f02975c-1a9d-be20-32cf-
f1d8e3dfafcc@oracle.com/
Fixes:
d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Donnelly <john.p.donnelly@oracle.com>
Tested-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lore.kernel.org/r/20220622200419.778799-1-longman@redhat.com
Linus Torvalds [Sat, 30 Jul 2022 04:02:35 +0000 (21:02 -0700)]
Merge tag 'mm-hotfixes-stable-2022-07-29' of git://git./linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Two hotfixes, both cc:stable"
* tag 'mm-hotfixes-stable-2022-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/hmm: fault non-owner device private entries
page_alloc: fix invalid watermark check on a negative value
Linus Torvalds [Fri, 29 Jul 2022 23:07:35 +0000 (16:07 -0700)]
Merge tag 'block-5.19-2022-07-29' of git://git.kernel.dk/linux-block
Pull block fix from Jens Axboe:
"Just a single fix for NVMe, yet another quirk addition"
* tag 'block-5.19-2022-07-29' of git://git.kernel.dk/linux-block:
nvme-pci: Crucial P2 has bogus namespace ids
Linus Torvalds [Fri, 29 Jul 2022 20:25:31 +0000 (13:25 -0700)]
Merge tag 'drm-fixes-2022-07-30' of git://anongit.freedesktop.org/drm/drm
Pull more drm fixes from Dave Airlie:
"Maxime had the dog^Wmailing list server eat his homework^Wmisc pull
request.
Two more small fixes, one in nouveau svm code and the other in
simpledrm.
nouveau:
- page migration fix
simpledrm:
- fix mode_valid return value"
* tag 'drm-fixes-2022-07-30' of git://anongit.freedesktop.org/drm/drm:
nouveau/svm: Fix to migrate all requested pages
drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()
Dave Airlie [Fri, 29 Jul 2022 20:09:48 +0000 (06:09 +1000)]
Merge tag 'drm-misc-fixes-2022-07-29' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
One fix to fix simpledrm mode_valid return value, and one for page
migration in nouveau
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <maxime@cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20220729094514.sfzhc3gqjgwgal62@penduick
Linus Torvalds [Fri, 29 Jul 2022 20:07:03 +0000 (13:07 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four fixes, three in drivers.
The two biggest fixes are ufs and the remaining driver and core fix
are small and obvious (and the core fix is low risk)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Fix a race condition related to device management
scsi: core: Fix warning in scsi_alloc_sgtables()
scsi: ufs: host: Hold reference returned by of_parse_phandle()
scsi: mpt3sas: Stop fw fault watchdog work item during system shutdown
Mark Brown [Fri, 29 Jul 2022 19:22:22 +0000 (20:22 +0100)]
Add SPI Driver to HPE GXP Architecture
Merge series from nick.hawkins@hpe.com <nick.hawkins@hpe.com>:
The GXP supports 3 separate SPI interfaces to accommodate the system
flash, core flash, and other functions. The SPI engine supports variable
clock frequency, selectable 3-byte or 4-byte addressing and a
configurable x1, x2, and x4 command/address/data modes. The memory
buffer for reading and writing ranges between 256 bytes and 8KB. This
driver supports access to the core flash and bios part.
Eiichi Tsukata [Thu, 28 Jul 2022 04:39:07 +0000 (04:39 +0000)]
docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed
Updates descriptions for "mitigations=off" and "mitigations=auto,nosmt"
with the respective retbleed= settings.
Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: corbet@lwn.net
Link: https://lore.kernel.org/r/20220728043907.165688-1-eiichi.tsukata@nutanix.com
Ralph Campbell [Mon, 25 Jul 2022 18:36:14 +0000 (11:36 -0700)]
mm/hmm: fault non-owner device private entries
If hmm_range_fault() is called with the HMM_PFN_REQ_FAULT flag and a
device private PTE is found, the hmm_range::dev_private_owner page is used
to determine if the device private page should not be faulted in.
However, if the device private page is not owned by the caller,
hmm_range_fault() returns an error instead of calling migrate_to_ram() to
fault in the page.
For example, if a page is migrated to GPU private memory and a RDMA fault
capable NIC tries to read the migrated page, without this patch it will
get an error. With this patch, the page will be migrated back to system
memory and the NIC will be able to read the data.
Link: https://lkml.kernel.org/r/20220727000837.4128709-2-rcampbell@nvidia.com
Link: https://lkml.kernel.org/r/20220725183615.4118795-2-rcampbell@nvidia.com
Fixes:
08ddddda667b ("mm/hmm: check the device private page owner in hmm_range_fault()")
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Reported-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Cc: Philip Yang <Philip.Yang@amd.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jaewon Kim [Mon, 25 Jul 2022 09:52:12 +0000 (18:52 +0900)]
page_alloc: fix invalid watermark check on a negative value
There was a report that a task is waiting at the
throttle_direct_reclaim. The pgscan_direct_throttle in vmstat was
increasing.
This is a bug where zone_watermark_fast returns true even when the free
is very low. The commit
f27ce0e14088 ("page_alloc: consider highatomic
reserve in watermark fast") changed the watermark fast to consider
highatomic reserve. But it did not handle a negative value case which
can be happened when reserved_highatomic pageblock is bigger than the
actual free.
If watermark is considered as ok for the negative value, allocating
contexts for order-0 will consume all free pages without direct reclaim,
and finally free page may become depleted except highatomic free.
Then allocating contexts may fall into throttle_direct_reclaim. This
symptom may easily happen in a system where wmark min is low and other
reclaimers like kswapd does not make free pages quickly.
Handle the negative case by using MIN.
Link: https://lkml.kernel.org/r/20220725095212.25388-1-jaewon31.kim@samsung.com
Fixes:
f27ce0e14088 ("page_alloc: consider highatomic reserve in watermark fast")
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Reported-by: GyeongHwan Hong <gh21.hong@samsung.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yong-Taek Lee <ytk.lee@samsung.com>
Cc: <stable@vger.kerenl.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Rafael J. Wysocki [Fri, 29 Jul 2022 18:27:37 +0000 (20:27 +0200)]
Merge branches 'acpi-video', 'acpi-pci' and 'acpi-docs'
Merge ACPI backlight driver changes, ACPI changes related to PCI and
ACPI documentation changes for v5.20-rc1:
- Use native backlight on Dell Inspiron N4010 (Hans de Goede).
- Use native backlight on some TongFang devices (Werner Sembach).
- Drop X86 dependency from the ACPI backlight driver Kconfig (Riwen
Lu).
- Shorten the quirk list in the ACPI backlight driver by identifying
Clevo by board_name only (Werner Sembach).
- Remove useless NULL pointer checks from 2 ACPI PCI link management
functions (Andrey Strachuk).
- Fix obsolete example in the ACPI EINJ documentation (Qifu Zhang).
- Update links and references to _DSD-related documents (Sudeep Holla).
* acpi-video:
ACPI: video: Use native backlight on Dell Inspiron N4010
ACPI: video: Shortening quirk list by identifying Clevo by board_name only
ACPI: video: Force backlight native for some TongFang devices
ACPI: video: Drop X86 dependency from Kconfig
* acpi-pci:
ACPI/PCI: Remove useless NULL pointer checks
* acpi-docs:
Documentation: ACPI: EINJ: Fix obsolete example
Documentation: ACPI: Update links and references to DSD related docs
Linus Torvalds [Fri, 29 Jul 2022 18:26:28 +0000 (11:26 -0700)]
Merge tag 'perf-tools-fixes-for-v5.19-2022-07-29' of git://git./linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix addresses for bss symbols, describing variables used in resolving
data access in tools such as 'perf c2c' and 'perf mem'.
- Skip symbols if SHF_ALLOC flag is not set, a technique used for
listing deprecated symbols, its addresses are zeros, so not useful.
- Remove undefined behavior from bpf_perf_object__next() when dealing
with an empty bpf_objects_list list.
- Make a ARM CoreSight disasm script work with both python2 and
python3.
- Sync x86's cpufeatures header with with the kernel sources.
* tag 'perf-tools-fixes-for-v5.19-2022-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf bpf: Remove undefined behavior from bpf_perf_object__next()
perf symbol: Skip symbols if SHF_ALLOC flag is not set
perf symbol: Correct address for bss symbols
perf scripts python: Let script to be python2 compliant
tools headers cpufeatures: Sync with the kernel sources
Linus Torvalds [Fri, 29 Jul 2022 18:20:40 +0000 (11:20 -0700)]
Merge tag 'wq-for-5.19-rc8-fixes' of git://git./linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
"Just one commit to suppress a spurious warning added during the 5.19
cycle"
* tag 'wq-for-5.19-rc8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Avoid a false warning in unbind_workers()
Rafael J. Wysocki [Fri, 29 Jul 2022 18:16:42 +0000 (20:16 +0200)]
Merge branches 'acpi-pm', 'acpi-soc', 'acpi-tables' and 'acpi-resource'
Merge ACPI power management changes, ACPI LPSS driver changes, ACPI
table parsing code changes and ACPI resource handling changes for
v5.20-rc1:
- Save NVS memory during transitions into S3 on Lenovo G40-45 (Manyi
Li).
- Add support for upcoming AMD uPEP device ID AMDI008 to the ACPI
suspend-to-idle driver for x86 platforms (Shyam Sundar S K).
- Clean up checks related to the ACPI_FADT_LOW_POWER_S0 platform flag
in the LPIT table driver and the suspend-to-idle driver for x86
platforms (Rafael Wysocki).
- Print information messages regarding declared LPS0 idle support in
the platform firmware (Rafael Wysocki).
- Fix missing check in register_device_clock() in the ACPI driver for
Intel SoCs (huhai).
- Fix ACS setup in the VIOT table parser (Eric Auger).
- Skip IRQ override on AMD Zen platforms where it's harmful (Chuanhong
Guo).
* acpi-pm:
ACPI: PM: x86: Print messages regarding LPS0 idle support
ACPI: PM: s2idle: Use LPS0 idle if ACPI_FADT_LOW_POWER_S0 is unset
Revert "ACPI / PM: LPIT: Register sysfs attributes based on FADT"
ACPI: PM: s2idle: Add support for upcoming AMD uPEP HID AMDI008
ACPI: PM: save NVS memory for Lenovo G40-45
* acpi-soc:
ACPI: LPSS: Fix missing check in register_device_clock()
* acpi-tables:
ACPI: VIOT: Fix ACS setup
* acpi-resource:
ACPI: resource: skip IRQ override on AMD Zen platforms
Rafael J. Wysocki [Fri, 29 Jul 2022 18:08:25 +0000 (20:08 +0200)]
Merge branches 'acpi-processor', 'acpi-apei' and 'acpi-ec'
Merge ACPI processor driver changes, APEI changes and ACPI EC driver
changes for v5.20-rc1:
- Drop leftover acpi_processor_get_limit_info() declaration (Riwen Lu).
- Split out thermal initialization from ACPI PSS (Riwen Lu).
- Annotate more functions in the ACPI CPU idle driver to live in the
cpuidle section (Guilherme G. Piccoli).
- Fix _EINJ vs "special purpose" EFI memory regions (Dan Williams).
- Implement a better fix to avoid spamming the console with old error
logs (Tony Luck).
- Fix typo in a comment in the APEI code (Xiang wangx).
- Clean up the ACPI EC driver after previous changes in it (Hans
de Goede).
* acpi-processor:
ACPI: processor: Drop leftover acpi_processor_get_limit_info() declaration
ACPI: processor: Split out thermal initialization from ACPI PSS
ACPI: processor/idle: Annotate more functions to live in cpuidle section
* acpi-apei:
ACPI: APEI: Fix _EINJ vs EFI_MEMORY_SP
ACPI: APEI: Better fix to avoid spamming the console with old error logs
ACPI: APEI: Fix double word in a comment
* acpi-ec:
ACPI: EC: Drop unused ident initializers from dmi_system_id tables
ACPI: EC: Re-use boot_ec when possible even when EC_FLAGS_TRUST_DSDT_GPE is set
ACPI: EC: Drop the EC_FLAGS_IGNORE_DSDT_GPE quirk
ACPI: EC: Remove duplicate ThinkPad X1 Carbon 6th entry from DMI quirks
Rafael J. Wysocki [Fri, 29 Jul 2022 17:58:52 +0000 (19:58 +0200)]
Merge branch 'acpi-bus'
Merge ACPI device object management changes for v5.20-rc1.
- Use the facilities provided by the driver core and some additional
helpers to handle the children of a given ACPI device object in
multiple places instead of using the children and node list heads in
struct acpi_device which is error prone (Rafael Wysocki).
- Fix ACPI-related device reference counting issue in the hisi_lpc bus
driver (Yang Yingliang).
- Drop the children and node list heads that are not needed any more
from struct acpi_device (Rafael Wysocki).
- Drop driver member from struct acpi_device (Uwe Kleine-König).
- Drop redundant check from acpi_device_remove() (Uwe Kleine-König).
* acpi-bus:
ACPI: bus: Drop unused list heads from struct acpi_device
hisi_lpc: Use acpi_dev_for_each_child()
bus: hisi_lpc: fix missing platform_device_put() in hisi_lpc_acpi_probe()
ACPI: bus: Drop driver member of struct acpi_device
ACPI: bus: Drop redundant check in acpi_device_remove()
mfd: core: Use acpi_dev_for_each_child()
ACPI / MMC: PM: Unify fixing up device power
soundwire: Use acpi_dev_for_each_child()
platform/x86/thinkpad_acpi: Use acpi_dev_for_each_child()
ACPI: scan: Walk ACPI device's children using driver core
ACPI: bus: Introduce acpi_dev_for_each_child_reverse()
ACPI: video: Use acpi_dev_for_each_child()
ACPI: bus: Export acpi_dev_for_each_child() to modules
ACPI: property: Use acpi_dev_for_each_child() for child lookup
ACPI: container: Use acpi_dev_for_each_child()
USB: ACPI: Replace usb_acpi_find_port() with acpi_find_child_by_adr()
thunderbolt: ACPI: Replace tb_acpi_find_port() with acpi_find_child_by_adr()
ACPI: glue: Introduce acpi_find_child_by_adr()
ACPI: glue: Introduce acpi_dev_has_children()
ACPI: glue: Use acpi_dev_for_each_child()
Linus Torvalds [Fri, 29 Jul 2022 17:57:26 +0000 (10:57 -0700)]
Merge tag 'pm-5.19-rc9' of git://git./linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Make some false positive RCU splats resulting from a recent intel_idle
driver change go away (Waiman Long)"
* tag 'pm-5.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_idle: Fix false positive RCU splats due to incorrect hardirqs state
Lai Jiangshan [Fri, 29 Jul 2022 09:44:38 +0000 (17:44 +0800)]
workqueue: Avoid a false warning in unbind_workers()
Doing set_cpus_allowed_ptr() with wq_unbound_cpumask can be possible
fails and trigger the false warning.
Use cpu_possible_mask instead when wq_unbound_cpumask has no active CPUs.
It is very easy to trigger the warning:
Set wq_unbound_cpumask to a small set of CPUs.
Offline all the CPUs of wq_unbound_cpumask.
Offline an extra CPU and trigger the warning.
Fixes:
10a5a651e3af ("workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs")
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Linus Torvalds [Fri, 29 Jul 2022 17:46:03 +0000 (10:46 -0700)]
Merge tag 'riscv-for-linus-5.19-rc9' of git://git./linux/kernel/git/riscv/linux
Pull RISC-V fix from Palmer Dabbelt:
"A build fix for 'make vdso_install' that avoids an issue trying to
install the compat VDSO"
* tag 'riscv-for-linus-5.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: compat: vdso: Fix vdso_install target
Rafael J. Wysocki [Fri, 29 Jul 2022 17:46:00 +0000 (19:46 +0200)]
Merge branches 'pm-devfreq', 'pm-qos', 'pm-tools' and 'pm-docs'
Merge devfreq changes, PM QoS change, and power management tools and
documentation changes for v5.20-rc1:
- Add new devfreq driver for Mediatek CCI (Cache Coherent
Interconnect) (Johnson Wang).
- Convert the Samsung Exynos SoC Bus bindings to DT schema of
exynos-bus.c (Krzysztof Kozlowski).
- Address kernel-doc warnings by adding the description for unused
fucntion parameters in devfreq core (Mauro Carvalho Chehab).
- Use NULL to pass a null pointer rather than zero according to the
function propotype in imx-bus.c (Colin Ian King).
- Print error message instead of error interger value in
tegra30-devfreq.c (Dmitry Osipenko).
- Add checks to prevent setting negative frequency QoS limits for
CPUs (Shivnandan Kumar).
- Update the pm-graph suite of utilities to the latest revision 5.9
including multiple improvements (Todd Brandt).
- Drop pme_interrupt reference from the PCI power management
documentation (Mario Limonciello).
* pm-devfreq:
PM / devfreq: tegra30: Add error message for devm_devfreq_add_device()
PM / devfreq: imx-bus: use NULL to pass a null pointer rather than zero
PM / devfreq: shut up kernel-doc warnings
dt-bindings: interconnect: samsung,exynos-bus: convert to dtschema
PM / devfreq: mediatek: Introduce MediaTek CCI devfreq driver
dt-bindings: interconnect: Add MediaTek CCI dt-bindings
* pm-qos:
PM: QoS: Add check to make sure CPU freq is non-negative
* pm-tools:
pm-graph v5.9
* pm-docs:
Documentation: PM: Drop pme_interrupt reference
Rafael J. Wysocki [Fri, 29 Jul 2022 17:33:13 +0000 (19:33 +0200)]
Merge branches 'pm-core', 'pm-sleep', 'powercap', 'pm-domains' and 'pm-em'
Merge core device power management changes for v5.20-rc1:
- Extend support for wakeirq to callback wrappers used during system
suspend and resume (Ulf Hansson).
- Defer waiting for device probe before loading a hibernation image
till the first actual device access to avoid possible deadlocks
reported by syzbot (Tetsuo Handa).
- Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn
Helgaas).
- Add Raptor Lake-P to the list of processors supported by the Intel
RAPL driver (George D Sworo).
- Add Alder Lake-N and Raptor Lake-P to the list of processors for
which Power Limit4 is supported in the Intel RAPL driver (Sumeet
Pawnikar).
- Make pm_genpd_remove() check genpd_debugfs_dir against NULL before
attempting to remove it (Hsin-Yi Wang).
- Change the Energy Model code to represent power in micro-Watts and
adjust its users accordingly (Lukasz Luba).
* pm-core:
PM: runtime: Extend support for wakeirq for force_suspend|resume
* pm-sleep:
PM: hibernate: defer device probing when resuming from hibernation
PM: wakeup: Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP
* powercap:
powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P
powercap: intel_rapl: Add support for RAPTORLAKE_P
* pm-domains:
PM: domains: Ensure genpd_debugfs_dir exists before remove
* pm-em:
cpufreq: scmi: Support the power scale in micro-Watts in SCMI v3.1
firmware: arm_scmi: Get detailed power scale from perf
Documentation: EM: Switch to micro-Watts scale
PM: EM: convert power field to micro-Watts precision and align drivers
Rafael J. Wysocki [Fri, 29 Jul 2022 17:19:23 +0000 (19:19 +0200)]
Merge branches 'pm-cpufreq' and 'pm-cpuidle'
Merge processor power management changes for v5.20-rc1:
- Make cpufreq_show_cpus() more straightforward (Viresh Kumar).
- Drop unnecessary CPU hotplug locking from store() used by cpufreq
sysfs attributes (Viresh Kumar).
- Make the ACPI cpufreq driver support the boost control interface on
Zhaoxin/Centaur processors (Tony W Wang-oc).
- Print a warning message on attempts to free an active cpufreq policy
which should never happen (Viresh Kumar).
- Fix grammar in the Kconfig help text for the loongson2 cpufreq
driver (Randy Dunlap).
- Use cpumask_var_t for an on-stack CPU mask in the ondemand cpufreq
governor (Zhao Liu).
- Add trace points for guest_halt_poll_ns grow/shrink to the haltpoll
cpuidle driver (Eiichi Tsukata).
- Modify intel_idle to treat C1 and C1E as independent idle states on
Sapphire Rapids (Artem Bityutskiy).
* pm-cpufreq:
cpufreq: ondemand: Use cpumask_var_t for on-stack cpu mask
cpufreq: loongson2: fix Kconfig "its" grammar
cpufreq: Warn users while freeing active policy
cpufreq: ACPI: Add Zhaoxin/Centaur turbo boost control interface support
cpufreq: Drop unnecessary cpus locking from store()
cpufreq: Optimize cpufreq_show_cpus()
* pm-cpuidle:
intel_idle: make SPR C1 and C1E be independent
cpuidle: haltpoll: Add trace points for guest_halt_poll_ns grow/shrink
Rafael J. Wysocki [Fri, 29 Jul 2022 17:10:56 +0000 (19:10 +0200)]
Merge tag 'thermal-v5.20-rc1' of git://git./linux/kernel/git/thermal/linux
Pull thermal control changes for 5.20-rc1 from Daniel Lezcano:
"- Make per cpufreq / devfreq cooling device ops instead of using a
global variable, fix comments and rework the trace information
(Lukasz Luba)
- Add the include/dt-bindings/thermal.h under the area covered by the
thermal maintainer in the MAINTAINERS file (Lukas Bulwahn)
- Improve the error output by giving the sensor identification when a
thermal zone failed to initialize, the DT bindings by changing the
positive logic and adding the r8a779f0 support on the rcar3 (Wolfram
Sang)
- Convert the QCom tsens DT binding to the dtsformat format (Krzysztof
Kozlowski)
- Remove the pointless get_trend() function in the QCom, Ux500 and
tegra thermal drivers, along with the unused DROP_FULL and
RAISE_FULL trends definitions. Simplify the code by using clamp()
macros (Daniel Lezcano)
- Fix ref_table memory leak at probe time on the k3_j72xx bandgap
(Bryan Brattlof)
- Fix array underflow in prep_lookup_table (Dan Carpenter)
- Add static annotation to the k3_j72xx_bandgap_j7* data structure
(Jin Xiaoyun)
- Fix typos in comments detected on sun8i by Coccinelle (Julia Lawall)
- Fix typos in comments on rzg2l (Biju Das)
- Remove as unnecessary call to dev_err() as the error is already
printed by the failing function on u8500 (Yang Li)
- Register the thermal zones as hwmon sensors for the Qcom thermal
sensors (Dmitry Baryshkov)
- Fix 'tmon' tool compilation issue by adding phtread.h include
(Markus Mayer)
- Fix typo in the comments for the 'tmon' tool (Slark Xiao)
- Consolidate the thermal core code by beginning to move the thermal
trip structure from the thermal OF code as a generic structure to be
used by the different sensors when registering a thermal zone
(Daniel Lezcano)"
* tag 'thermal-v5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (36 commits)
thermal/of: Initialize trip points separately
thermal/of: Use thermal trips stored in the thermal zone
thermal/core: Add thermal_trip in thermal_zone
thermal/core: Rename 'trips' to 'num_trips'
thermal/core: Move thermal_set_delay_jiffies to static
thermal/core: Remove unneeded EXPORT_SYMBOLS
thermal/of: Move thermal_trip structure to thermal.h
thermal/of: Remove the device node pointer for thermal_trip
thermal/of: Replace device node match with device node search
thermal/core: Remove duplicate information when an error occurs
thermal/core: Avoid calling ->get_trip_temp() unnecessarily
thermal/tools/tmon: Fix typo 'the the' in comment
thermal/tools/tmon: Include pthread and time headers in tmon.h
thermal/ti-soc-thermal: Fix comment typo
thermal/drivers/qcom/spmi-adc-tm5: Register thermal zones as hwmon sensors
thermal/drivers/qcom/temp-alarm: Register thermal zones as hwmon sensors
thermal/drivers/u8500: Remove unnecessary print function dev_err()
thermal/drivers/rzg2l: Fix comments
thermal/drivers/sun8i: Fix typo in comment
thermal/drivers/k3_j72xx_bandgap: Make k3_j72xx_bandgap_j721e_data and k3_j72xx_bandgap_j7200_data static
...
Linus Torvalds [Fri, 29 Jul 2022 17:10:30 +0000 (10:10 -0700)]
Merge tag 'loongarch-fixes-5.19-5' of git://git./linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
- Fix cache size calculation, stack protection attributes, ptrace's
fpr_set and "ROM Size" in boardinfo
- Some cleanups and improvements of assembly
- Some cleanups of unused code and useless code
* tag 'loongarch-fixes-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Fix wrong "ROM Size" of boardinfo
LoongArch: Fix missing fcsr in ptrace's fpr_set
LoongArch: Fix shared cache size calculation
LoongArch: Disable executable stack by default
LoongArch: Remove unused variables
LoongArch: Remove clock setting during cpu hotplug stage
LoongArch: Remove useless header compiler.h
LoongArch: Remove several syntactic sugar macros for branches
LoongArch: Re-tab the assembly files
LoongArch: Simplify "BGT foo, zero" with BGTZ
LoongArch: Simplify "BLT foo, zero" with BLTZ
LoongArch: Simplify "BEQ/BNE foo, zero" with BEQZ/BNEZ
LoongArch: Use the "move" pseudo-instruction where applicable
LoongArch: Use the "jr" pseudo-instruction where applicable
LoongArch: Use ABI names of registers where appropriate
Rafael J. Wysocki [Fri, 29 Jul 2022 17:08:18 +0000 (19:08 +0200)]
Merge branch 'thermal-core'
Merge changes that make the thermal core use ida_alloc()/free()
directly instead of ida_simple_get()/ida_simple_remove() that have been
deprecated.
* thermal-core:
thermal: Directly use ida_alloc()/free()
Linus Torvalds [Fri, 29 Jul 2022 16:57:07 +0000 (09:57 -0700)]
Merge tag 'powerpc-5.19-6' of git://git./linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Re-enable the new amdgpu display engine for powerpc, as long as the
compiler is correctly configured.
- Disable stack variable initialisation in prom_init to fix GCC 12
allmodconfig.
Thanks to Dan Horák and Sudip Mukherjee.
* tag 'powerpc-5.19-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
drm/amdgpu: Re-enable DCN for 64-bit powerpc
powerpc/64s: Disable stack variable initialisation for prom_init
Nick Hawkins [Thu, 28 Jul 2022 16:14:59 +0000 (11:14 -0500)]
MAINTAINERS: add spi support to GXP
Add the spi driver and dt-binding documentation
Signed-off-by: Nick Hawkins <nick.hawkins@hpe.com>
Link: https://lore.kernel.org/r/20220728161459.7738-6-nick.hawkins@hpe.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Nick Hawkins [Thu, 28 Jul 2022 16:14:56 +0000 (11:14 -0500)]
spi: dt-bindings: add documentation for hpe,gxp-spifi
Create documentation for the hpe,gxp-spifi binding to support access to
the SPI parts
Signed-off-by: Nick Hawkins <nick.hawkins@hpe.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220728161459.7738-3-nick.hawkins@hpe.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Nick Hawkins [Thu, 28 Jul 2022 16:14:55 +0000 (11:14 -0500)]
spi: spi-gxp: Add support for HPE GXP SoCs
The GXP supports 3 separate SPI interfaces to accommodate the system
flash, core flash, and other functions. The SPI engine supports variable
clock frequency, selectable 3-byte or 4-byte addressing and a
configurable x1, x2, and x4 command/address/data modes. The memory
buffer for reading and writing ranges between 256 bytes and 8KB. This
driver supports access to the core flash and bios part.
Signed-off-by: Nick Hawkins <nick.hawkins@hpe.com>
Link: https://lore.kernel.org/r/20220728161459.7738-2-nick.hawkins@hpe.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Rafael J. Wysocki [Fri, 29 Jul 2022 15:15:30 +0000 (17:15 +0200)]
Merge back cpuidle material for 5.20.
Uwe Kleine-König [Tue, 12 Jul 2022 16:15:19 +0000 (18:15 +0200)]
pwm: lpc18xx: Fix period handling
The calculation:
val = (u64)NSEC_PER_SEC * LPC18XX_PWM_TIMER_MAX;
do_div(val, lpc18xx_pwm->clk_rate);
lpc18xx_pwm->max_period_ns = val;
is bogus because with NSEC_PER_SEC =
1000000000,
LPC18XX_PWM_TIMER_MAX = 0xffffffff and clk_rate < NSEC_PER_SEC this
overflows the (on lpc18xx (i.e. ARM32) 32 bit wide) unsigned int
.max_period_ns. This results (dependant of the actual clk rate) in an
arbitrary limitation of the maximal period. E.g. for clkrate =
333333333 (Hz) we get max_period_ns = 9 instead of
12884901897.
So make .max_period_ns an u64 and pass period and duty as u64 to not
discard relevant digits. And also make use of mul_u64_u64_div_u64()
which prevents all overflows assuming clk_rate < NSEC_PER_SEC.
Fixes:
841e6f90bb78 ("pwm: NXP LPC18xx PWM/SCT driver")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Uwe Kleine-König [Tue, 12 Jul 2022 16:15:18 +0000 (18:15 +0200)]
pwm: lpc18xx: Convert to use dev_err_probe()
This has various upsides:
- It emits the symbolic name of the error code
- It is silent in the EPROBE_DEFER case and properly sets the defer reason
- It reduces the number of code lines slightly
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Uwe Kleine-König [Tue, 12 Jul 2022 08:46:56 +0000 (10:46 +0200)]
pwm: twl-led: Document some limitations and link to the reference manual
I found these just from reading the reference manual and the driver
source. It's unclear to me if there are glitches when updating the ON
and OFF registers.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Lee Jones [Thu, 14 Jul 2022 11:25:28 +0000 (12:25 +0100)]
MAINTAINERS: Remove myself as PWM maintainer
Thierry and Uwe are doing a fine job, leaving me surplus to requirement.
Happy to pop back on-board if anything changes in the future.
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: linux-pwm@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Lee Jones <lee@kernel.org>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Lukas Bulwahn [Mon, 13 Jun 2022 12:33:19 +0000 (14:33 +0200)]
MAINTAINERS: Add include/dt-bindings/pwm to PWM SUBSYSTEM
Maintainers of the directory Documentation/devicetree/bindings/pwm
are also the maintainers of the corresponding directory
include/dt-bindings/pwm.
Add the file entry for include/dt-bindings/pwm to the appropriate
section in MAINTAINERS.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Fabien Parent [Tue, 31 May 2022 11:45:43 +0000 (13:45 +0200)]
dt-bindings: pwm: mediatek: Add compatible string for MT8195
MT8195's PWM IP is compatible with the MT8183 PWM IP.
Signed-off-by: Fabien Parent <fparent@baylibre.com>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Nikita Travkin [Mon, 11 Jul 2022 20:33:40 +0000 (01:33 +0500)]
pwm: Add clock based PWM output driver
Some systems have clocks exposed to external devices. If the clock
controller supports duty-cycle configuration, such clocks can be used as
pwm outputs. In fact PWM and CLK subsystems are interfaced with in a
similar way and an "opposite" driver already exists (clk-pwm). Add a
driver that would enable pwm devices to be used via clk subsystem.
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Nikita Travkin <nikita@trvn.ru>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Nikita Travkin [Mon, 11 Jul 2022 20:33:39 +0000 (01:33 +0500)]
dt-bindings: pwm: Document clk based PWM controller
Add YAML devicetree binding for clk based PWM controller
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Nikita Travkin <nikita@trvn.ru>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Uwe Kleine-König [Thu, 21 Jul 2022 10:31:29 +0000 (12:31 +0200)]
pwm: sifive: Shut down hardware only after pwmchip_remove() completed
The PWMs are expected to be functional until pwmchip_remove() is called.
So disable the clks only afterwards.
Fixes:
9e37a53eb051 ("pwm: sifive: Add a driver for SiFive SoC PWM")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Uwe Kleine-König [Thu, 21 Jul 2022 10:31:28 +0000 (12:31 +0200)]
pwm: sifive: Ensure the clk is enabled exactly once per running PWM
.apply() assumes the clk to be for a given PWM iff the PWM is enabled.
So make sure this is the case when .probe() completes. And in .remove()
disable the according number of times.
This fixes a clk enable/disable imbalance, if some PWMs are already running
at probe time.
Fixes:
9e37a53eb051 (pwm: sifive: Add a driver for SiFive SoC PWM)
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Uwe Kleine-König [Thu, 21 Jul 2022 10:31:27 +0000 (12:31 +0200)]
pwm: sifive: Simplify clk handling
The clk is necessary for both register access and (enabled) operation of
the PWM. Instead of
clk_enable()
update_hw()
if pwm_got_enabled():
clk_enable()
elif pwm_got_disabled():
clk_disable()
clk_disable()
which is some cases only calls clk_enable() to immediately afterwards
call clk_disable again, do:
if (!prev_state.enabled)
clk_enable()
# clk enabled exactly once
update_hw()
if (!next_state.enabled)
clk_disable()
which is much easier.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Uwe Kleine-König [Thu, 21 Jul 2022 10:31:26 +0000 (12:31 +0200)]
pwm: sifive: Enable clk only after period check in .apply()
For the period check and the initial calculations of register values there
is no hardware access needed. So delay enabling the clk a bit to simplify
the code flow a bit.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Uwe Kleine-König [Thu, 21 Jul 2022 10:31:25 +0000 (12:31 +0200)]
pwm: sifive: Reduce time the controller lock is held
The lock is only to serialize access and update to user_count and
approx_period between different PWMs served by the same pwm_chip.
So the lock needs only to be taken during the check if the (chip global)
period can and/or needs to be changed.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Uwe Kleine-König [Thu, 21 Jul 2022 10:31:24 +0000 (12:31 +0200)]
pwm: sifive: Fold pwm_sifive_enable() into its only caller
There is only a single caller of pwm_sifive_enable() which only enables
or disables the clk. Put this implementation directly into
pwm_sifive_apply() which allows further simplification in the next
change.
There is no change in behaviour.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Uwe Kleine-König [Thu, 21 Jul 2022 10:31:23 +0000 (12:31 +0200)]
pwm: sifive: Simplify offset calculation for PWMCMP registers
Instead of explicitly using PWM_SIFIVE_PWMCMP0 + pwm->hwpwm *
PWM_SIFIVE_SIZE_PWMCMP for each access to one of the PWMCMP registers,
introduce a macro that takes the hwpwm id as parameter.
For the register definition using a plain 4 instead of the cpp constant
PWM_SIFIVE_SIZE_PWMCMP is easier to read, so define the offset macro
without the constant. The latter can then be dropped as there are no
users left.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Tiezhu Yang [Thu, 21 Jul 2022 09:53:01 +0000 (17:53 +0800)]
LoongArch: Fix wrong "ROM Size" of boardinfo
We can see the "ROM Size" is different in the following outputs:
[root@linux loongson]# cat /sys/firmware/loongson/boardinfo
BIOS Information
Vendor : Loongson
Version : vUDK2018-LoongArch-V2.0.pre-beta8
ROM Size : 63 KB
Release Date : 06/15/2022
Board Information
Manufacturer : Loongson
Board Name : Loongson-LS3A5000-7A1000-1w-A2101
Family : LOONGSON64
[root@linux loongson]# dmidecode | head -11
...
Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
Vendor: Loongson
Version: vUDK2018-LoongArch-V2.0.pre-beta8
Release Date: 06/15/2022
ROM Size: 4 MB
According to "BIOS Information (Type 0) structure" in the SMBIOS
Reference Specification [1], it shows 64K * (n+1) is the size of
the physical device containing the BIOS if the size is less than
16M.
Additionally, we can see the related code in dmidecode [2]:
u64 s = { .l = (code1 + 1) << 6 };
So the output of dmidecode is correct, the output of boardinfo
is wrong, fix it.
By the way, at present no need to consider the size is 16M or
greater on LoongArch, because it is usually 4M or 8M which is
enough to use.
[1] https://www.dmtf.org/sites/default/files/standards/documents/DSP0134_3.6.0.pdf
[2] https://git.savannah.nongnu.org/cgit/dmidecode.git/tree/dmidecode.c#n347
Fixes:
628c3bb40e9a ("LoongArch: Add boot and setup routines")
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Qi Hu [Thu, 14 Jul 2022 06:25:50 +0000 (14:25 +0800)]
LoongArch: Fix missing fcsr in ptrace's fpr_set
In file ptrace.c, function fpr_set does not copy fcsr data from ubuf
to kbuf. That's the reason why fcsr cannot be modified by ptrace.
This patch fixs this problem and allows users using ptrace to modify
the fcsr.
Co-developed-by: Xu Li <lixu@loongson.cn>
Signed-off-by: Qi Hu <huqi@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Huacai Chen [Wed, 13 Jul 2022 10:00:41 +0000 (18:00 +0800)]
LoongArch: Fix shared cache size calculation
Current calculation of shared cache size is from the node (die) scope,
but we hope 'lscpu' to show the shared cache size of the whole package
for multi-die chips (e.g., Loongson-3C5000L, which contains 4 dies in
one package). So fix it by multiplying nodes_per_package.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Huacai Chen [Tue, 26 Jul 2022 12:43:11 +0000 (20:43 +0800)]
LoongArch: Disable executable stack by default
Disable executable stack for LoongArch by default, as all modern
architectures do.
Reported-by: Andreas Schwab <schwab@suse.de>
Suggested-by: WANG Xuerui <git@xen0n.name>
Link: https://sourceware.org/pipermail/binutils/2022-July/121992.html
Tested-by: WANG Xuerui <git@xen0n.name>
Tested-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bibo Mao [Wed, 20 Jul 2022 07:21:52 +0000 (15:21 +0800)]
LoongArch: Remove unused variables
There are some variables never used or referenced, this patch
removes these varaibles and make the code cleaner.
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Bibo Mao [Wed, 20 Jul 2022 07:21:51 +0000 (15:21 +0800)]
LoongArch: Remove clock setting during cpu hotplug stage
On physical machine we can save power by disabling clock of hot removed
cpu. However as different platforms require different methods to
configure clocks, the code is platform-specific, and probably belongs to
firmware/pmu or cpu regulator, rather than generic arch/loongarch code.
Also, there is no such register on QEMU virt machine since the
clock/frequency regulation is not emulated.
This patch removes the hard-coded clock register accesses in generic
LoongArch cpu hotplug flow.
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Jun Yi [Thu, 21 Jul 2022 11:10:49 +0000 (19:10 +0800)]
LoongArch: Remove useless header compiler.h
The content of LoongArch's compiler.h is trivial, with some unused
anywhere, so inline the definitions and remove the header.
Signed-off-by: Jun Yi <yijun@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:15 +0000 (23:57 +0800)]
LoongArch: Remove several syntactic sugar macros for branches
These syntactic sugars have been supported by upstream binutils from the
beginning, so no need to patch them locally.
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:22 +0000 (23:57 +0800)]
LoongArch: Re-tab the assembly files
Reflow the *.S files for better stylistic consistency, namely hard tabs
after mnemonic position, and vertical alignment of the first operand
with hard tabs. Tab width is obviously 8. Some pre-existing intra-block
vertical alignments are preserved.
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:21 +0000 (23:57 +0800)]
LoongArch: Simplify "BGT foo, zero" with BGTZ
Support for the syntactic sugar is present in upstream binutils port
from the beginning. Use it for shorter lines and better consistency.
Generated code should be identical.
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:20 +0000 (23:57 +0800)]
LoongArch: Simplify "BLT foo, zero" with BLTZ
Support for the syntactic sugar is present in upstream binutils port
from the beginning. Use it for shorter lines and better consistency.
Generated code should be identical.
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:19 +0000 (23:57 +0800)]
LoongArch: Simplify "BEQ/BNE foo, zero" with BEQZ/BNEZ
While B{EQ,NE}Z and B{EQ,NE} are different instructions, and the vastly
expanded range for branch destination does not really matter in the few
cases touched, use the B{EQ,NE}Z where possible for shorter lines and
better consistency (e.g. some places used "BEQ foo, zero", while some
used "BEQ zero, foo").
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:18 +0000 (23:57 +0800)]
LoongArch: Use the "move" pseudo-instruction where applicable
Some of the assembly code in the LoongArch port likely originated
from a time when the assembler did not support pseudo-instructions like
"move" or "jr", so the desugared form was used and readability suffers
(to a minor degree) as a result.
As the upstream toolchain supports these pseudo-instructions from the
beginning, migrate the existing few usages to them for better
readability.
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:17 +0000 (23:57 +0800)]
LoongArch: Use the "jr" pseudo-instruction where applicable
Some of the assembly code in the LoongArch port likely originated
from a time when the assembler did not support pseudo-instructions like
"move" or "jr", so the desugared form was used and readability suffers
(to a minor degree) as a result.
As the upstream toolchain supports these pseudo-instructions from the
beginning, migrate the existing few usages to them for better
readability.
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:16 +0000 (23:57 +0800)]
LoongArch: Use ABI names of registers where appropriate
Some of the assembly in the LoongArch port seem to come from a
prehistoric time, when the assembler didn't even have support for the
ABI names we all come to know and love, thus used raw register numbers
which hampered readability.
The usages are found with a regex match inside arch/loongarch, then
manually adjusted for those non-definitions.
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Russell King (Oracle) [Tue, 26 Jul 2022 22:51:48 +0000 (23:51 +0100)]
ARM: findbit: fix overflowing offset
When offset is larger than the size of the bit array, we should not
attempt to access the array as we can perform an access beyond the
end of the array. Fix this by changing the pre-condition.
Using "cmp r2, r1; bhs ..." covers us for the size == 0 case, since
this will always take the branch when r1 is zero, irrespective of
the value of r2. This means we can fix this bug without adding any
additional code!
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Thadeu Lima de Souza Cascardo [Thu, 28 Jul 2022 12:26:02 +0000 (09:26 -0300)]
x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available
Some cloud hypervisors do not provide IBPB on very recent CPU processors,
including AMD processors affected by Retbleed.
Using IBPB before firmware calls on such systems would cause a GPF at boot
like the one below. Do not enable such calls when IBPB support is not
present.
EFI Variables Facility v0.08 2004-May-17
general protection fault, maybe for address 0x1: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 24 Comm: kworker/u2:1 Not tainted 5.19.0-rc8+ #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Workqueue: efi_rts_wq efi_call_rts
RIP: 0010:efi_call_rts
Code: e8 37 33 58 ff 41 bf 48 00 00 00 49 89 c0 44 89 f9 48 83 c8 01 4c 89 c2 48 c1 ea 20 66 90 b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 e8 7b 9f 5d ff e8 f6 f8 ff ff 4c 89 f1 4c 89 ea 4c 89 e6 48
RSP: 0018:
ffffb373800d7e38 EFLAGS:
00010246
RAX:
0000000000000001 RBX:
0000000000000006 RCX:
0000000000000049
RDX:
0000000000000000 RSI:
ffff94fbc19d8fe0 RDI:
ffff94fbc1b2b300
RBP:
ffffb373800d7e70 R08:
0000000000000000 R09:
0000000000000000
R10:
000000000000000b R11:
000000000000000b R12:
ffffb3738001fd78
R13:
ffff94fbc2fcfc00 R14:
ffffb3738001fd80 R15:
0000000000000048
FS:
0000000000000000(0000) GS:
ffff94fc3da00000(0000) knlGS:
0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
CR2:
ffff94fc30201000 CR3:
000000006f610000 CR4:
00000000000406f0
Call Trace:
<TASK>
? __wake_up
process_one_work
worker_thread
? rescuer_thread
kthread
? kthread_complete_and_exit
ret_from_fork
</TASK>
Modules linked in:
Fixes:
28a99e95f55c ("x86/amd: Use IBPB for firmware calls")
Reported-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220728122602.2500509-1-cascardo@canonical.com
Linus Torvalds [Fri, 29 Jul 2022 03:34:59 +0000 (20:34 -0700)]
Merge tag 'drm-fixes-2022-07-29' of git://anongit.freedesktop.org/drm/drm
Pull drm fix from Dave Airlie:
"Quiet extra week, just a single fix for i915 workaround with execlist
backend.
i915:
- Further reset robustness improvements for execlists [Wa_22011802037]"
* tag 'drm-fixes-2022-07-29' of git://anongit.freedesktop.org/drm/drm:
drm/i915/reset: Add additional steps for Wa_22011802037 for execlist backend
Dave Airlie [Fri, 29 Jul 2022 01:39:13 +0000 (11:39 +1000)]
Merge tag 'drm-intel-fixes-2022-07-28-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Further reset robustness improvements for execlists [Wa_22011802037] (Umesh Nerlige Ramappa)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/YuJIWaEbKcs/q0NY@tursulin-desk
Mike Snitzer [Wed, 20 Jul 2022 17:58:04 +0000 (13:58 -0400)]
dm: fix dm-raid crash if md_handle_request() splits bio
Commit
ca522482e3eaf ("dm: pass NULL bdev to bio_alloc_clone")
introduced the optimization to _not_ perform bio_associate_blkg()'s
relatively costly work when DM core clones its bio. But in doing so it
exposed the possibility for DM's cloned bio to alter DM target
behavior (e.g. crash) if a target were to issue IO without first
calling bio_set_dev().
The DM raid target can trigger an MD crash due to its need to split
the DM bio that is passed to md_handle_request(). The split will
recurse to submit_bio_noacct() using a bio with an uninitialized
->bi_blkg. This NULL bio->bi_blkg causes blk_throtl_bio() to
dereference a NULL blkg_to_tg(bio->bi_blkg).
Fix this in DM core by adding a new 'needs_bio_set_dev' target flag that
will make alloc_tio() call bio_set_dev() on behalf of the target.
dm-raid is the only target that requires this flag. bio_set_dev()
initializes the DM cloned bio's ->bi_blkg, using bio_associate_blkg,
before passing the bio to md_handle_request().
Long-term fix would be to audit and refactor MD code to rely on DM to
split its bio, using dm_accept_partial_bio(), but there are MD raid
personalities (e.g. raid1 and raid10) whose implementation are tightly
coupled to handling the bio splitting inline.
Fixes:
ca522482e3eaf ("dm: pass NULL bdev to bio_alloc_clone")
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mikulas Patocka [Sun, 24 Jul 2022 18:33:52 +0000 (14:33 -0400)]
dm raid: fix address sanitizer warning in raid_resume
There is a KASAN warning in raid_resume when running the lvm test
lvconvert-raid.sh. The reason for the warning is that mddev->raid_disks
is greater than rs->raid_disks, so the loop touches one entry beyond
the allocated length.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mikulas Patocka [Sun, 24 Jul 2022 18:31:35 +0000 (14:31 -0400)]
dm raid: fix address sanitizer warning in raid_status
There is this warning when using a kernel with the address sanitizer
and running this testsuite:
https://gitlab.com/cki-project/kernel-tests/-/tree/main/storage/swraid/scsi_raid
==================================================================
BUG: KASAN: slab-out-of-bounds in raid_status+0x1747/0x2820 [dm_raid]
Read of size 4 at addr
ffff888079d2c7e8 by task lvcreate/13319
CPU: 0 PID: 13319 Comm: lvcreate Not tainted 5.18.0-0.rc3.<snip> #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Call Trace:
<TASK>
dump_stack_lvl+0x6a/0x9c
print_address_description.constprop.0+0x1f/0x1e0
print_report.cold+0x55/0x244
kasan_report+0xc9/0x100
raid_status+0x1747/0x2820 [dm_raid]
dm_ima_measure_on_table_load+0x4b8/0xca0 [dm_mod]
table_load+0x35c/0x630 [dm_mod]
ctl_ioctl+0x411/0x630 [dm_mod]
dm_ctl_ioctl+0xa/0x10 [dm_mod]
__x64_sys_ioctl+0x12a/0x1a0
do_syscall_64+0x5b/0x80
The warning is caused by reading conf->max_nr_stripes in raid_status. The
code in raid_status reads mddev->private, casts it to struct r5conf and
reads the entry max_nr_stripes.
However, if we have different raid type than 4/5/6, mddev->private
doesn't point to struct r5conf; it may point to struct r0conf, struct
r1conf, struct r10conf or struct mpconf. If we cast a pointer to one
of these structs to struct r5conf, we will be reading invalid memory
and KASAN warns about it.
Fix this bug by reading struct r5conf only if raid type is 4, 5 or 6.
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Christie [Sun, 17 Jul 2022 22:45:08 +0000 (17:45 -0500)]
dm: Start pr_preempt from the same starting path
pr_preempt has a similar issue as reserve where for all the
reservation types except the All Registrants ones the preempt can
create a reservation. And a follow up reservation or release needs to
go down the same path the preempt did. This has the pr_preempt work
like reserve and release where we always start from the first path in
the first group.
This commit has been tested with windows failover clustering's
validation test and libiscsi's PGR tests to check for regressions.
They both don't have tests to verify this case, so I tested it
manually.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Christie [Sun, 17 Jul 2022 22:45:07 +0000 (17:45 -0500)]
dm: Fix PR release handling for non All Registrants
This commit fixes a bug where we are leaving the reservation in place
even though pr_release has run and returned success.
If we have a Write Exclusive, Exclusive Access, or Write/Exclusive
Registrants only reservation, the release must be sent down the path
that is the reservation holder. The problem is multipath_prepare_ioctl
most likely selected path N for the reservation, then later when we do
the release multipath_prepare_ioctl will select a completely different
path. The device will then return success becuase the nvme and scsi
specs say to return success if there is no reservation or if the
release is sent down from a path that is not the holder. We then think
we have released the reservation.
This commit has us loop over each path and send a release so we can
make sure the release is executed on the correct path. It has been
tested with windows failover clustering's validation test which checks
this case, and it has been tested manually (the libiscsi PGR tests
don't have a test case for this yet, but I will be adding one).
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Christie [Sun, 17 Jul 2022 22:45:06 +0000 (17:45 -0500)]
dm: Start pr_reserve from the same starting path
When an app does a pr_reserve it will go to whatever path we happen to
be using at the time. This can result in errors when the app does a
second pr_reserve call and expects success but gets a failure because
the reserve is not done on the holder's path. This commit has us
always start trying to do reserves from the first path in the first
group.
Windows failover clustering will produce the type of pattern above.
With this commit, we will now pass its validation test for this case.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Christie [Sun, 17 Jul 2022 22:45:05 +0000 (17:45 -0500)]
dm: Allow dm_call_pr to be used for path searches
The specs state that if you send a reserve down a path that is already
the holder success must be returned and if it goes down a path that
is not the holder reservation conflict must be returned. Windows
failover clustering will send a second reservation and expects that a
device returns success. The problem for multipathing is that for an
All Registrants reservation, we can send the reserve down any path but
for all other reservation types there is one path that is the holder.
To handle this we could add PR state to dm but that can get nasty.
Look at target_core_pr.c for an example of the type of things we'd
have to track. It will also get more complicated because other
initiators can change the state so we will have to add in async
event/sense handling.
This commit, and the 3 commits that follow, tries to keep dm simple
and keep just doing passthrough. This commit modifies dm_call_pr to be
able to find the first usable path that can execute our pr_op then
return. When dm_pr_reserve is converted to dm_call_pr in the next
commit for the normal case we will use the same path for every
reserve.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Mike Snitzer [Fri, 22 Jul 2022 19:31:23 +0000 (15:31 -0400)]
dm: return early from dm_pr_call() if DM device is suspended
Otherwise PR ops may be issued while the broader DM device is being
reconfigured, etc.
Fixes:
9c72bad1f31a ("dm: call PR reserve/unreserve on each underlying device")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Alistair Popple [Wed, 20 Jul 2022 06:27:45 +0000 (16:27 +1000)]
nouveau/svm: Fix to migrate all requested pages
Users may request that pages from an OpenCL SVM allocation be migrated
to the GPU with clEnqueueSVMMigrateMem(). In Nouveau this will call into
nouveau_dmem_migrate_vma() to do the migration. If the total range to be
migrated exceeds SG_MAX_SINGLE_ALLOC the pages will be migrated in
chunks of size SG_MAX_SINGLE_ALLOC. However a typo in updating the
starting address means that only the first chunk will get migrated.
Fix the calculation so that the entire range will get migrated if
possible.
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Fixes:
e3d8b0890469 ("drm/nouveau/svm: map pages after migration")
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220720062745.960701-1-apopple@nvidia.com
Cc: <stable@vger.kernel.org> # v5.8+
Linus Torvalds [Thu, 28 Jul 2022 18:54:59 +0000 (11:54 -0700)]
Merge tag 'net-5.19-final' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth and netfilter, no known blockers for
the release.
Current release - regressions:
- wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop(), fix
taking the lock before its initialized
- Bluetooth: mgmt: fix double free on error path
Current release - new code bugs:
- eth: ice: fix tunnel checksum offload with fragmented traffic
Previous releases - regressions:
- tcp: md5: fix IPv4-mapped support after refactoring, don't take the
pure v6 path
- Revert "tcp: change pingpong threshold to 3", improving detection
of interactive sessions
- mld: fix netdev refcount leak in mld_{query | report}_work() due to
a race
- Bluetooth:
- always set event mask on suspend, avoid early wake ups
- L2CAP: fix use-after-free caused by l2cap_chan_put
- bridge: do not send empty IFLA_AF_SPEC attribute
Previous releases - always broken:
- ping6: fix memleak in ipv6_renew_options()
- sctp: prevent null-deref caused by over-eager error paths
- virtio-net: fix the race between refill work and close, resulting
in NAPI scheduled after close and a BUG()
- macsec:
- fix three netlink parsing bugs
- avoid breaking the device state on invalid change requests
- fix a memleak in another error path
Misc:
- dt-bindings: net: ethernet-controller: rework 'fixed-link' schema
- two more batches of sysctl data race adornment"
* tag 'net-5.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits)
stmmac: dwmac-mediatek: fix resource leak in probe
ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr
net: ping6: Fix memleak in ipv6_renew_options().
net/funeth: Fix fun_xdp_tx() and XDP packet reclaim
sctp: leave the err path free in sctp_stream_init to sctp_stream_free
sfc: disable softirqs for ptp TX
ptp: ocp: Select CRC16 in the Kconfig.
tcp: md5: fix IPv4-mapped support
virtio-net: fix the race between refill work and close
mptcp: Do not return EINPROGRESS when subflow creation succeeds
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
Bluetooth: Always set event mask on suspend
Bluetooth: mgmt: Fix double free on error path
wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()
ice: do not setup vlan for loopback VSI
ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)
ice: Fix VSIs unable to share unicast MAC
ice: Fix tunnel checksum offload with fragmented traffic
ice: Fix max VLANs available for VF
netfilter: nft_queue: only allow supported familes and hooks
...
Len Brown [Thu, 28 Jul 2022 18:38:55 +0000 (14:38 -0400)]
tools/power turbostat: version 2022.07.28
update version number
Signed-off-by: Len Brown <len.brown@intel.com>
Artem Bityutskiy [Tue, 26 Jul 2022 15:29:35 +0000 (18:29 +0300)]
tools/power turbostat: do not decode ACC for ICX and SPR
The ACC (automatic C-state conversion) feature was available on Sky Lake and
Cascade Lake Xeons (SKX and CLX), but it is not available on Ice Lake and
Sapphire Rapids Xeons (ICX and SPR). Therefore, stop decoding it for ICX and
SPR.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Artem Bityutskiy [Tue, 26 Jul 2022 15:29:34 +0000 (18:29 +0300)]
tools/power turbostat: fix SPR PC6 limits
Sapphire Rapids Xeon (SPR) supports 2 flavors of PC6 - PC6N (non-retention) and
PC6R (retention). Before this patch we used ICX package C-state limits, which
was wrong, because ICX has only one PC6 flavor. With this patch, we use SKX PC6
limits for SPR, because they are the same.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Artem Bityutskiy [Tue, 26 Jul 2022 15:29:33 +0000 (18:29 +0300)]
tools/power turbostat: cleanup 'automatic_cstate_conversion_probe()'
The 'automatic_cstate_conversion_probe()' function has a too long 'if'
statement, convert it to a 'switch' statement in order to improve code
readability a bit.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Artem Bityutskiy [Tue, 26 Jul 2022 15:29:32 +0000 (18:29 +0300)]
tools/power turbostat: separate SPR from ICX
Before this patch, SPR platform was considered identical to ICX platform. This
patch separates SPR support from ICX.
This patch is a preparation for adding SPR-specific package C-state limits
support.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Jiang Jian [Tue, 21 Jun 2022 08:16:17 +0000 (16:16 +0800)]
tools/power turbosstat: fix comment
remove duplicate "the" in comment
Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com>
Signed-off-by: Len Brown <len.brown@intel.com>
George D Sworo [Wed, 1 Jun 2022 21:49:23 +0000 (14:49 -0700)]
tools/power turbostat: Support RAPTORLAKE P
Add initial support for Raptorlake model
Signed-off-by: George D Sworo <george.d.sworo@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Zhang Rui [Fri, 13 May 2022 05:02:29 +0000 (13:02 +0800)]
tools/power turbostat: add support for ALDERLAKE_N
Add support for ALDERLAKE_N platform.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Wed, 1 Jun 2022 03:29:13 +0000 (17:29 -1000)]
tools/power turbostat: dump secondary Turbo-Ratio-Limit
Intel Performance Hybrid processors have a 2nd MSR
describing the turbo limits enforced on the Ecores.
Note, TRL and Secondary-TRL are usually R/O information,
but on overclock-capable parts, they can be written.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Wed, 1 Jun 2022 03:21:06 +0000 (17:21 -1000)]
tools/power turbostat: simplify dump_turbo_ratio_limits()
code cleanup only.
no functional change.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Wed, 1 Jun 2022 03:08:25 +0000 (17:08 -1000)]
tools/power turbostat: dump CPUID.7.EDX.Hybrid
CPUID leaf 7 EDX now tells us if the processor has hybrid CPUs
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 13 May 2022 07:35:39 +0000 (21:35 -1000)]
tools/power turbostat: update turbostat.8
Update turbostat.8 to reflect new uncore frequency output (UncMHz)
Also, refresh examples.
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 13 May 2022 07:35:39 +0000 (21:35 -1000)]
tools/power turbostat: Show uncore frequency
When CONFIG_INTEL_UNCORE_FREQ_CONTROL is effective,
(Linux 5.9 and later), print the current (and default)
min and max uncore frequency limits.
When that driver provides the current uncore frequency
(Linux 5.18 and later), print a UncMHz column
reflecting the current uncore frequency.
Note that UncMHz is an instantaneous sample, not an average.
eg.
$ sudo ./turbostat -S --show frequency
...
Uncore Frequency pkg0 die0: 800 - 3900 MHz (800 - 3900 MHz)
...
Avg_MHz Busy% Bzy_MHz TSC_MHz UncMHz
28 0.70 4049 3095 3900
Signed-off-by: Len Brown <len.brown@intel.com>
Colin Ian King [Tue, 26 Apr 2022 13:16:07 +0000 (14:16 +0100)]
tools/power turbostat: Fix file pointer leak
Currently if a fscanf fails then an early return leaks an open
file pointer. Fix this by fclosing the file before the return.
Detected using static analysis with cppcheck:
tools/power/x86/turbostat/turbostat.c:2039:3: error: Resource leak: fp [resourceLeak]
Fixes:
eae97e053fe3 ("tools/power turbostat: Support thermal throttle count print")
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Tom Rix <trix@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Colin Ian King [Tue, 26 Apr 2022 13:10:24 +0000 (14:10 +0100)]
tools/power turbostat: replace strncmp with single character compare
Using strncmp for a single character comparison is overly complicated,
just use a simpler single character comparison instead. Also stops
static analyzers (such as cppcheck) from complaining about strncmp on
non-null terminated strings.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Chen Yu [Thu, 21 Apr 2022 14:38:34 +0000 (22:38 +0800)]
tools/power turbostat: print the kernel boot commandline
It would be handy to have cmdline in turbostat output. For example,
according to the turbostat output, there are no C-states requested.
In this case the user is very curious if something like
intel_idle.max_cstate=0 was used, or may be idle=none too. It is
also curious whether things like intel_pstate=nohwp were used.
Print the boot command line accordingly:
turbostat version 21.05.04 - Len Brown <lenb@kernel.org>
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.16.0+ root=UUID=
b42359ed-1e05-42eb-8757-
6bf2a1c19070 ro quiet splash vt.handoff=7
Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Zhang Rui [Thu, 21 Apr 2022 14:19:08 +0000 (22:19 +0800)]
tools/power turbostat: Introduce support for RaptorLake
RaptorLake is compatible with AlderLake.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Dan Carpenter [Thu, 28 Jul 2022 11:52:09 +0000 (14:52 +0300)]
stmmac: dwmac-mediatek: fix resource leak in probe
If mediatek_dwmac_clks_config() fails, then call stmmac_remove_config_dt()
before returning. Otherwise it is a resource leak.
Fixes:
fa4b3ca60e80 ("stmmac: dwmac-mediatek: fix clock issue")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/YuJ4aZyMUlG6yGGa@kili
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ziyang Xuan [Thu, 28 Jul 2022 01:33:07 +0000 (09:33 +0800)]
ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr
Change net device's MTU to smaller than IPV6_MIN_MTU or unregister
device while matching route. That may trigger null-ptr-deref bug
for ip6_ptr probability as following.
=========================================================
BUG: KASAN: null-ptr-deref in find_match.part.0+0x70/0x134
Read of size 4 at addr
0000000000000308 by task ping6/263
CPU: 2 PID: 263 Comm: ping6 Not tainted 5.19.0-rc7+ #14
Call trace:
dump_backtrace+0x1a8/0x230
show_stack+0x20/0x70
dump_stack_lvl+0x68/0x84
print_report+0xc4/0x120
kasan_report+0x84/0x120
__asan_load4+0x94/0xd0
find_match.part.0+0x70/0x134
__find_rr_leaf+0x408/0x470
fib6_table_lookup+0x264/0x540
ip6_pol_route+0xf4/0x260
ip6_pol_route_output+0x58/0x70
fib6_rule_lookup+0x1a8/0x330
ip6_route_output_flags_noref+0xd8/0x1a0
ip6_route_output_flags+0x58/0x160
ip6_dst_lookup_tail+0x5b4/0x85c
ip6_dst_lookup_flow+0x98/0x120
rawv6_sendmsg+0x49c/0xc70
inet_sendmsg+0x68/0x94
Reproducer as following:
Firstly, prepare conditions:
$ip netns add ns1
$ip netns add ns2
$ip link add veth1 type veth peer name veth2
$ip link set veth1 netns ns1
$ip link set veth2 netns ns2
$ip netns exec ns1 ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1
$ip netns exec ns2 ip -6 addr add 2001:0db8:0:f101::2/64 dev veth2
$ip netns exec ns1 ifconfig veth1 up
$ip netns exec ns2 ifconfig veth2 up
$ip netns exec ns1 ip -6 route add 2000::/64 dev veth1 metric 1
$ip netns exec ns2 ip -6 route add 2001::/64 dev veth2 metric 1
Secondly, execute the following two commands in two ssh windows
respectively:
$ip netns exec ns1 sh
$while true; do ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1; ip -6 route add 2000::/64 dev veth1 metric 1; ping6 2000::2; done
$ip netns exec ns1 sh
$while true; do ip link set veth1 mtu 1000; ip link set veth1 mtu 1500; sleep 5; done
It is because ip6_ptr has been assigned to NULL in addrconf_ifdown() firstly,
then ip6_ignore_linkdown() accesses ip6_ptr directly without NULL check.
cpu0 cpu1
fib6_table_lookup
__find_rr_leaf
addrconf_notify [ NETDEV_CHANGEMTU ]
addrconf_ifdown
RCU_INIT_POINTER(dev->ip6_ptr, NULL)
find_match
ip6_ignore_linkdown
So we can add NULL check for ip6_ptr before using in ip6_ignore_linkdown() to
fix the null-ptr-deref bug.
Fixes:
dcd1f572954f ("net/ipv6: Remove fib6_idev")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220728013307.656257-1-william.xuanziyang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kuniyuki Iwashima [Thu, 28 Jul 2022 01:22:20 +0000 (18:22 -0700)]
net: ping6: Fix memleak in ipv6_renew_options().
When we close ping6 sockets, some resources are left unfreed because
pingv6_prot is missing sk->sk_prot->destroy(). As reported by
syzbot [0], just three syscalls leak 96 bytes and easily cause OOM.
struct ipv6_sr_hdr *hdr;
char data[24] = {0};
int fd;
hdr = (struct ipv6_sr_hdr *)data;
hdr->hdrlen = 2;
hdr->type = IPV6_SRCRT_TYPE_4;
fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP);
setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24);
close(fd);
To fix memory leaks, let's add a destroy function.
Note the socket() syscall checks if the GID is within the range of
net.ipv4.ping_group_range. The default value is [1, 0] so that no
GID meets the condition (1 <= GID <= 0). Thus, the local DoS does
not succeed until we change the default value. However, at least
Ubuntu/Fedora/RHEL loosen it.
$ cat /usr/lib/sysctl.d/50-default.conf
...
-net.ipv4.ping_group_range = 0
2147483647
Also, there could be another path reported with these options, and
some of them require CAP_NET_RAW.
setsockopt
IPV6_ADDRFORM (inet6_sk(sk)->pktoptions)
IPV6_RECVPATHMTU (inet6_sk(sk)->rxpmtu)
IPV6_HOPOPTS (inet6_sk(sk)->opt)
IPV6_RTHDRDSTOPTS (inet6_sk(sk)->opt)
IPV6_RTHDR (inet6_sk(sk)->opt)
IPV6_DSTOPTS (inet6_sk(sk)->opt)
IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt)
getsockopt
IPV6_FLOWLABEL_MGR (inet6_sk(sk)->ipv6_fl_list)
For the record, I left a different splat with syzbot's one.
unreferenced object 0xffff888006270c60 (size 96):
comm "repro2", pid 231, jiffies
4294696626 (age 13.118s)
hex dump (first 32 bytes):
01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00 ....D...........
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<
00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554)
[<
000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715)
[<
00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024)
[<
000000007096a025>] __sys_setsockopt (net/socket.c:2254)
[<
000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262)
[<
000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[<
00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[0]: https://syzkaller.appspot.com/bug?extid=
a8430774139ec3ab7176
Fixes:
6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Reported-by: syzbot+a8430774139ec3ab7176@syzkaller.appspotmail.com
Reported-by: Ayushman Dutta <ayudutta@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220728012220.46918-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 28 Jul 2022 09:31:12 +0000 (10:31 +0100)]
watch_queue: Fix missing locking in add_watch_to_object()
If a watch is being added to a queue, it needs to guard against
interference from addition of a new watch, manual removal of a watch and
removal of a watch due to some other queue being destroyed.
KEYCTL_WATCH_KEY guards against this for the same {key,queue} pair by
holding the key->sem writelocked and by holding refs on both the key and
the queue - but that doesn't prevent interaction from other {key,queue}
pairs.
While add_watch_to_object() does take the spinlock on the event queue,
it doesn't take the lock on the source's watch list. The assumption was
that the caller would prevent that (say by taking key->sem) - but that
doesn't prevent interference from the destruction of another queue.
Fix this by locking the watcher list in add_watch_to_object().
Fixes:
c73be61cede5 ("pipe: Add general notification queue support")
Reported-by: syzbot+03d7b43290037d1f87ca@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: keyrings@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 28 Jul 2022 09:31:06 +0000 (10:31 +0100)]
watch_queue: Fix missing rcu annotation
Since __post_watch_notification() walks wlist->watchers with only the
RCU read lock held, we need to use RCU methods to add to the list (we
already use RCU methods to remove from the list).
Fix add_watch_to_object() to use hlist_add_head_rcu() instead of
hlist_add_head() for that list.
Fixes:
c73be61cede5 ("pipe: Add general notification queue support")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>