platform/kernel/linux-rpi.git
23 months agoACPI: PM: save NVS memory for Lenovo G40-45
Manyi Li [Wed, 22 Jun 2022 07:42:48 +0000 (15:42 +0800)]
ACPI: PM: save NVS memory for Lenovo G40-45

[ Upstream commit 4b7ef7b05afcde44142225c184bf43a0cd9e2178 ]

[821d6f0359b0614792ab8e2fb93b503e25a65079] is to make machines
produced from 2012 to now not saving NVS region to accelerate S3.

But, Lenovo G40-45, a platform released in 2015, still needs NVS memory
saving during S3. A quirk is introduced for this platform.

Signed-off-by: Manyi Li <limanyi@uniontech.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoACPI: EC: Drop the EC_FLAGS_IGNORE_DSDT_GPE quirk
Hans de Goede [Mon, 20 Jun 2022 09:25:44 +0000 (11:25 +0200)]
ACPI: EC: Drop the EC_FLAGS_IGNORE_DSDT_GPE quirk

[ Upstream commit f7090e0ef360d674f08a22fab90e4e209fb1f658 ]

It seems that these quirks are no longer necessary since
commit 69b957c26b32 ("ACPI: EC: Fix possible issues related to EC
initialization order"), which has fixed this in a generic manner.

There are 3 commits adding DMI entries with this quirk (adding multiple
DMI entries per commit). 2/3 commits are from before the generic fix.

Which leaves commit 6306f0431914 ("ACPI: EC: Make more Asus laptops
use ECDT _GPE"), which was committed way after the generic fix.
But this was just due to slow upstreaming of it. This commit stems
from Endless from 15 Aug 2017 (committed upstream 20 May 2021):
https://github.com/endlessm/linux/pull/288

The current code should work fine without this:

 1. The EC_FLAGS_IGNORE_DSDT_GPE flag is only checked in ec_parse_device(),
    like this:

if (boot_ec && boot_ec_is_ecdt && EC_FLAGS_IGNORE_DSDT_GPE) {
ec->gpe = boot_ec->gpe;
} else {
/* parse GPE */
}

 2. ec_parse_device() is only called from acpi_ec_add() and
    acpi_ec_dsdt_probe()

 3. acpi_ec_dsdt_probe() starts with:

if (boot_ec)
return;

    so it only calls ec_parse_device() when boot_ec == NULL, meaning that
    the quirk never triggers for this call. So only the call in
    acpi_ec_add() matters.

 4. acpi_ec_add() does the following after the ec_parse_device() call:

if (boot_ec && ec->command_addr == boot_ec->command_addr &&
    ec->data_addr == boot_ec->data_addr &&
    !EC_FLAGS_TRUST_DSDT_GPE) {
/*
 * Trust PNP0C09 namespace location rather than
 * ECDT ID. But trust ECDT GPE rather than _GPE
 * because of ASUS quirks, so do not change
 * boot_ec->gpe to ec->gpe.
 */
boot_ec->handle = ec->handle;
acpi_handle_debug(ec->handle, "duplicated.\n");
acpi_ec_free(ec);
ec = boot_ec;
}

The quirk only matters if boot_ec != NULL and EC_FLAGS_TRUST_DSDT_GPE
is never set at the same time as EC_FLAGS_IGNORE_DSDT_GPE.

That means that if the addresses match we always enter this if block and
then only the ec->handle part of the data stored in ec by ec_parse_device()
is used and the rest is thrown away, after which ec is made to point
to boot_ec, at which point ec->gpe == boot_ec->gpe, so the same result
as with the quirk set, independent of the value of the quirk.

Also note the comment in this block which indicates that the gpe result
from ec_parse_device() is deliberately not taken to deal with buggy
Asus laptops and all DMI quirks setting EC_FLAGS_IGNORE_DSDT_GPE are for
Asus laptops.

Based on the above I believe that unless on some quirked laptops
the ECDT and DSDT EC addresses do not match we can drop the quirk.

I've checked dmesg output to ensure the ECDT and DSDT EC addresses match
for quirked models using https://linux-hardware.org hw-probe reports.

I've been able to confirm that the addresses match for the following
models this way: GL702VMK, X505BA, X505BP, X550VXK, X580VD.
Whereas for the following models I could find any dmesg output:
FX502VD, FX502VE, X542BA, X542BP.

Note the models without dmesg all were submitted in patches with a batch
of models and other models from the same batch checkout ok.

This, combined with that all the code adding the quirks was written before
the generic fix makes me believe that it is safe to remove this quirk now.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Daniel Drake <drake@endlessos.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoACPI: EC: Remove duplicate ThinkPad X1 Carbon 6th entry from DMI quirks
Hans de Goede [Mon, 20 Jun 2022 09:25:43 +0000 (11:25 +0200)]
ACPI: EC: Remove duplicate ThinkPad X1 Carbon 6th entry from DMI quirks

[ Upstream commit 0dd6db359e5f206cbf1dd1fd40dd211588cd2725 ]

Somehow the "ThinkPad X1 Carbon 6th" entry ended up twice in the
struct dmi_system_id acpi_ec_no_wakeup[] array. Remove one of
the entries.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoARM: OMAP2+: pdata-quirks: Fix refcount leak bug
Liang He [Sat, 18 Jun 2022 02:06:03 +0000 (10:06 +0800)]
ARM: OMAP2+: pdata-quirks: Fix refcount leak bug

[ Upstream commit 5cdbab96bab314c6f2f5e4e8b8a019181328bf5f ]

In pdata_quirks_init_clocks(), the loop contains
of_find_node_by_name() but without corresponding of_node_put().

Signed-off-by: Liang He <windhl@126.com>
Message-Id: <20220618020603.4055792-1-windhl@126.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoARM: OMAP2+: display: Fix refcount leak bug
Liang He [Fri, 17 Jun 2022 14:58:03 +0000 (22:58 +0800)]
ARM: OMAP2+: display: Fix refcount leak bug

[ Upstream commit 50b87a32a79bca6e275918a711fb8cc55e16d739 ]

In omapdss_init_fbdev(), of_find_node_by_name() will return a node
pointer with refcount incremented. We should use of_node_put() when
it is not used anymore.

Signed-off-by: Liang He <windhl@126.com>
Message-Id: <20220617145803.4050918-1-windhl@126.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agospi: synquacer: Add missing clk_disable_unprepare()
Guo Mengqi [Fri, 24 Jun 2022 00:56:14 +0000 (08:56 +0800)]
spi: synquacer: Add missing clk_disable_unprepare()

[ Upstream commit 917e43de2a56d9b82576f1cc94748261f1988458 ]

Add missing clk_disable_unprepare() in synquacer_spi_resume().

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Link: https://lore.kernel.org/r/20220624005614.49434-1-guomengqi3@huawei.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoARM: dts: ux500: Fix Gavini accelerometer mounting matrix
Linus Walleij [Sat, 11 Jun 2022 20:51:38 +0000 (22:51 +0200)]
ARM: dts: ux500: Fix Gavini accelerometer mounting matrix

[ Upstream commit e24c75f02a81d6ddac0072cbd7a03e799c19d558 ]

This was fixed wrong so fix it. Now verified by using
iio-sensor-proxy monitor-sensor test program.

Link: https://lore.kernel.org/r/20220611205138.491513-1-linus.walleij@linaro.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoARM: dts: ux500: Fix Codina accelerometer mounting matrix
Linus Walleij [Sat, 11 Jun 2022 20:42:49 +0000 (22:42 +0200)]
ARM: dts: ux500: Fix Codina accelerometer mounting matrix

[ Upstream commit 0b2152e428ab91533a02888ff24e52e788dc4637 ]

This was fixed wrong so fix it again. Now verified by using
iio-sensor-proxy monitor-sensor test program.

Link: https://lore.kernel.org/r/20220611204249.472250-1-linus.walleij@linaro.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoARM: dts: BCM5301X: Add DT for Meraki MR26
Christian Lamparter [Fri, 17 Jun 2022 22:00:29 +0000 (00:00 +0200)]
ARM: dts: BCM5301X: Add DT for Meraki MR26

[ Upstream commit 935327a73553001f8d81375c76985d05f604507f ]

Meraki MR26 is an EOL wireless access point featuring a
PoE ethernet port and two dual-band 3x3 MIMO 802.11n
radios and 1x1 dual-band WIFI dedicated to scanning.

Thank you Amir for the unit and PSU.

Hardware info:
SOC   : Broadcom BCM53015A1KFEBG (dual-core Cortex-A9 CPU at 800 MHz)
RAM   : SK Hynix Inc. H5TQ1G63EFR, 1 GBit DDR3 SDRAM = 128 MiB
NAND  : Spansion S34ML01G100TF100, 1 GBit SLC NAND Flash = 128 MiB
ETH   : 1 GBit Ethernet Port - PoE (TPS23754 PoE Interface)
WIFI0 : Broadcom BCM43431KMLG, BCM43431 802.11 abgn (3x3:3)
WIFI1 : Broadcom BCM43431KMLG, BCM43431 802.11 abgn (3x3:3)
WIFI2 : Broadcom BCM43428 "Air Marshal" 802.11 abgn (1x1:1)
BUTTON: One reset key behind a small hole next to the Ethernet Port
LEDS  : One amber (fault), one white (indicator) LED, separate RGB-LED
MISC  : Atmel AT24C64 8KiB EEPROM i2c
      : Ti INA219 26V, 12-bit, i2c output current/voltage/power monitor

SERIAL:
      WARNING: The serial port needs a TTL/RS-232 3V3 level converter!
      The Serial setting is 115200-8-N-1. The board has a populated
      right angle 1x4 0.1" pinheader.
      The pinout is: VCC (next to J3, has the pin 1 indicator), RX, TX, GND.

Odd stuff:

- uboot does not support lzma compression, but gzip'd uImage/DTB work.
- uboot claims to support FIT, but fails to pass the DTB to the kernel.
  Appending the dtb after the kernel image works.
- RGB-controller is supported through an external userspace program.
- The ubi partition contains a "board-config" volume. It stores the
  MAC Address (0x66 in binary) and Serial No. (0x7c alpha-numerical).
- SoC's temperature sensor always reports that it is on fire.
  This causes the system to immediately shutdown! Looking at reported
  "418 degree Celsius" suggests that this sensor is not working.

WIFI:
b43 is able to initialize all three WIFIs @ 802.11bg.
| b43-phy0: Broadcom 43431 WLAN found (core revision 29)
| bcma-pci-bridge 0000:01:00.0: bus1: Switched to core: 0x812
| b43-phy0: Found PHY: Analog 9, Type 7 (HT), Revision 1
| b43-phy0: Found Radio: Manuf 0x17F, ID 0x2059, Revision 0, Version 1
| b43-phy0 warning: 5 GHz band is unsupported on this PHY
| b43-phy1: Broadcom 43431 WLAN found (core revision 29)
| bcma-pci-bridge 0001:01:00.0: bus2: Switched to core: 0x812
| b43-phy1: Found PHY: Analog 9, Type 7 (HT), Revision 1
| b43-phy1: Found Radio: Manuf 0x17F, ID 0x2059, Revision 0, Version 1
| b43-phy1 warning: 5 GHz band is unsupported on this PHY
| b43-phy2: Broadcom 43228 WLAN found (core revision 30)
| bcma-pci-bridge 0002:01:00.0: bus3: Switched to core: 0x812
| b43-phy2: Found PHY: Analog 9, Type 4 (N), Revision 16
| b43-phy2: Found Radio: Manuf 0x17F, ID 0x2057, Revision 9, Version 1
| Broadcom 43xx driver loaded [ Features: NL ]

Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoARM: dts: imx6ul: fix qspi node compatible
Alexander Stein [Mon, 13 Jun 2022 12:33:57 +0000 (14:33 +0200)]
ARM: dts: imx6ul: fix qspi node compatible

[ Upstream commit 0c6cf86e1ab433b2d421880fdd9c6e954f404948 ]

imx6ul is not compatible to imx6sx, both have different erratas.
Fixes the dt_binding_check warning:
spi@21e0000: compatible: 'oneOf' conditional failed, one must be fixed:
['fsl,imx6ul-qspi', 'fsl,imx6sx-qspi'] is too long
Additional items are not allowed ('fsl,imx6sx-qspi' was unexpected)
'fsl,imx6ul-qspi' is not one of ['fsl,ls1043a-qspi']
'fsl,imx6ul-qspi' is not one of ['fsl,imx8mq-qspi']
'fsl,ls1021a-qspi' was expected
'fsl,imx7d-qspi' was expected

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoARM: dts: imx6ul: fix lcdif node compatible
Alexander Stein [Mon, 13 Jun 2022 12:33:56 +0000 (14:33 +0200)]
ARM: dts: imx6ul: fix lcdif node compatible

[ Upstream commit 1a884d17ca324531634cce82e9f64c0302bdf7de ]

In yaml binding "fsl,imx6ul-lcdif" is listed as compatible to imx6sx-lcdif,
but not imx28-lcdif. Change the list accordingly. Fixes the
dt_binding_check warning:
lcdif@21c8000: compatible: 'oneOf' conditional failed, one must be fixed:
['fsl,imx6ul-lcdif', 'fsl,imx28-lcdif'] is too long
Additional items are not allowed ('fsl,imx28-lcdif' was unexpected)
'fsl,imx6ul-lcdif' is not one of ['fsl,imx23-lcdif', 'fsl,imx28-lcdif',
'fsl,imx6sx-lcdif']
'fsl,imx6sx-lcdif' was expected

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoARM: dts: imx6ul: fix csi node compatible
Alexander Stein [Mon, 13 Jun 2022 12:33:55 +0000 (14:33 +0200)]
ARM: dts: imx6ul: fix csi node compatible

[ Upstream commit e0aca931a2c7c29c88ebf37f9c3cd045e083483d ]

"fsl,imx6ul-csi" was never listed as compatible to "fsl,imx7-csi", neither
in yaml bindings, nor previous txt binding. Remove the imx7 part. Fixes
the dt schema check warning:
csi@21c4000: compatible: 'oneOf' conditional failed, one must be fixed:
['fsl,imx6ul-csi', 'fsl,imx7-csi'] is too long
Additional items are not allowed ('fsl,imx7-csi' was unexpected)
'fsl,imx8mm-csi' was expected

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoARM: dts: imx6ul: fix keypad compatible
Alexander Stein [Mon, 13 Jun 2022 12:33:53 +0000 (14:33 +0200)]
ARM: dts: imx6ul: fix keypad compatible

[ Upstream commit 7d15e0c9a515494af2e3199741cdac7002928a0e ]

According to binding, the compatible shall only contain imx6ul and imx21
compatibles. Fixes the dt_binding_check warning:
keypad@20b8000: compatible: 'oneOf' conditional failed, one must be fixed:
['fsl,imx6ul-kpp', 'fsl,imx6q-kpp', 'fsl,imx21-kpp'] is too long
Additional items are not allowed ('fsl,imx6q-kpp', 'fsl,imx21-kpp' were
unexpected)
Additional items are not allowed ('fsl,imx21-kpp' was unexpected)
'fsl,imx21-kpp' was expected

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoARM: dts: imx6ul: change operating-points to uint32-matrix
Alexander Stein [Mon, 13 Jun 2022 12:33:52 +0000 (14:33 +0200)]
ARM: dts: imx6ul: change operating-points to uint32-matrix

[ Upstream commit edb67843983bbdf61b4c8c3c50618003d38bb4ae ]

operating-points is a uint32-matrix as per opp-v1.yaml. Change it
accordingly. While at it, change fsl,soc-operating-points as well,
although there is no bindings file (yet). But they should have the same
format. Fixes the dt_binding_check warning:
cpu@0: operating-points:0: [696000, 1275000, 528000, 1175000, 396000,
1025000, 198000, 950000] is too long
cpu@0: operating-points:0: Additional items are not allowed (528000,
1175000, 396000, 1025000, 198000, 950000 were unexpected)

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoARM: dts: imx6ul: add missing properties for sram
Alexander Stein [Mon, 13 Jun 2022 12:33:51 +0000 (14:33 +0200)]
ARM: dts: imx6ul: add missing properties for sram

[ Upstream commit 5655699cf5cff9f4c4ee703792156bdd05d1addf ]

All 3 properties are required by sram.yaml. Fixes the dtbs_check
warning:
sram@900000: '#address-cells' is a required property
sram@900000: '#size-cells' is a required property
sram@900000: 'ranges' is a required property

Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agowait: Fix __wait_event_hrtimeout for RT/DL tasks
Juri Lelli [Mon, 27 Jun 2022 09:50:51 +0000 (11:50 +0200)]
wait: Fix __wait_event_hrtimeout for RT/DL tasks

[ Upstream commit cceeeb6a6d02e7b9a74ddd27a3225013b34174aa ]

Changes to hrtimer mode (potentially made by __hrtimer_init_sleeper on
PREEMPT_RT) are not visible to hrtimer_start_range_ns, thus not
accounted for by hrtimer_start_expires call paths. In particular,
__wait_event_hrtimeout suffers from this problem as we have, for
example:

fs/aio.c::read_events
  wait_event_interruptible_hrtimeout
    __wait_event_hrtimeout
      hrtimer_init_sleeper_on_stack <- this might "mode |= HRTIMER_MODE_HARD"
                                       on RT if task runs at RT/DL priority
        hrtimer_start_range_ns
          WARN_ON_ONCE(!(mode & HRTIMER_MODE_HARD) ^ !timer->is_hard)
          fires since the latter doesn't see the change of mode done by
          init_sleeper

Fix it by making __wait_event_hrtimeout call hrtimer_sleeper_start_expires,
which is aware of the special RT/DL case, instead of hrtimer_start_range_ns.

Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://lore.kernel.org/r/20220627095051.42470-1-juri.lelli@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoirqchip/mips-gic: Check the return value of ioremap() in gic_of_init()
William Dean [Sat, 23 Jul 2022 10:01:28 +0000 (18:01 +0800)]
irqchip/mips-gic: Check the return value of ioremap() in gic_of_init()

[ Upstream commit 71349cc85e5930dce78ed87084dee098eba24b59 ]

The function ioremap() in gic_of_init() can fail, so
its return value should be checked.

Reported-by: Hacash Robot <hacashRobot@santino.com>
Signed-off-by: William Dean <williamsukatube@163.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220723100128.2964304-1-williamsukatube@163.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agosched/core: Always flush pending blk_plug
John Keeping [Fri, 8 Jul 2022 16:27:02 +0000 (17:27 +0100)]
sched/core: Always flush pending blk_plug

[ Upstream commit 401e4963bf45c800e3e9ea0d3a0289d738005fd4 ]

With CONFIG_PREEMPT_RT, it is possible to hit a deadlock between two
normal priority tasks (SCHED_OTHER, nice level zero):

INFO: task kworker/u8:0:8 blocked for more than 491 seconds.
      Not tainted 5.15.49-rt46 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u8:0    state:D stack:    0 pid:    8 ppid:     2 flags:0x00000000
Workqueue: writeback wb_workfn (flush-7:0)
[<c08a3a10>] (__schedule) from [<c08a3d84>] (schedule+0xdc/0x134)
[<c08a3d84>] (schedule) from [<c08a65a0>] (rt_mutex_slowlock_block.constprop.0+0xb8/0x174)
[<c08a65a0>] (rt_mutex_slowlock_block.constprop.0) from [<c08a6708>]
+(rt_mutex_slowlock.constprop.0+0xac/0x174)
[<c08a6708>] (rt_mutex_slowlock.constprop.0) from [<c0374d60>] (fat_write_inode+0x34/0x54)
[<c0374d60>] (fat_write_inode) from [<c0297304>] (__writeback_single_inode+0x354/0x3ec)
[<c0297304>] (__writeback_single_inode) from [<c0297998>] (writeback_sb_inodes+0x250/0x45c)
[<c0297998>] (writeback_sb_inodes) from [<c0297c20>] (__writeback_inodes_wb+0x7c/0xb8)
[<c0297c20>] (__writeback_inodes_wb) from [<c0297f24>] (wb_writeback+0x2c8/0x2e4)
[<c0297f24>] (wb_writeback) from [<c0298c40>] (wb_workfn+0x1a4/0x3e4)
[<c0298c40>] (wb_workfn) from [<c0138ab8>] (process_one_work+0x1fc/0x32c)
[<c0138ab8>] (process_one_work) from [<c0139120>] (worker_thread+0x22c/0x2d8)
[<c0139120>] (worker_thread) from [<c013e6e0>] (kthread+0x16c/0x178)
[<c013e6e0>] (kthread) from [<c01000fc>] (ret_from_fork+0x14/0x38)
Exception stack(0xc10e3fb0 to 0xc10e3ff8)
3fa0:                                     00000000 00000000 00000000 00000000
3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3fe0: 00000000 00000000 00000000 00000000 00000013 00000000

INFO: task tar:2083 blocked for more than 491 seconds.
      Not tainted 5.15.49-rt46 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:tar             state:D stack:    0 pid: 2083 ppid:  2082 flags:0x00000000
[<c08a3a10>] (__schedule) from [<c08a3d84>] (schedule+0xdc/0x134)
[<c08a3d84>] (schedule) from [<c08a41b0>] (io_schedule+0x14/0x24)
[<c08a41b0>] (io_schedule) from [<c08a455c>] (bit_wait_io+0xc/0x30)
[<c08a455c>] (bit_wait_io) from [<c08a441c>] (__wait_on_bit_lock+0x54/0xa8)
[<c08a441c>] (__wait_on_bit_lock) from [<c08a44f4>] (out_of_line_wait_on_bit_lock+0x84/0xb0)
[<c08a44f4>] (out_of_line_wait_on_bit_lock) from [<c0371fb0>] (fat_mirror_bhs+0xa0/0x144)
[<c0371fb0>] (fat_mirror_bhs) from [<c0372a68>] (fat_alloc_clusters+0x138/0x2a4)
[<c0372a68>] (fat_alloc_clusters) from [<c0370b14>] (fat_alloc_new_dir+0x34/0x250)
[<c0370b14>] (fat_alloc_new_dir) from [<c03787c0>] (vfat_mkdir+0x58/0x148)
[<c03787c0>] (vfat_mkdir) from [<c0277b60>] (vfs_mkdir+0x68/0x98)
[<c0277b60>] (vfs_mkdir) from [<c027b484>] (do_mkdirat+0xb0/0xec)
[<c027b484>] (do_mkdirat) from [<c0100060>] (ret_fast_syscall+0x0/0x1c)
Exception stack(0xc2e1bfa8 to 0xc2e1bff0)
bfa0:                   01ee42f0 01ee4208 01ee42f0 000041ed 00000000 00004000
bfc0: 01ee42f0 01ee4208 00000000 00000027 01ee4302 00000004 000dcb00 01ee4190
bfe0: 000dc368 bed11924 0006d4b0 b6ebddfc

Here the kworker is waiting on msdos_sb_info::s_lock which is held by
tar which is in turn waiting for a buffer which is locked waiting to be
flushed, but this operation is plugged in the kworker.

The lock is a normal struct mutex, so tsk_is_pi_blocked() will always
return false on !RT and thus the behaviour changes for RT.

It seems that the intent here is to skip blk_flush_plug() in the case
where a non-preemptible lock (such as a spinlock) has been converted to
a rtmutex on RT, which is the case covered by the SM_RTLOCK_WAIT
schedule flag.  But sched_submit_work() is only called from schedule()
which is never called in this scenario, so the check can simply be
deleted.

Looking at the history of the -rt patchset, in fact this change was
present from v5.9.1-rt20 until being dropped in v5.13-rt1 as it was part
of a larger patch [1] most of which was replaced by commit b4bfa3fcfe3b
("sched/core: Rework the __schedule() preempt argument").

As described in [1]:

   The schedule process must distinguish between blocking on a regular
   sleeping lock (rwsem and mutex) and a RT-only sleeping lock (spinlock
   and rwlock):
   - rwsem and mutex must flush block requests (blk_schedule_flush_plug())
     even if blocked on a lock. This can not deadlock because this also
     happens for non-RT.
     There should be a warning if the scheduling point is within a RCU read
     section.

   - spinlock and rwlock must not flush block requests. This will deadlock
     if the callback attempts to acquire a lock which is already acquired.
     Similarly to being preempted, there should be no warning if the
     scheduling point is within a RCU read section.

and with the tsk_is_pi_blocked() in the scheduler path, we hit the first
issue.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0022-locking-rtmutex-Use-custom-scheduling-function-for-s.patch?h=linux-5.10.y-rt-patches

Signed-off-by: John Keeping <john@metanate.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20220708162702.1758865-1-john@metanate.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agogenirq: GENERIC_IRQ_IPI depends on SMP
Samuel Holland [Fri, 1 Jul 2022 20:00:50 +0000 (15:00 -0500)]
genirq: GENERIC_IRQ_IPI depends on SMP

[ Upstream commit 0f5209fee90b4544c58b4278d944425292789967 ]

The generic IPI code depends on the IRQ affinity mask being allocated
and initialized. This will not be the case if SMP is disabled. Fix up
the remaining driver that selected GENERIC_IRQ_IPI in a non-SMP config.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220701200056.46555-3-samuel@sholland.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoirqchip/mips-gic: Only register IPI domain when SMP is enabled
Samuel Holland [Fri, 1 Jul 2022 20:00:49 +0000 (15:00 -0500)]
irqchip/mips-gic: Only register IPI domain when SMP is enabled

[ Upstream commit 8190cc572981f2f13b6ffc26c7cfa7899e5d3ccc ]

The MIPS GIC irqchip driver may be selected in a uniprocessor
configuration, but it unconditionally registers an IPI domain.

Limit the part of the driver dealing with IPIs to only be compiled when
GENERIC_IRQ_IPI is enabled, which corresponds to an SMP configuration.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220701200056.46555-2-samuel@sholland.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agogenirq: Don't return error on missing optional irq_request_resources()
Antonio Borneo [Thu, 12 May 2022 16:05:44 +0000 (18:05 +0200)]
genirq: Don't return error on missing optional irq_request_resources()

[ Upstream commit 95001b756467ecc9f5973eb5e74e97699d9bbdf1 ]

Function irq_chip::irq_request_resources() is reported as optional
in the declaration of struct irq_chip.
If the parent irq_chip does not implement it, we should ignore it
and return.

Don't return error if the functions is missing.

Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220512160544.13561-1-antonio.borneo@foss.st.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agosched/fair: Introduce SIS_UTIL to search idle CPU based on sum of util_avg
Chen Yu [Sun, 12 Jun 2022 16:34:28 +0000 (00:34 +0800)]
sched/fair: Introduce SIS_UTIL to search idle CPU based on sum of util_avg

[ Upstream commit 70fb5ccf2ebb09a0c8ebba775041567812d45f86 ]

[Problem Statement]
select_idle_cpu() might spend too much time searching for an idle CPU,
when the system is overloaded.

The following histogram is the time spent in select_idle_cpu(),
when running 224 instances of netperf on a system with 112 CPUs
per LLC domain:

@usecs:
[0]                  533 |                                                    |
[1]                 5495 |                                                    |
[2, 4)             12008 |                                                    |
[4, 8)            239252 |                                                    |
[8, 16)          4041924 |@@@@@@@@@@@@@@                                      |
[16, 32)        12357398 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         |
[32, 64)        14820255 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64, 128)       13047682 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       |
[128, 256)       8235013 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@                        |
[256, 512)       4507667 |@@@@@@@@@@@@@@@                                     |
[512, 1K)        2600472 |@@@@@@@@@                                           |
[1K, 2K)          927912 |@@@                                                 |
[2K, 4K)          218720 |                                                    |
[4K, 8K)           98161 |                                                    |
[8K, 16K)          37722 |                                                    |
[16K, 32K)          6715 |                                                    |
[32K, 64K)           477 |                                                    |
[64K, 128K)            7 |                                                    |

netperf latency usecs:
=======
case             load         Lat_99th     std%
TCP_RR           thread-224       257.39 (  0.21)

The time spent in select_idle_cpu() is visible to netperf and might have a negative
impact.

[Symptom analysis]
The patch [1] from Mel Gorman has been applied to track the efficiency
of select_idle_sibling. Copy the indicators here:

SIS Search Efficiency(se_eff%):
        A ratio expressed as a percentage of runqueues scanned versus
        idle CPUs found. A 100% efficiency indicates that the target,
        prev or recent CPU of a task was idle at wakeup. The lower the
        efficiency, the more runqueues were scanned before an idle CPU
        was found.

SIS Domain Search Efficiency(dom_eff%):
        Similar, except only for the slower SIS
patch.

SIS Fast Success Rate(fast_rate%):
        Percentage of SIS that used target, prev or
recent CPUs.

SIS Success rate(success_rate%):
        Percentage of scans that found an idle CPU.

The test is based on Aubrey's schedtests tool, including netperf, hackbench,
schbench and tbench.

Test on vanilla kernel:
schedstat_parse.py -f netperf_vanilla.log
case         load     se_eff%     dom_eff%   fast_rate% success_rate%
TCP_RR    28 threads      99.978       18.535       99.995      100.000
TCP_RR    56 threads      99.397        5.671       99.964      100.000
TCP_RR    84 threads      21.721        6.818       73.632      100.000
TCP_RR   112 threads      12.500        5.533       59.000      100.000
TCP_RR   140 threads       8.524        4.535       49.020      100.000
TCP_RR   168 threads       6.438        3.945       40.309       99.999
TCP_RR   196 threads       5.397        3.718       32.320       99.982
TCP_RR   224 threads       4.874        3.661       25.775       99.767
UDP_RR    28 threads      99.988       17.704       99.997      100.000
UDP_RR    56 threads      99.528        5.977       99.970      100.000
UDP_RR    84 threads      24.219        6.992       76.479      100.000
UDP_RR   112 threads      13.907        5.706       62.538      100.000
UDP_RR   140 threads       9.408        4.699       52.519      100.000
UDP_RR   168 threads       7.095        4.077       44.352      100.000
UDP_RR   196 threads       5.757        3.775       35.764       99.991
UDP_RR   224 threads       5.124        3.704       28.748       99.860

schedstat_parse.py -f schbench_vanilla.log
(each group has 28 tasks)
case         load     se_eff%     dom_eff%   fast_rate% success_rate%
normal    1   mthread      99.152        6.400       99.941      100.000
normal    2   mthreads      97.844        4.003       99.908      100.000
normal    3   mthreads      96.395        2.118       99.917       99.998
normal    4   mthreads      55.288        1.451       98.615       99.804
normal    5   mthreads       7.004        1.870       45.597       61.036
normal    6   mthreads       3.354        1.346       20.777       34.230
normal    7   mthreads       2.183        1.028       11.257       21.055
normal    8   mthreads       1.653        0.825        7.849       15.549

schedstat_parse.py -f hackbench_vanilla.log
(each group has 28 tasks)
case load         se_eff%     dom_eff%   fast_rate% success_rate%
process-pipe      1 group          99.991        7.692       99.999      100.000
process-pipe     2 groups          99.934        4.615       99.997      100.000
process-pipe     3 groups          99.597        3.198       99.987      100.000
process-pipe     4 groups          98.378        2.464       99.958      100.000
process-pipe     5 groups          27.474        3.653       89.811       99.800
process-pipe     6 groups          20.201        4.098       82.763       99.570
process-pipe     7 groups          16.423        4.156       77.398       99.316
process-pipe     8 groups          13.165        3.920       72.232       98.828
process-sockets      1 group          99.977        5.882       99.999      100.000
process-sockets     2 groups          99.927        5.505       99.996      100.000
process-sockets     3 groups          99.397        3.250       99.980      100.000
process-sockets     4 groups          79.680        4.258       98.864       99.998
process-sockets     5 groups           7.673        2.503       63.659       92.115
process-sockets     6 groups           4.642        1.584       58.946       88.048
process-sockets     7 groups           3.493        1.379       49.816       81.164
process-sockets     8 groups           3.015        1.407       40.845       75.500
threads-pipe      1 group          99.997        0.000      100.000      100.000
threads-pipe     2 groups          99.894        2.932       99.997      100.000
threads-pipe     3 groups          99.611        4.117       99.983      100.000
threads-pipe     4 groups          97.703        2.624       99.937      100.000
threads-pipe     5 groups          22.919        3.623       87.150       99.764
threads-pipe     6 groups          18.016        4.038       80.491       99.557
threads-pipe     7 groups          14.663        3.991       75.239       99.247
threads-pipe     8 groups          12.242        3.808       70.651       98.644
threads-sockets      1 group          99.990        6.667       99.999      100.000
threads-sockets     2 groups          99.940        5.114       99.997      100.000
threads-sockets     3 groups          99.469        4.115       99.977      100.000
threads-sockets     4 groups          87.528        4.038       99.400      100.000
threads-sockets     5 groups           6.942        2.398       59.244       88.337
threads-sockets     6 groups           4.359        1.954       49.448       87.860
threads-sockets     7 groups           2.845        1.345       41.198       77.102
threads-sockets     8 groups           2.871        1.404       38.512       74.312

schedstat_parse.py -f tbench_vanilla.log
case load       se_eff%     dom_eff%   fast_rate% success_rate%
loopback   28 threads        99.976       18.369       99.995      100.000
loopback   56 threads        99.222        7.799       99.934      100.000
loopback   84 threads        19.723        6.819       70.215      100.000
loopback  112 threads        11.283        5.371       55.371       99.999
loopback  140 threads         0.000        0.000        0.000        0.000
loopback  168 threads         0.000        0.000        0.000        0.000
loopback  196 threads         0.000        0.000        0.000        0.000
loopback  224 threads         0.000        0.000        0.000        0.000

According to the test above, if the system becomes busy, the
SIS Search Efficiency(se_eff%) drops significantly. Although some
benchmarks would finally find an idle CPU(success_rate% = 100%), it is
doubtful whether it is worth it to search the whole LLC domain.

[Proposal]
It would be ideal to have a crystal ball to answer this question:
How many CPUs must a wakeup path walk down, before it can find an idle
CPU? Many potential metrics could be used to predict the number.
One candidate is the sum of util_avg in this LLC domain. The benefit
of choosing util_avg is that it is a metric of accumulated historic
activity, which seems to be smoother than instantaneous metrics
(such as rq->nr_running). Besides, choosing the sum of util_avg
would help predict the load of the LLC domain more precisely, because
SIS_PROP uses one CPU's idle time to estimate the total LLC domain idle
time.

In summary, the lower the util_avg is, the more select_idle_cpu()
should scan for idle CPU, and vice versa. When the sum of util_avg
in this LLC domain hits 85% or above, the scan stops. The reason to
choose 85% as the threshold is that this is the imbalance_pct(117)
when a LLC sched group is overloaded.

Introduce the quadratic function:

y = SCHED_CAPACITY_SCALE - p * x^2
and y'= y / SCHED_CAPACITY_SCALE

x is the ratio of sum_util compared to the CPU capacity:
x = sum_util / (llc_weight * SCHED_CAPACITY_SCALE)
y' is the ratio of CPUs to be scanned in the LLC domain,
and the number of CPUs to scan is calculated by:

nr_scan = llc_weight * y'

Choosing quadratic function is because:
[1] Compared to the linear function, it scans more aggressively when the
    sum_util is low.
[2] Compared to the exponential function, it is easier to calculate.
[3] It seems that there is no accurate mapping between the sum of util_avg
    and the number of CPUs to be scanned. Use heuristic scan for now.

For a platform with 112 CPUs per LLC, the number of CPUs to scan is:
sum_util%   0    5   15   25  35  45  55   65   75   85   86 ...
scan_nr   112  111  108  102  93  81  65   47   25    1    0 ...

For a platform with 16 CPUs per LLC, the number of CPUs to scan is:
sum_util%   0    5   15   25  35  45  55   65   75   85   86 ...
scan_nr    16   15   15   14  13  11   9    6    3    0    0 ...

Furthermore, to minimize the overhead of calculating the metrics in
select_idle_cpu(), borrow the statistics from periodic load balance.
As mentioned by Abel, on a platform with 112 CPUs per LLC, the
sum_util calculated by periodic load balance after 112 ms would
decay to about 0.5 * 0.5 * 0.5 * 0.7 = 8.75%, thus bringing a delay
in reflecting the latest utilization. But it is a trade-off.
Checking the util_avg in newidle load balance would be more frequent,
but it brings overhead - multiple CPUs write/read the per-LLC shared
variable and introduces cache contention. Tim also mentioned that,
it is allowed to be non-optimal in terms of scheduling for the
short-term variations, but if there is a long-term trend in the load
behavior, the scheduler can adjust for that.

When SIS_UTIL is enabled, the select_idle_cpu() uses the nr_scan
calculated by SIS_UTIL instead of the one from SIS_PROP. As Peter and
Mel suggested, SIS_UTIL should be enabled by default.

This patch is based on the util_avg, which is very sensitive to the
CPU frequency invariance. There is an issue that, when the max frequency
has been clamp, the util_avg would decay insanely fast when
the CPU is idle. Commit addca285120b ("cpufreq: intel_pstate: Handle no_turbo
in frequency invariance") could be used to mitigate this symptom, by adjusting
the arch_max_freq_ratio when turbo is disabled. But this issue is still
not thoroughly fixed, because the current code is unaware of the user-specified
max CPU frequency.

[Test result]

netperf and tbench were launched with 25% 50% 75% 100% 125% 150%
175% 200% of CPU number respectively. Hackbench and schbench were launched
by 1, 2 ,4, 8 groups. Each test lasts for 100 seconds and repeats 3 times.

The following is the benchmark result comparison between
baseline:vanilla v5.19-rc1 and compare:patched kernel. Positive compare%
indicates better performance.

Each netperf test is a:
netperf -4 -H 127.0.1 -t TCP/UDP_RR -c -C -l 100
netperf.throughput
=======
case             load     baseline(std%) compare%( std%)
TCP_RR           28 threads  1.00 (  0.34)  -0.16 (  0.40)
TCP_RR           56 threads  1.00 (  0.19)  -0.02 (  0.20)
TCP_RR           84 threads  1.00 (  0.39)  -0.47 (  0.40)
TCP_RR           112 threads  1.00 (  0.21)  -0.66 (  0.22)
TCP_RR           140 threads  1.00 (  0.19)  -0.69 (  0.19)
TCP_RR           168 threads  1.00 (  0.18)  -0.48 (  0.18)
TCP_RR           196 threads  1.00 (  0.16) +194.70 ( 16.43)
TCP_RR           224 threads  1.00 (  0.16) +197.30 (  7.85)
UDP_RR           28 threads  1.00 (  0.37)  +0.35 (  0.33)
UDP_RR           56 threads  1.00 ( 11.18)  -0.32 (  0.21)
UDP_RR           84 threads  1.00 (  1.46)  -0.98 (  0.32)
UDP_RR           112 threads  1.00 ( 28.85)  -2.48 ( 19.61)
UDP_RR           140 threads  1.00 (  0.70)  -0.71 ( 14.04)
UDP_RR           168 threads  1.00 ( 14.33)  -0.26 ( 11.16)
UDP_RR           196 threads  1.00 ( 12.92) +186.92 ( 20.93)
UDP_RR           224 threads  1.00 ( 11.74) +196.79 ( 18.62)

Take the 224 threads as an example, the SIS search metrics changes are
illustrated below:

    vanilla                    patched
   4544492          +237.5%   15338634        sched_debug.cpu.sis_domain_search.avg
     38539        +39686.8%   15333634        sched_debug.cpu.sis_failed.avg
  128300000          -87.9%   15551326        sched_debug.cpu.sis_scanned.avg
   5842896          +162.7%   15347978        sched_debug.cpu.sis_search.avg

There is -87.9% less CPU scans after patched, which indicates lower overhead.
Besides, with this patch applied, there is -13% less rq lock contention
in perf-profile.calltrace.cycles-pp._raw_spin_lock.raw_spin_rq_lock_nested
.try_to_wake_up.default_wake_function.woken_wake_function.
This might help explain the performance improvement - Because this patch allows
the waking task to remain on the previous CPU, rather than grabbing other CPUs'
lock.

Each hackbench test is a:
hackbench -g $job --process/threads --pipe/sockets -l 1000000 -s 100
hackbench.throughput
=========
case             load     baseline(std%) compare%( std%)
process-pipe     1 group   1.00 (  1.29)  +0.57 (  0.47)
process-pipe     2 groups   1.00 (  0.27)  +0.77 (  0.81)
process-pipe     4 groups   1.00 (  0.26)  +1.17 (  0.02)
process-pipe     8 groups   1.00 (  0.15)  -4.79 (  0.02)
process-sockets  1 group   1.00 (  0.63)  -0.92 (  0.13)
process-sockets  2 groups   1.00 (  0.03)  -0.83 (  0.14)
process-sockets  4 groups   1.00 (  0.40)  +5.20 (  0.26)
process-sockets  8 groups   1.00 (  0.04)  +3.52 (  0.03)
threads-pipe     1 group   1.00 (  1.28)  +0.07 (  0.14)
threads-pipe     2 groups   1.00 (  0.22)  -0.49 (  0.74)
threads-pipe     4 groups   1.00 (  0.05)  +1.88 (  0.13)
threads-pipe     8 groups   1.00 (  0.09)  -4.90 (  0.06)
threads-sockets  1 group   1.00 (  0.25)  -0.70 (  0.53)
threads-sockets  2 groups   1.00 (  0.10)  -0.63 (  0.26)
threads-sockets  4 groups   1.00 (  0.19) +11.92 (  0.24)
threads-sockets  8 groups   1.00 (  0.08)  +4.31 (  0.11)

Each tbench test is a:
tbench -t 100 $job 127.0.0.1
tbench.throughput
======
case             load     baseline(std%) compare%( std%)
loopback         28 threads  1.00 (  0.06)  -0.14 (  0.09)
loopback         56 threads  1.00 (  0.03)  -0.04 (  0.17)
loopback         84 threads  1.00 (  0.05)  +0.36 (  0.13)
loopback         112 threads  1.00 (  0.03)  +0.51 (  0.03)
loopback         140 threads  1.00 (  0.02)  -1.67 (  0.19)
loopback         168 threads  1.00 (  0.38)  +1.27 (  0.27)
loopback         196 threads  1.00 (  0.11)  +1.34 (  0.17)
loopback         224 threads  1.00 (  0.11)  +1.67 (  0.22)

Each schbench test is a:
schbench -m $job -t 28 -r 100 -s 30000 -c 30000
schbench.latency_90%_us
========
case             load     baseline(std%) compare%( std%)
normal           1 mthread  1.00 ( 31.22)  -7.36 ( 20.25)*
normal           2 mthreads  1.00 (  2.45)  -0.48 (  1.79)
normal           4 mthreads  1.00 (  1.69)  +0.45 (  0.64)
normal           8 mthreads  1.00 (  5.47)  +9.81 ( 14.28)

*Consider the Standard Deviation, this -7.36% regression might not be valid.

Also, a OLTP workload with a commercial RDBMS has been tested, and there
is no significant change.

There were concerns that unbalanced tasks among CPUs would cause problems.
For example, suppose the LLC domain is composed of 8 CPUs, and 7 tasks are
bound to CPU0~CPU6, while CPU7 is idle:

          CPU0    CPU1    CPU2    CPU3    CPU4    CPU5    CPU6    CPU7
util_avg  1024    1024    1024    1024    1024    1024    1024    0

Since the util_avg ratio is 87.5%( = 7/8 ), which is higher than 85%,
select_idle_cpu() will not scan, thus CPU7 is undetected during scan.
But according to Mel, it is unlikely the CPU7 will be idle all the time
because CPU7 could pull some tasks via CPU_NEWLY_IDLE.

lkp(kernel test robot) has reported a regression on stress-ng.sock on a
very busy system. According to the sched_debug statistics, it might be caused
by SIS_UTIL terminates the scan and chooses a previous CPU earlier, and this
might introduce more context switch, especially involuntary preemption, which
impacts a busy stress-ng. This regression has shown that, not all benchmarks
in every scenario benefit from idle CPU scan limit, and it needs further
investigation.

Besides, there is slight regression in hackbench's 16 groups case when the
LLC domain has 16 CPUs. Prateek mentioned that we should scan aggressively
in an LLC domain with 16 CPUs. Because the cost to search for an idle one
among 16 CPUs is negligible. The current patch aims to propose a generic
solution and only considers the util_avg. Something like the below could
be applied on top of the current patch to fulfill the requirement:

if (llc_weight <= 16)
nr_scan = nr_scan * 32 / llc_weight;

For LLC domain with 16 CPUs, the nr_scan will be expanded to 2 times large.
The smaller the CPU number this LLC domain has, the larger nr_scan will be
expanded. This needs further investigation.

There is also ongoing work[2] from Abel to filter out the busy CPUs during
wakeup, to further speed up the idle CPU scan. And it could be a following-up
optimization on top of this change.

Suggested-by: Tim Chen <tim.c.chen@intel.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Yicong Yang <yangyicong@hisilicon.com>
Tested-by: Mohini Narkhede <mohini.narkhede@intel.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lore.kernel.org/r/20220612163428.849378-1-yu.c.chen@intel.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoext2: Add more validity checks for inode counts
Jan Kara [Tue, 26 Jul 2022 11:13:50 +0000 (13:13 +0200)]
ext2: Add more validity checks for inode counts

[ Upstream commit fa78f336937240d1bc598db817d638086060e7e9 ]

Add checks verifying number of inodes stored in the superblock matches
the number computed from number of inodes per group. Also verify we have
at least one block worth of inodes per group. This prevents crashes on
corrupted filesystems.

Reported-by: syzbot+d273f7d7f58afd93be48@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoarm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"
Catalin Marinas [Fri, 10 Jun 2022 15:21:41 +0000 (16:21 +0100)]
arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"

[ Upstream commit 20794545c14692094a882d2221c251c4573e6adf ]

This reverts commit e5b8d9218951e59df986f627ec93569a0d22149b.

Pages mapped in user-space with PROT_MTE have the allocation tags either
zeroed or copied/restored to some user values. In order for the kernel
to access such pages via page_address(), resetting the tag in
page->flags was necessary. This tag resetting was deferred to
set_pte_at() -> mte_sync_page_tags() but it can race with another CPU
reading the flags (via page_to_virt()):

P0 (mte_sync_page_tags): P1 (memcpy from virt_to_page):
  Rflags!=0xff
  Wflags=0xff
  DMB (doesn't help)
  Wtags=0
  Rtags=0   // fault

Since now the post_alloc_hook() function resets the page->flags tag when
unpoisoning is skipped for user pages (including the __GFP_ZEROTAGS
case), revert the arm64 commit calling page_kasan_tag_reset().

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Peter Collingbourne <pcc@google.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20220610152141.2148929-5-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoarm64: fix oops in concurrently setting insn_emulation sysctls
haibinzhang (张海斌) [Sat, 2 Jul 2022 05:43:19 +0000 (05:43 +0000)]
arm64: fix oops in concurrently setting insn_emulation sysctls

[ Upstream commit af483947d472eccb79e42059276c4deed76f99a6 ]

emulation_proc_handler() changes table->data for proc_dointvec_minmax
and can generate the following Oops if called concurrently with itself:

 | Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
 | Internal error: Oops: 96000006 [#1] SMP
 | Call trace:
 | update_insn_emulation_mode+0xc0/0x148
 | emulation_proc_handler+0x64/0xb8
 | proc_sys_call_handler+0x9c/0xf8
 | proc_sys_write+0x18/0x20
 | __vfs_write+0x20/0x48
 | vfs_write+0xe4/0x1d0
 | ksys_write+0x70/0xf8
 | __arm64_sys_write+0x20/0x28
 | el0_svc_common.constprop.0+0x7c/0x1c0
 | el0_svc_handler+0x2c/0xa0
 | el0_svc+0x8/0x200

To fix this issue, keep the table->data as &insn->current_mode and
use container_of() to retrieve the insn pointer. Another mutex is
used to protect against the current_mode update but not for retrieving
insn_emulation as table->data is no longer changing.

Co-developed-by: hewenliang <hewenliang4@huawei.com>
Signed-off-by: hewenliang <hewenliang4@huawei.com>
Signed-off-by: Haibin Zhang <haibinzhang@tencent.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20220128090324.2727688-1-hewenliang4@huawei.com
Link: https://lore.kernel.org/r/9A004C03-250B-46C5-BF39-782D7551B00E@tencent.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoarm64: Do not forget syscall when starting a new thread.
Francis Laniel [Wed, 8 Jun 2022 16:24:46 +0000 (17:24 +0100)]
arm64: Do not forget syscall when starting a new thread.

[ Upstream commit de6921856f99c11d3986c6702d851e1328d4f7f6 ]

Enable tracing of the execve*() system calls with the
syscalls:sys_exit_execve tracepoint by removing the call to
forget_syscall() when starting a new thread and preserving the value of
regs->syscallno across exec.

Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com>
Link: https://lore.kernel.org/r/20220608162447.666494-2-flaniel@linux.microsoft.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoarch: make TRACE_IRQFLAGS_NMI_SUPPORT generic
Mark Rutland [Wed, 11 May 2022 13:17:32 +0000 (14:17 +0100)]
arch: make TRACE_IRQFLAGS_NMI_SUPPORT generic

[ Upstream commit 4510bffb4d0246cdcc1f14c7367c026b807a862d ]

On most architectures, IRQ flag tracing is disabled in NMI context, and
architectures need to define and select TRACE_IRQFLAGS_NMI_SUPPORT in
order to enable this.

Commit:

  859d069ee1ddd878 ("lockdep: Prepare for NMI IRQ state tracking")

Permitted IRQ flag tracing in NMI context, allowing lockdep to work in
NMI context where an architecture had suitable entry logic. At the time,
most architectures did not have such suitable entry logic, and this broke
lockdep on such architectures. Thus, this was partially disabled in
commit:

  ed00495333ccc80f ("locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs")

... with architectures needing to select TRACE_IRQFLAGS_NMI_SUPPORT to
enable IRQ flag tracing in NMI context.

Currently TRACE_IRQFLAGS_NMI_SUPPORT is defined under
arch/x86/Kconfig.debug. Move it to arch/Kconfig so architectures can
select it without having to provide their own definition.

Since the regular TRACE_IRQFLAGS_SUPPORT is selected by
arch/x86/Kconfig, the select of TRACE_IRQFLAGS_NMI_SUPPORT is moved
there too.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220511131733.4074499-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agox86: Handle idle=nomwait cmdline properly for x86_idle
Wyes Karny [Mon, 6 Jun 2022 18:03:34 +0000 (23:33 +0530)]
x86: Handle idle=nomwait cmdline properly for x86_idle

[ Upstream commit 8bcedb4ce04750e1ccc9a6b6433387f6a9166a56 ]

When kernel is booted with idle=nomwait do not use MWAIT as the
default idle state.

If the user boots the kernel with idle=nomwait, it is a clear
direction to not use mwait as the default idle state.
However, the current code does not take this into consideration
while selecting the default idle state on x86.

Fix it by checking for the idle=nomwait boot option in
prefer_mwait_c1_over_halt().

Also update the documentation around idle=nomwait appropriately.

[ dhansen: tweak commit message ]

Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lkml.kernel.org/r/fdc2dc2d0a1bc21c2f53d989ea2d2ee3ccbc0dbe.1654538381.git-series.wyes.karny@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
23 months agoepoll: autoremove wakers even more aggressively
Benjamin Segall [Wed, 15 Jun 2022 21:24:23 +0000 (14:24 -0700)]
epoll: autoremove wakers even more aggressively

commit a16ceb13961068f7209e34d7984f8e42d2c06159 upstream.

If a process is killed or otherwise exits while having active network
connections and many threads waiting on epoll_wait, the threads will all
be woken immediately, but not removed from ep->wq.  Then when network
traffic scans ep->wq in wake_up, every wakeup attempt will fail, and will
not remove the entries from the list.

This means that the cost of the wakeup attempt is far higher than usual,
does not decrease, and this also competes with the dying threads trying to
actually make progress and remove themselves from the wq.

Handle this by removing visited epoll wq entries unconditionally, rather
than only when the wakeup succeeds - the structure of ep_poll means that
the only potential loss is the timed_out->eavail heuristic, which now can
race and result in a redundant ep_send_events attempt.  (But only when
incoming data and a timeout actually race, not on every timeout)

Shakeel added:

: We are seeing this issue in production with real workloads and it has
: caused hard lockups.  Particularly network heavy workloads with a lot
: of threads in epoll_wait() can easily trigger this issue if they get
: killed (oom-killed in our case).

Link: https://lkml.kernel.org/r/xm26fsjotqda.fsf@google.com
Signed-off-by: Ben Segall <bsegall@google.com>
Tested-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Roman Penyaev <rpenyaev@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Khazhismel Kumykov <khazhy@google.com>
Cc: Heiher <r@hev.cc>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agonetfilter: nf_tables: fix null deref due to zeroed list head
Florian Westphal [Tue, 9 Aug 2022 16:34:02 +0000 (18:34 +0200)]
netfilter: nf_tables: fix null deref due to zeroed list head

commit 580077855a40741cf511766129702d97ff02f4d9 upstream.

In nf_tables_updtable, if nf_tables_table_enable returns an error,
nft_trans_destroy is called to free the transaction object.

nft_trans_destroy() calls list_del(), but the transaction was never
placed on a list -- the list head is all zeroes, this results in
a null dereference:

BUG: KASAN: null-ptr-deref in nft_trans_destroy+0x26/0x59
Call Trace:
 nft_trans_destroy+0x26/0x59
 nf_tables_newtable+0x4bc/0x9bc
 [..]

Its sane to assume that nft_trans_destroy() can be called
on the transaction object returned by nft_trans_alloc(), so
make sure the list head is initialised.

Fixes: 55dd6f93076b ("netfilter: nf_tables: use new transaction infrastructure to handle table")
Reported-by: mingi cho <mgcho.minic@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agonetfilter: nf_tables: do not allow RULE_ID to refer to another chain
Thadeu Lima de Souza Cascardo [Tue, 9 Aug 2022 17:01:48 +0000 (14:01 -0300)]
netfilter: nf_tables: do not allow RULE_ID to refer to another chain

commit 36d5b2913219ac853908b0f1c664345e04313856 upstream.

When doing lookups for rules on the same batch by using its ID, a rule from
a different chain can be used. If a rule is added to a chain but tries to
be positioned next to a rule from a different chain, it will be linked to
chain2, but the use counter on chain1 would be the one to be incremented.

When looking for rules by ID, use the chain that was used for the lookup by
name. The chain used in the context copied to the transaction needs to
match that same chain. That way, struct nft_rule does not need to get
enlarged with another member.

Fixes: 1a94e38d254b ("netfilter: nf_tables: add NFTA_RULE_ID attribute")
Fixes: 75dd48e2e420 ("netfilter: nf_tables: Support RULE_ID reference in new rule")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agonetfilter: nf_tables: do not allow CHAIN_ID to refer to another table
Thadeu Lima de Souza Cascardo [Tue, 9 Aug 2022 17:01:47 +0000 (14:01 -0300)]
netfilter: nf_tables: do not allow CHAIN_ID to refer to another table

commit 95f466d22364a33d183509629d0879885b4f547e upstream.

When doing lookups for chains on the same batch by using its ID, a chain
from a different table can be used. If a rule is added to a table but
refers to a chain in a different table, it will be linked to the chain in
table2, but would have expressions referring to objects in table1.

Then, when table1 is removed, the rule will not be removed as its linked to
a chain in table2. When expressions in the rule are processed or removed,
that will lead to a use-after-free.

When looking for chains by ID, use the table that was used for the lookup
by name, and only return chains belonging to that same table.

Fixes: 837830a4b439 ("netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agonetfilter: nf_tables: do not allow SET_ID to refer to another table
Thadeu Lima de Souza Cascardo [Tue, 9 Aug 2022 17:01:46 +0000 (14:01 -0300)]
netfilter: nf_tables: do not allow SET_ID to refer to another table

commit 470ee20e069a6d05ae549f7d0ef2bdbcee6a81b2 upstream.

When doing lookups for sets on the same batch by using its ID, a set from a
different table can be used.

Then, when the table is removed, a reference to the set may be kept after
the set is freed, leading to a potential use-after-free.

When looking for sets by ID, use the table that was used for the lookup by
name, and only return sets belonging to that same table.

This fixes CVE-2022-2586, also reported as ZDI-CAN-17470.

Reported-by: Team Orca of Sea Security (@seasecresponse)
Fixes: 958bee14d071 ("netfilter: nf_tables: use new transaction infrastructure to handle sets")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agousb: dwc3: gadget: fix high speed multiplier setting
Michael Grzeschik [Mon, 4 Jul 2022 14:18:12 +0000 (16:18 +0200)]
usb: dwc3: gadget: fix high speed multiplier setting

commit 8affe37c525d800a2628c4ecfaed13b77dc5634a upstream.

For High-Speed Transfers the prepare_one_trb function is calculating the
multiplier setting for the trb based on the length parameter of the trb
currently prepared. This assumption is wrong. For trbs with a sg list,
the length of the actual request has to be taken instead.

Fixes: 40d829fb2ec6 ("usb: dwc3: gadget: Correct ISOC DATA PIDs for short packets")
Cc: stable <stable@kernel.org>
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Link: https://lore.kernel.org/r/20220704141812.1532306-3-m.grzeschik@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agousb: dwc3: gadget: refactor dwc3_repare_one_trb
Michael Grzeschik [Mon, 4 Jul 2022 14:18:11 +0000 (16:18 +0200)]
usb: dwc3: gadget: refactor dwc3_repare_one_trb

commit 23385cec5f354794dadced7f28c31da7ae3eb54c upstream.

The function __dwc3_prepare_one_trb has many parameters. Since it is
only used in dwc3_prepare_one_trb there is no point in keeping the
function. We merge both functions and get rid of the big list of
parameters.

Fixes: 40d829fb2ec6 ("usb: dwc3: gadget: Correct ISOC DATA PIDs for short packets")
Cc: stable <stable@kernel.org>
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Link: https://lore.kernel.org/r/20220704141812.1532306-2-m.grzeschik@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoarm64: dts: uniphier: Fix USB interrupts for PXs3 SoC
Kunihiko Hayashi [Tue, 2 Aug 2022 13:36:47 +0000 (22:36 +0900)]
arm64: dts: uniphier: Fix USB interrupts for PXs3 SoC

commit fe17b91a7777df140d0f1433991da67ba658796c upstream.

An interrupt for USB device are shared with USB host. Set interrupt-names
property to common "dwc_usb3" instead of "host" and "peripheral".

Cc: stable@vger.kernel.org
Fixes: d7b9beb830d7 ("arm64: dts: uniphier: Add USB3 controller nodes")
Reported-by: Ryuta NAKANISHI <nakanishi.ryuta@socionext.com>
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoARM: dts: uniphier: Fix USB interrupts for PXs2 SoC
Kunihiko Hayashi [Tue, 2 Aug 2022 13:36:25 +0000 (22:36 +0900)]
ARM: dts: uniphier: Fix USB interrupts for PXs2 SoC

commit 9b0dc7abb5cc43a2dbf90690c3c6011dcadc574d upstream.

An interrupt for USB device are shared with USB host. Set interrupt-names
property to common "dwc_usb3" instead of "host" and "peripheral".

Cc: stable@vger.kernel.org
Fixes: 45be1573ad19 ("ARM: dts: uniphier: Add USB3 controller nodes")
Reported-by: Ryuta NAKANISHI <nakanishi.ryuta@socionext.com>
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoUSB: HCD: Fix URB giveback issue in tasklet function
Weitao Wang [Tue, 26 Jul 2022 07:49:18 +0000 (15:49 +0800)]
USB: HCD: Fix URB giveback issue in tasklet function

commit 26c6c2f8a907c9e3a2f24990552a4d77235791e6 upstream.

Usb core introduce the mechanism of giveback of URB in tasklet context to
reduce hardware interrupt handling time. On some test situation(such as
FIO with 4KB block size), when tasklet callback function called to
giveback URB, interrupt handler add URB node to the bh->head list also.
If check bh->head list again after finish all URB giveback of local_list,
then it may introduce a "dynamic balance" between giveback URB and add URB
to bh->head list. This tasklet callback function may not exit for a long
time, which will cause other tasklet function calls to be delayed. Some
real-time applications(such as KB and Mouse) will see noticeable lag.

In order to prevent the tasklet function from occupying the cpu for a long
time at a time, new URBS will not be added to the local_list even though
the bh->head list is not empty. But also need to ensure the left URB
giveback to be processed in time, so add a member high_prio for structure
giveback_urb_bh to prioritize tasklet and schelule this tasklet again if
bh->head list is not empty.

At the same time, we are able to prioritize tasklet through structure
member high_prio. So, replace the local high_prio_bh variable with this
structure member in usb_hcd_giveback_urb.

Fixes: 94dfd7edfd5c ("USB: HCD: support giveback of URB in tasklet context")
Cc: stable <stable@kernel.org>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Weitao Wang <WeitaoWang-oc@zhaoxin.com>
Link: https://lore.kernel.org/r/20220726074918.5114-1-WeitaoWang-oc@zhaoxin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agousb: typec: ucsi: Acknowledge the GET_ERROR_STATUS command completion
Linyu Yuan [Tue, 26 Jul 2022 06:45:49 +0000 (14:45 +0800)]
usb: typec: ucsi: Acknowledge the GET_ERROR_STATUS command completion

commit a7dc438b5e446afcd1b3b6651da28271400722f2 upstream.

We found PPM will not send any notification after it report error status
and OPM issue GET_ERROR_STATUS command to read the details about error.

According UCSI spec, PPM may clear the Error Status Data after the OPM
has acknowledged the command completion.

This change add operation to acknowledge the command completion from PPM.

Fixes: bdc62f2bae8f (usb: typec: ucsi: Simplified registration and I/O API)
Cc: <stable@vger.kernel.org> # 5.10
Signed-off-by: Jack Pham <quic_jackp@quicinc.com>
Signed-off-by: Linyu Yuan <quic_linyyuan@quicinc.com>
Link: https://lore.kernel.org/r/1658817949-4632-1-git-send-email-quic_linyyuan@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agocoresight: Clear the connection field properly
Suzuki K Poulose [Tue, 14 Jun 2022 21:40:24 +0000 (22:40 +0100)]
coresight: Clear the connection field properly

commit 2af89ebacf299b7fba5f3087d35e8a286ec33706 upstream.

coresight devices track their connections (output connections) and
hold a reference to the fwnode. When a device goes away, we walk through
the devices on the coresight bus and make sure that the references
are dropped. This happens both ways:
 a) For all output connections from the device, drop the reference to
    the target device via coresight_release_platform_data()

b) Iterate over all the devices on the coresight bus and drop the
   reference to fwnode if *this* device is the target of the output
   connection, via coresight_remove_conns()->coresight_remove_match().

However, the coresight_remove_match() doesn't clear the fwnode field,
after dropping the reference, this causes use-after-free and
additional refcount drops on the fwnode.

e.g., if we have two devices, A and B, with a connection, A -> B.
If we remove B first, B would clear the reference on B, from A
via coresight_remove_match(). But when A is removed, it still has
a connection with fwnode still pointing to B. Thus it tries to  drops
the reference in coresight_release_platform_data(), raising the bells
like :

[   91.990153] ------------[ cut here ]------------
[   91.990163] refcount_t: addition on 0; use-after-free.
[   91.990212] WARNING: CPU: 0 PID: 461 at lib/refcount.c:25 refcount_warn_saturate+0xa0/0x144
[   91.990260] Modules linked in: coresight_funnel coresight_replicator coresight_etm4x(-)
 crct10dif_ce coresight ip_tables x_tables ipv6 [last unloaded: coresight_cpu_debug]
[   91.990398] CPU: 0 PID: 461 Comm: rmmod Tainted: G        W       T 5.19.0-rc2+ #53
[   91.990418] Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Feb  1 2019
[   91.990434] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   91.990454] pc : refcount_warn_saturate+0xa0/0x144
[   91.990476] lr : refcount_warn_saturate+0xa0/0x144
[   91.990496] sp : ffff80000c843640
[   91.990509] x29: ffff80000c843640 x28: ffff800009957c28 x27: ffff80000c8439a8
[   91.990560] x26: ffff00097eff1990 x25: ffff8000092b6ad8 x24: ffff00097eff19a8
[   91.990610] x23: ffff80000c8439a8 x22: 0000000000000000 x21: ffff80000c8439c2
[   91.990659] x20: 0000000000000000 x19: ffff00097eff1a10 x18: ffff80000ab99c40
[   91.990708] x17: 0000000000000000 x16: 0000000000000000 x15: ffff80000abf6fa0
[   91.990756] x14: 000000000000001d x13: 0a2e656572662d72 x12: 657466612d657375
[   91.990805] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : ffff8000081aba28
[   91.990854] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : 746e756f63666572
[   91.990903] x5 : ffff00097648ec58 x4 : 0000000000000000 x3 : 0000000000000027
[   91.990952] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00080260ba00
[   91.991000] Call trace:
[   91.991012]  refcount_warn_saturate+0xa0/0x144
[   91.991034]  kobject_get+0xac/0xb0
[   91.991055]  of_node_get+0x2c/0x40
[   91.991076]  of_fwnode_get+0x40/0x60
[   91.991094]  fwnode_handle_get+0x3c/0x60
[   91.991116]  fwnode_get_nth_parent+0xf4/0x110
[   91.991137]  fwnode_full_name_string+0x48/0xc0
[   91.991158]  device_node_string+0x41c/0x530
[   91.991178]  pointer+0x320/0x3ec
[   91.991198]  vsnprintf+0x23c/0x750
[   91.991217]  vprintk_store+0x104/0x4b0
[   91.991238]  vprintk_emit+0x8c/0x360
[   91.991257]  vprintk_default+0x44/0x50
[   91.991276]  vprintk+0xcc/0xf0
[   91.991295]  _printk+0x68/0x90
[   91.991315]  of_node_release+0x13c/0x14c
[   91.991334]  kobject_put+0x98/0x114
[   91.991354]  of_node_put+0x24/0x34
[   91.991372]  of_fwnode_put+0x40/0x5c
[   91.991390]  fwnode_handle_put+0x38/0x50
[   91.991411]  coresight_release_platform_data+0x74/0xb0 [coresight]
[   91.991472]  coresight_unregister+0x64/0xcc [coresight]
[   91.991525]  etm4_remove_dev+0x64/0x78 [coresight_etm4x]
[   91.991563]  etm4_remove_amba+0x1c/0x2c [coresight_etm4x]
[   91.991598]  amba_remove+0x3c/0x19c

Reproducible by: (Build all coresight components as modules):

  #!/bin/sh
  while true
  do
     for m in tmc stm cpu_debug etm4x replicator funnel
     do
      modprobe coresight_${m}
     done

     for m in tmc stm cpu_debug etm4x replicator funnel
     do
      rmmode coresight_${m}
     done
  done

Cc: stable@vger.kernel.org
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Fixes: 37ea1ffddffa ("coresight: Use fwnode handle instead of device names")
Link: https://lore.kernel.org/r/20220614214024.3005275-1-suzuki.poulose@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoMIPS: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
Huacai Chen [Thu, 14 Jul 2022 08:41:34 +0000 (16:41 +0800)]
MIPS: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK

commit e1a534f5d074db45ae5cbac41d8912b98e96a006 upstream.

When CONFIG_CPUMASK_OFFSTACK and CONFIG_DEBUG_PER_CPU_MAPS is selected,
cpu_max_bits_warn() generates a runtime warning similar as below while
we show /proc/cpuinfo. Fix this by using nr_cpu_ids (the runtime limit)
instead of NR_CPUS to iterate CPUs.

[    3.052463] ------------[ cut here ]------------
[    3.059679] WARNING: CPU: 3 PID: 1 at include/linux/cpumask.h:108 show_cpuinfo+0x5e8/0x5f0
[    3.070072] Modules linked in: efivarfs autofs4
[    3.076257] CPU: 0 PID: 1 Comm: systemd Not tainted 5.19-rc5+ #1052
[    3.084034] Hardware name: Loongson Loongson-3A4000-7A1000-1w-V0.1-CRB/Loongson-LS3A4000-7A1000-1w-EVB-V1.21, BIOS Loongson-UDK2018-V2.0.04082-beta7 04/27
[    3.099465] Stack : 9000000100157b08 9000000000f18530 9000000000cf846c 9000000100154000
[    3.109127]         9000000100157a50 0000000000000000 9000000100157a58 9000000000ef7430
[    3.118774]         90000001001578e8 0000000000000040 0000000000000020 ffffffffffffffff
[    3.128412]         0000000000aaaaaa 1ab25f00eec96a37 900000010021de80 900000000101c890
[    3.138056]         0000000000000000 0000000000000000 0000000000000000 0000000000aaaaaa
[    3.147711]         ffff8000339dc220 0000000000000001 0000000006ab4000 0000000000000000
[    3.157364]         900000000101c998 0000000000000004 9000000000ef7430 0000000000000000
[    3.167012]         0000000000000009 000000000000006c 0000000000000000 0000000000000000
[    3.176641]         9000000000d3de08 9000000001639390 90000000002086d8 00007ffff0080286
[    3.186260]         00000000000000b0 0000000000000004 0000000000000000 0000000000071c1c
[    3.195868]         ...
[    3.199917] Call Trace:
[    3.203941] [<98000000002086d8>] show_stack+0x38/0x14c
[    3.210666] [<9800000000cf846c>] dump_stack_lvl+0x60/0x88
[    3.217625] [<980000000023d268>] __warn+0xd0/0x100
[    3.223958] [<9800000000cf3c90>] warn_slowpath_fmt+0x7c/0xcc
[    3.231150] [<9800000000210220>] show_cpuinfo+0x5e8/0x5f0
[    3.238080] [<98000000004f578c>] seq_read_iter+0x354/0x4b4
[    3.245098] [<98000000004c2e90>] new_sync_read+0x17c/0x1c4
[    3.252114] [<98000000004c5174>] vfs_read+0x138/0x1d0
[    3.258694] [<98000000004c55f8>] ksys_read+0x70/0x100
[    3.265265] [<9800000000cfde9c>] do_syscall+0x7c/0x94
[    3.271820] [<9800000000202fe4>] handle_syscall+0xc4/0x160
[    3.281824] ---[ end trace 8b484262b4b8c24c ]---

Cc: stable@vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agopowerpc/powernv: Avoid crashing if rng is NULL
Michael Ellerman [Wed, 27 Jul 2022 14:32:17 +0000 (00:32 +1000)]
powerpc/powernv: Avoid crashing if rng is NULL

commit 90b5d4fe0b3ba7f589c6723c6bfb559d9e83956a upstream.

On a bare-metal Power8 system that doesn't have an "ibm,power-rng", a
malicious QEMU and guest that ignore the absence of the
KVM_CAP_PPC_HWRNG flag, and calls H_RANDOM anyway, will dereference a
NULL pointer.

In practice all Power8 machines have an "ibm,power-rng", but let's not
rely on that, add a NULL check and early return in
powernv_get_random_real_mode().

Fixes: e928e9cb3601 ("KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation.")
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220727143219.2684192-1-mpe@ellerman.id.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agopowerpc/ptdump: Fix display of RW pages on FSL_BOOK3E
Christophe Leroy [Tue, 28 Jun 2022 14:43:35 +0000 (16:43 +0200)]
powerpc/ptdump: Fix display of RW pages on FSL_BOOK3E

commit dd8de84b57b02ba9c1fe530a6d916c0853f136bd upstream.

On FSL_BOOK3E, _PAGE_RW is defined with two bits, one for user and one
for supervisor. As soon as one of the two bits is set, the page has
to be display as RW. But the way it is implemented today requires both
bits to be set in order to display it as RW.

Instead of display RW when _PAGE_RW bits are set and R otherwise,
reverse the logic and display R when _PAGE_RW bits are all 0 and
RW otherwise.

This change has no impact on other platforms as _PAGE_RW is a single
bit on all of them.

Fixes: 8eb07b187000 ("powerpc/mm: Dump linux pagetables")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0c33b96317811edf691e81698aaee8fa45ec3449.1656427391.git.christophe.leroy@csgroup.eu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agopowerpc/fsl-pci: Fix Class Code of PCIe Root Port
Pali Rohár [Wed, 6 Jul 2022 10:10:43 +0000 (12:10 +0200)]
powerpc/fsl-pci: Fix Class Code of PCIe Root Port

commit 0c551abfa004ce154d487d91777bf221c808a64f upstream.

By default old pre-3.0 Freescale PCIe controllers reports invalid PCI Class
Code 0x0b20 for PCIe Root Port. It can be seen by lspci -b output on P2020
board which has this pre-3.0 controller:

  $ lspci -bvnn
  00:00.0 Power PC [0b20]: Freescale Semiconductor Inc P2020E [1957:0070] (rev 21)
          !!! Invalid class 0b20 for header type 01
          Capabilities: [4c] Express Root Port (Slot-), MSI 00

Fix this issue by programming correct PCI Class Code 0x0604 for PCIe Root
Port to the Freescale specific PCIe register 0x474.

With this change lspci -b output is:

  $ lspci -bvnn
  00:00.0 PCI bridge [0604]: Freescale Semiconductor Inc P2020E [1957:0070] (rev 21) (prog-if 00 [Normal decode])
          Capabilities: [4c] Express Root Port (Slot-), MSI 00

Without any "Invalid class" error. So class code was properly reflected
into standard (read-only) PCI register 0x08.

Same fix is already implemented in U-Boot pcie_fsl.c driver in commit:
http://source.denx.de/u-boot/u-boot/-/commit/d18d06ac35229345a0af80977a408cfbe1d1015b

Fix activated by U-Boot stay active also after booting Linux kernel.
But boards which use older U-Boot version without that fix are affected and
still require this fix.

So implement this class code fix also in kernel fsl_pci.c driver.

Cc: stable@vger.kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220706101043.4867-1-pali@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoPCI: Add defines for normal and subtractive PCI bridges
Pali Rohár [Mon, 14 Feb 2022 11:41:08 +0000 (12:41 +0100)]
PCI: Add defines for normal and subtractive PCI bridges

commit 904b10fb189cc15376e9bfce1ef0282e68b0b004 upstream.

Add these PCI class codes to pci_ids.h:

  PCI_CLASS_BRIDGE_PCI_NORMAL
  PCI_CLASS_BRIDGE_PCI_SUBTRACTIVE

Use these defines in all kernel code for describing PCI class codes for
normal and subtractive PCI bridges.

[bhelgaas: similar change in pci-mvebu.c]
Link: https://lore.kernel.org/r/20220214114109.26809-1-pali@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Guenter Roeck <linux@roeck-us.net>a
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
[ gregkh - take only the pci_ids.h portion for stable backports ]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
Alexander Lobakin [Fri, 24 Jun 2022 12:13:05 +0000 (14:13 +0200)]
ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()

commit e5a16a5c4602c119262f350274021f90465f479d upstream.

test_bit(), as any other bitmap op, takes `unsigned long *` as a
second argument (pointer to the actual bitmap), as any bitmap
itself is an array of unsigned longs. However, the ia64_get_irr()
code passes a ref to `u64` as a second argument.
This works with the ia64 bitops implementation due to that they
have `void *` as the second argument and then cast it later on.
This works with the bitmap API itself due to that `unsigned long`
has the same size on ia64 as `u64` (`unsigned long long`), but
from the compiler PoV those two are different.
Define @irr as `unsigned long` to fix that. That implies no
functional changes. Has been hidden for 16 years!

Fixes: a58786917ce2 ("[IA64] avoid broken SAL_CACHE_FLUSH implementations")
Cc: stable@vger.kernel.org # 2.6.16+
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agomedia: [PATCH] pci: atomisp_cmd: fix three missing checks on list iterator
Xiaomeng Tong [Thu, 14 Apr 2022 04:14:15 +0000 (05:14 +0100)]
media: [PATCH] pci: atomisp_cmd: fix three missing checks on list iterator

commit 09b204eb9de9fdf07d028c41c4331b5cfeb70dd7 upstream.

The three bugs are here:
__func__, s3a_buf->s3a_data->exp_id);
__func__, md_buf->metadata->exp_id);
__func__, dis_buf->dis_data->exp_id);

The list iterator 's3a_buf/md_buf/dis_buf' will point to a bogus
position containing HEAD if the list is empty or no element is found.
This case must be checked before any use of the iterator, otherwise
it will lead to a invalid memory access.

To fix this bug, add an check. Use a new variable '*_iter' as the
list iterator, while use the old variable '*_buf' as a dedicated
pointer to point to the found element.

Link: https://lore.kernel.org/linux-media/20220414041415.3342-1-xiam0nd.tong@gmail.com
Cc: stable@vger.kernel.org
Fixes: ad85094b293e4 ("Revert "media: staging: atomisp: Remove driver"")
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agombcache: add functions to delete entry if unused
Jan Kara [Tue, 12 Jul 2022 10:54:21 +0000 (12:54 +0200)]
mbcache: add functions to delete entry if unused

commit 3dc96bba65f53daa217f0a8f43edad145286a8f5 upstream.

Add function mb_cache_entry_delete_or_get() to delete mbcache entry if
it is unused and also add a function to wait for entry to become unused
- mb_cache_entry_wait_unused(). We do not share code between the two
deleting function as one of them will go away soon.

CC: stable@vger.kernel.org
Fixes: 82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agombcache: don't reclaim used entries
Jan Kara [Tue, 12 Jul 2022 10:54:20 +0000 (12:54 +0200)]
mbcache: don't reclaim used entries

commit 58318914186c157477b978b1739dfe2f1b9dc0fe upstream.

Do not reclaim entries that are currently used by somebody from a
shrinker. Firstly, these entries are likely useful. Secondly, we will
need to keep such entries to protect pending increment of xattr block
refcount.

CC: stable@vger.kernel.org
Fixes: 82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220712105436.32204-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agomd-raid10: fix KASAN warning
Mikulas Patocka [Tue, 26 Jul 2022 08:33:12 +0000 (04:33 -0400)]
md-raid10: fix KASAN warning

commit d17f744e883b2f8d13cca252d71cfe8ace346f7d upstream.

There's a KASAN warning in raid10_remove_disk when running the lvm
test lvconvert-raid-reshape.sh. We fix this warning by verifying that the
value "number" is valid.

BUG: KASAN: slab-out-of-bounds in raid10_remove_disk+0x61/0x2a0 [raid10]
Read of size 8 at addr ffff889108f3d300 by task mdX_raid10/124682

CPU: 3 PID: 124682 Comm: mdX_raid10 Not tainted 5.19.0-rc6 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x34/0x44
 print_report.cold+0x45/0x57a
 ? __lock_text_start+0x18/0x18
 ? raid10_remove_disk+0x61/0x2a0 [raid10]
 kasan_report+0xa8/0xe0
 ? raid10_remove_disk+0x61/0x2a0 [raid10]
 raid10_remove_disk+0x61/0x2a0 [raid10]
Buffer I/O error on dev dm-76, logical block 15344, async page read
 ? __mutex_unlock_slowpath.constprop.0+0x1e0/0x1e0
 remove_and_add_spares+0x367/0x8a0 [md_mod]
 ? super_written+0x1c0/0x1c0 [md_mod]
 ? mutex_trylock+0xac/0x120
 ? _raw_spin_lock+0x72/0xc0
 ? _raw_spin_lock_bh+0xc0/0xc0
 md_check_recovery+0x848/0x960 [md_mod]
 raid10d+0xcf/0x3360 [raid10]
 ? sched_clock_cpu+0x185/0x1a0
 ? rb_erase+0x4d4/0x620
 ? var_wake_function+0xe0/0xe0
 ? psi_group_change+0x411/0x500
 ? preempt_count_sub+0xf/0xc0
 ? _raw_spin_lock_irqsave+0x78/0xc0
 ? __lock_text_start+0x18/0x18
 ? raid10_sync_request+0x36c0/0x36c0 [raid10]
 ? preempt_count_sub+0xf/0xc0
 ? _raw_spin_unlock_irqrestore+0x19/0x40
 ? del_timer_sync+0xa9/0x100
 ? try_to_del_timer_sync+0xc0/0xc0
 ? _raw_spin_lock_irqsave+0x78/0xc0
 ? __lock_text_start+0x18/0x18
 ? _raw_spin_unlock_irq+0x11/0x24
 ? __list_del_entry_valid+0x68/0xa0
 ? finish_wait+0xa3/0x100
 md_thread+0x161/0x260 [md_mod]
 ? unregister_md_personality+0xa0/0xa0 [md_mod]
 ? _raw_spin_lock_irqsave+0x78/0xc0
 ? prepare_to_wait_event+0x2c0/0x2c0
 ? unregister_md_personality+0xa0/0xa0 [md_mod]
 kthread+0x148/0x180
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>

Allocated by task 124495:
 kasan_save_stack+0x1e/0x40
 __kasan_kmalloc+0x80/0xa0
 setup_conf+0x140/0x5c0 [raid10]
 raid10_run+0x4cd/0x740 [raid10]
 md_run+0x6f9/0x1300 [md_mod]
 raid_ctr+0x2531/0x4ac0 [dm_raid]
 dm_table_add_target+0x2b0/0x620 [dm_mod]
 table_load+0x1c8/0x400 [dm_mod]
 ctl_ioctl+0x29e/0x560 [dm_mod]
 dm_compat_ctl_ioctl+0x7/0x20 [dm_mod]
 __do_compat_sys_ioctl+0xfa/0x160
 do_syscall_64+0x90/0xc0
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

Last potentially related work creation:
 kasan_save_stack+0x1e/0x40
 __kasan_record_aux_stack+0x9e/0xc0
 kvfree_call_rcu+0x84/0x480
 timerfd_release+0x82/0x140
L __fput+0xfa/0x400
 task_work_run+0x80/0xc0
 exit_to_user_mode_prepare+0x155/0x160
 syscall_exit_to_user_mode+0x12/0x40
 do_syscall_64+0x42/0xc0
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

Second to last potentially related work creation:
 kasan_save_stack+0x1e/0x40
 __kasan_record_aux_stack+0x9e/0xc0
 kvfree_call_rcu+0x84/0x480
 timerfd_release+0x82/0x140
 __fput+0xfa/0x400
 task_work_run+0x80/0xc0
 exit_to_user_mode_prepare+0x155/0x160
 syscall_exit_to_user_mode+0x12/0x40
 do_syscall_64+0x42/0xc0
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

The buggy address belongs to the object at ffff889108f3d200
 which belongs to the cache kmalloc-256 of size 256
The buggy address is located 0 bytes to the right of
 256-byte region [ffff889108f3d200ffff889108f3d300)

The buggy address belongs to the physical page:
page:000000007ef2a34c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1108f3c
head:000000007ef2a34c order:2 compound_mapcount:0 compound_pincount:0
flags: 0x4000000000010200(slab|head|zone=2)
raw: 4000000000010200 0000000000000000 dead000000000001 ffff889100042b40
raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff889108f3d200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff889108f3d280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff889108f3d300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                   ^
 ffff889108f3d380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff889108f3d400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agomd-raid: destroy the bitmap after destroying the thread
Mikulas Patocka [Sun, 24 Jul 2022 18:26:12 +0000 (14:26 -0400)]
md-raid: destroy the bitmap after destroying the thread

commit e151db8ecfb019b7da31d076130a794574c89f6f upstream.

When we ran the lvm test "shell/integrity-blocksize-3.sh" on a kernel with
kasan, we got failure in write_page.

The reason for the failure is that md_bitmap_destroy is called before
destroying the thread and the thread may be waiting in the function
write_page for the bio to complete. When the thread finishes waiting, it
executes "if (test_bit(BITMAP_WRITE_ERROR, &bitmap->flags))", which
triggers the kasan warning.

Note that the commit 48df498daf62 that caused this bug claims that it is
neede for md-cluster, you should check md-cluster and possibly find
another bugfix for it.

BUG: KASAN: use-after-free in write_page+0x18d/0x680 [md_mod]
Read of size 8 at addr ffff889162030c78 by task mdX_raid1/5539

CPU: 10 PID: 5539 Comm: mdX_raid1 Not tainted 5.19.0-rc2 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x34/0x44
 print_report.cold+0x45/0x57a
 ? __lock_text_start+0x18/0x18
 ? write_page+0x18d/0x680 [md_mod]
 kasan_report+0xa8/0xe0
 ? write_page+0x18d/0x680 [md_mod]
 kasan_check_range+0x13f/0x180
 write_page+0x18d/0x680 [md_mod]
 ? super_sync+0x4d5/0x560 [dm_raid]
 ? md_bitmap_file_kick+0xa0/0xa0 [md_mod]
 ? rs_set_dev_and_array_sectors+0x2e0/0x2e0 [dm_raid]
 ? mutex_trylock+0x120/0x120
 ? preempt_count_add+0x6b/0xc0
 ? preempt_count_sub+0xf/0xc0
 md_update_sb+0x707/0xe40 [md_mod]
 md_reap_sync_thread+0x1b2/0x4a0 [md_mod]
 md_check_recovery+0x533/0x960 [md_mod]
 raid1d+0xc8/0x2a20 [raid1]
 ? var_wake_function+0xe0/0xe0
 ? psi_group_change+0x411/0x500
 ? preempt_count_sub+0xf/0xc0
 ? _raw_spin_lock_irqsave+0x78/0xc0
 ? __lock_text_start+0x18/0x18
 ? raid1_end_read_request+0x2a0/0x2a0 [raid1]
 ? preempt_count_sub+0xf/0xc0
 ? _raw_spin_unlock_irqrestore+0x19/0x40
 ? del_timer_sync+0xa9/0x100
 ? try_to_del_timer_sync+0xc0/0xc0
 ? _raw_spin_lock_irqsave+0x78/0xc0
 ? __lock_text_start+0x18/0x18
 ? __list_del_entry_valid+0x68/0xa0
 ? finish_wait+0xa3/0x100
 md_thread+0x161/0x260 [md_mod]
 ? unregister_md_personality+0xa0/0xa0 [md_mod]
 ? _raw_spin_lock_irqsave+0x78/0xc0
 ? prepare_to_wait_event+0x2c0/0x2c0
 ? unregister_md_personality+0xa0/0xa0 [md_mod]
 kthread+0x148/0x180
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>

Allocated by task 5522:
 kasan_save_stack+0x1e/0x40
 __kasan_kmalloc+0x80/0xa0
 md_bitmap_create+0xa8/0xe80 [md_mod]
 md_run+0x777/0x1300 [md_mod]
 raid_ctr+0x249c/0x4a30 [dm_raid]
 dm_table_add_target+0x2b0/0x620 [dm_mod]
 table_load+0x1c8/0x400 [dm_mod]
 ctl_ioctl+0x29e/0x560 [dm_mod]
 dm_compat_ctl_ioctl+0x7/0x20 [dm_mod]
 __do_compat_sys_ioctl+0xfa/0x160
 do_syscall_64+0x90/0xc0
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

Freed by task 5680:
 kasan_save_stack+0x1e/0x40
 kasan_set_track+0x21/0x40
 kasan_set_free_info+0x20/0x40
 __kasan_slab_free+0xf7/0x140
 kfree+0x80/0x240
 md_bitmap_free+0x1c3/0x280 [md_mod]
 __md_stop+0x21/0x120 [md_mod]
 md_stop+0x9/0x40 [md_mod]
 raid_dtr+0x1b/0x40 [dm_raid]
 dm_table_destroy+0x98/0x1e0 [dm_mod]
 __dm_destroy+0x199/0x360 [dm_mod]
 dev_remove+0x10c/0x160 [dm_mod]
 ctl_ioctl+0x29e/0x560 [dm_mod]
 dm_compat_ctl_ioctl+0x7/0x20 [dm_mod]
 __do_compat_sys_ioctl+0xfa/0x160
 do_syscall_64+0x90/0xc0
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 48df498daf62 ("md: move bitmap_destroy to the beginning of __md_stop")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoserial: mvebu-uart: uart2 error bits clearing
Narendra Hadke [Tue, 26 Jul 2022 09:12:21 +0000 (11:12 +0200)]
serial: mvebu-uart: uart2 error bits clearing

commit a7209541239e5dd44d981289e5f9059222d40fd1 upstream.

For mvebu uart2, error bits are not cleared on buffer read.
This causes interrupt loop and system hang.

Cc: stable@vger.kernel.org
Reviewed-by: Yi Guo <yi.guo@cavium.com>
Reviewed-by: Nadav Haklai <nadavh@marvell.com>
Signed-off-by: Narendra Hadke <nhadke@marvell.com>
Signed-off-by: Pali Rohár <pali@kernel.org>
Link: https://lore.kernel.org/r/20220726091221.12358-1-pali@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agofuse: ioctl: translate ENOSYS
Miklos Szeredi [Thu, 21 Jul 2022 14:06:18 +0000 (16:06 +0200)]
fuse: ioctl: translate ENOSYS

commit 02c0cab8e7345b06f1c0838df444e2902e4138d3 upstream.

Overlayfs may fail to complete updates when a filesystem lacks
fileattr/xattr syscall support and responds with an ENOSYS error code,
resulting in an unexpected "Function not implemented" error.

This bug may occur with FUSE filesystems, such as davfs2.

Steps to reproduce:

  # install davfs2, e.g., apk add davfs2
  mkdir /test mkdir /test/lower /test/upper /test/work /test/mnt
  yes '' | mount -t davfs -o ro http://some-web-dav-server/path \
    /test/lower
  mount -t overlay -o upperdir=/test/upper,lowerdir=/test/lower \
    -o workdir=/test/work overlay /test/mnt

  # when "some-file" exists in the lowerdir, this fails with "Function
  # not implemented", with dmesg showing "overlayfs: failed to retrieve
  # lower fileattr (/some-file, err=-38)"
  touch /test/mnt/some-file

The underlying cause of this regresion is actually in FUSE, which fails to
translate the ENOSYS error code returned by userspace filesystem (which
means that the ioctl operation is not supported) to ENOTTY.

Reported-by: Christian Kohlschütter <christian@kohlschutter.com>
Fixes: 72db82115d2b ("ovl: copy up sync/noatime fileattr flags")
Fixes: 59efec7b9039 ("fuse: implement ioctl support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agofuse: limit nsec
Miklos Szeredi [Thu, 21 Jul 2022 14:06:18 +0000 (16:06 +0200)]
fuse: limit nsec

commit 47912eaa061a6a81e4aa790591a1874c650733c0 upstream.

Limit nanoseconds to 0..999999999.

Fixes: d8a5ba45457e ("[PATCH] FUSE - core")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoksmbd: fix use-after-free bug in smb2_tree_disconect
Namjae Jeon [Thu, 28 Jul 2022 12:57:08 +0000 (21:57 +0900)]
ksmbd: fix use-after-free bug in smb2_tree_disconect

commit cf6531d98190fa2cf92a6d8bbc8af0a4740a223c upstream.

smb2_tree_disconnect() freed the struct ksmbd_tree_connect,
but it left the dangling pointer. It can be accessed
again under compound requests.

This bug can lead an oops looking something link:

[ 1685.468014 ] BUG: KASAN: use-after-free in ksmbd_tree_conn_disconnect+0x131/0x160 [ksmbd]
[ 1685.468068 ] Read of size 4 at addr ffff888102172180 by task kworker/1:2/4807
...
[ 1685.468130 ] Call Trace:
[ 1685.468132 ]  <TASK>
[ 1685.468135 ]  dump_stack_lvl+0x49/0x5f
[ 1685.468141 ]  print_report.cold+0x5e/0x5cf
[ 1685.468145 ]  ? ksmbd_tree_conn_disconnect+0x131/0x160 [ksmbd]
[ 1685.468157 ]  kasan_report+0xaa/0x120
[ 1685.468194 ]  ? ksmbd_tree_conn_disconnect+0x131/0x160 [ksmbd]
[ 1685.468206 ]  __asan_report_load4_noabort+0x14/0x20
[ 1685.468210 ]  ksmbd_tree_conn_disconnect+0x131/0x160 [ksmbd]
[ 1685.468222 ]  smb2_tree_disconnect+0x175/0x250 [ksmbd]
[ 1685.468235 ]  handle_ksmbd_work+0x30e/0x1020 [ksmbd]
[ 1685.468247 ]  process_one_work+0x778/0x11c0
[ 1685.468251 ]  ? _raw_spin_lock_irq+0x8e/0xe0
[ 1685.468289 ]  worker_thread+0x544/0x1180
[ 1685.468293 ]  ? __cpuidle_text_end+0x4/0x4
[ 1685.468297 ]  kthread+0x282/0x320
[ 1685.468301 ]  ? process_one_work+0x11c0/0x11c0
[ 1685.468305 ]  ? kthread_complete_and_exit+0x30/0x30
[ 1685.468309 ]  ret_from_fork+0x1f/0x30

Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17816
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoksmbd: prevent out of bound read for SMB2_TREE_CONNNECT
Hyunchul Lee [Thu, 28 Jul 2022 12:58:53 +0000 (21:58 +0900)]
ksmbd: prevent out of bound read for SMB2_TREE_CONNNECT

commit 824d4f64c20093275f72fc8101394d75ff6a249e upstream.

if Status is not 0 and PathLength is long,
smb_strndup_from_utf16 could make out of bound
read in smb2_tree_connnect.

This bug can lead an oops looking something like:

[ 1553.882047] BUG: KASAN: slab-out-of-bounds in smb_strndup_from_utf16+0x469/0x4c0 [ksmbd]
[ 1553.882064] Read of size 2 at addr ffff88802c4eda04 by task kworker/0:2/42805
...
[ 1553.882095] Call Trace:
[ 1553.882098]  <TASK>
[ 1553.882101]  dump_stack_lvl+0x49/0x5f
[ 1553.882107]  print_report.cold+0x5e/0x5cf
[ 1553.882112]  ? smb_strndup_from_utf16+0x469/0x4c0 [ksmbd]
[ 1553.882122]  kasan_report+0xaa/0x120
[ 1553.882128]  ? smb_strndup_from_utf16+0x469/0x4c0 [ksmbd]
[ 1553.882139]  __asan_report_load_n_noabort+0xf/0x20
[ 1553.882143]  smb_strndup_from_utf16+0x469/0x4c0 [ksmbd]
[ 1553.882155]  ? smb_strtoUTF16+0x3b0/0x3b0 [ksmbd]
[ 1553.882166]  ? __kmalloc_node+0x185/0x430
[ 1553.882171]  smb2_tree_connect+0x140/0xab0 [ksmbd]
[ 1553.882185]  handle_ksmbd_work+0x30e/0x1020 [ksmbd]
[ 1553.882197]  process_one_work+0x778/0x11c0
[ 1553.882201]  ? _raw_spin_lock_irq+0x8e/0xe0
[ 1553.882206]  worker_thread+0x544/0x1180
[ 1553.882209]  ? __cpuidle_text_end+0x4/0x4
[ 1553.882214]  kthread+0x282/0x320
[ 1553.882218]  ? process_one_work+0x11c0/0x11c0
[ 1553.882221]  ? kthread_complete_and_exit+0x30/0x30
[ 1553.882225]  ret_from_fork+0x1f/0x30
[ 1553.882231]  </TASK>

There is no need to check error request validation in server.
This check allow invalid requests not to validate message.

Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17818
Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoksmbd: fix memory leak in smb2_handle_negotiate
Namjae Jeon [Thu, 28 Jul 2022 12:56:19 +0000 (21:56 +0900)]
ksmbd: fix memory leak in smb2_handle_negotiate

commit aa7253c2393f6dcd6a1468b0792f6da76edad917 upstream.

The allocated memory didn't free under an error
path in smb2_handle_negotiate().

Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17815
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agosoundwire: qcom: Check device status before reading devid
Srinivas Kandagatla [Wed, 6 Jul 2022 09:56:44 +0000 (10:56 +0100)]
soundwire: qcom: Check device status before reading devid

commit aa1262ca66957183ea1fb32a067e145b995f3744 upstream.

As per hardware datasheet its recommended that we check the device
status before reading devid assigned by auto-enumeration.

Without this patch we see SoundWire devices with invalid enumeration
addresses on the bus.

Cc: stable@vger.kernel.org
Fixes: a6e6581942ca ("soundwire: qcom: add auto enumeration support")
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20220706095644.5852-1-srinivas.kandagatla@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoscsi: qla2xxx: Zero undefined mailbox IN registers
Bikash Hazarika [Wed, 13 Jul 2022 05:20:38 +0000 (22:20 -0700)]
scsi: qla2xxx: Zero undefined mailbox IN registers

commit 6c96a3c7d49593ef15805f5e497601c87695abc9 upstream.

While requesting a new mailbox command, driver does not write any data to
unused registers.  Initialize the unused register value to zero while
requesting a new mailbox command to prevent stale entry access by firmware.

Link: https://lore.kernel.org/r/20220713052045.10683-4-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoscsi: qla2xxx: Fix incorrect display of max frame size
Bikash Hazarika [Wed, 13 Jul 2022 05:20:37 +0000 (22:20 -0700)]
scsi: qla2xxx: Fix incorrect display of max frame size

commit cf3b4fb655796674e605268bd4bfb47a47c8bce6 upstream.

Replace display field with the correct field.

Link: https://lore.kernel.org/r/20220713052045.10683-3-njavali@marvell.com
Fixes: 8777e4314d39 ("scsi: qla2xxx: Migrate NVME N2N handling into state machine")
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoscsi: sg: Allow waiting for commands to complete on removed device
Tony Battersby [Mon, 11 Jul 2022 14:51:32 +0000 (10:51 -0400)]
scsi: sg: Allow waiting for commands to complete on removed device

commit 3455607fd7be10b449f5135c00dc306b85dc0d21 upstream.

When a SCSI device is removed while in active use, currently sg will
immediately return -ENODEV on any attempt to wait for active commands that
were sent before the removal.  This is problematic for commands that use
SG_FLAG_DIRECT_IO since the data buffer may still be in use by the kernel
when userspace frees or reuses it after getting ENODEV, leading to
corrupted userspace memory (in the case of READ-type commands) or corrupted
data being sent to the device (in the case of WRITE-type commands).  This
has been seen in practice when logging out of a iscsi_tcp session, where
the iSCSI driver may still be processing commands after the device has been
marked for removal.

Change the policy to allow userspace to wait for active sg commands even
when the device is being removed.  Return -ENODEV only when there are no
more responses to read.

Link: https://lore.kernel.org/r/5ebea46f-fe83-2d0b-233d-d0dcb362dd0a@cybernetics.com
Cc: <stable@vger.kernel.org>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoiio: light: isl29028: Fix the warning in isl29028_remove()
Zheyu Ma [Sun, 17 Jul 2022 00:42:41 +0000 (08:42 +0800)]
iio: light: isl29028: Fix the warning in isl29028_remove()

commit 06674fc7c003b9d0aa1d37fef7ab2c24802cc6ad upstream.

The driver use the non-managed form of the register function in
isl29028_remove(). To keep the release order as mirroring the ordering
in probe, the driver should use non-managed form in probe, too.

The following log reveals it:

[   32.374955] isl29028 0-0010: remove
[   32.376861] general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN PTI
[   32.377676] KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
[   32.379432] RIP: 0010:kernfs_find_and_get_ns+0x28/0xe0
[   32.385461] Call Trace:
[   32.385807]  sysfs_unmerge_group+0x59/0x110
[   32.386110]  dpm_sysfs_remove+0x58/0xc0
[   32.386391]  device_del+0x296/0xe50
[   32.386959]  cdev_device_del+0x1d/0xd0
[   32.387231]  devm_iio_device_unreg+0x27/0xb0
[   32.387542]  devres_release_group+0x319/0x3d0
[   32.388162]  i2c_device_remove+0x93/0x1f0

Fixes: 2db5054ac28d ("staging: iio: isl29028: add runtime power management support")
Signed-off-by: Zheyu Ma <zheyuma97@gmail.com>
Link: https://lore.kernel.org/r/20220717004241.2281028-1-zheyuma97@gmail.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoiio: fix iio_format_avail_range() printing for none IIO_VAL_INT
Fawzi Khaber [Mon, 18 Jul 2022 13:07:06 +0000 (15:07 +0200)]
iio: fix iio_format_avail_range() printing for none IIO_VAL_INT

commit 5e1f91850365de55ca74945866c002fda8f00331 upstream.

iio_format_avail_range() should print range as follow [min, step, max], so
the function was previously calling iio_format_list() with length = 3,
length variable refers to the array size of values not the number of
elements. In case of non IIO_VAL_INT values each element has integer part
and decimal part. With length = 3 this would cause premature end of loop
and result in printing only one element.

Signed-off-by: Fawzi Khaber <fawzi.khaber@tdk.com>
Signed-off-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol@tdk.com>
Fixes: eda20ba1e25e ("iio: core: Consolidate iio_format_avail_{list,range}()")
Link: https://lore.kernel.org/r/20220718130706.32571-1-jmaneyrol@invensense.com
Cc: <Stable@vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoum: seed rng using host OS rng
Jason A. Donenfeld [Tue, 12 Jul 2022 23:12:21 +0000 (01:12 +0200)]
um: seed rng using host OS rng

commit 0b9ba6135d7f18b82f3d8bebb55ded725ba88e0e upstream.

UML generally does not provide access to special CPU instructions like
RDRAND, and execution tends to be rather deterministic, with no real
hardware interrupts, making good randomness really very hard, if not
all together impossible. Not only is this a security eyebrow raiser, but
it's also quite annoying when trying to do various pieces of UML-based
automation that takes a long time to boot, if ever.

Fix this by trivially calling getrandom() in the host and using that
seed as "bootloader randomness", which initializes the rng immediately
at UML boot.

The old behavior can be restored the same way as on any other arch, by
way of CONFIG_TRUST_BOOTLOADER_RANDOMNESS=n or
random.trust_bootloader=0. So seen from that perspective, this just
makes UML act like other archs, which is positive in its own right.

Additionally, wire up arch_get_random_{int,long}() in the same way, so
that reseeds can also make use of the host RNG, controllable by
CONFIG_TRUST_CPU_RANDOMNESS and random.trust_cpu, per usual.

Cc: stable@vger.kernel.org
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoum: Remove straying parenthesis
Benjamin Beichler [Tue, 31 May 2022 11:17:39 +0000 (11:17 +0000)]
um: Remove straying parenthesis

commit c6496e0a4a90d8149203c16323cff3fa46e422e7 upstream.

Commit e3a33af812c6 ("um: fix and optimize xor select template for CONFIG64 and timetravel mode")
caused a build regression when CONFIG_XOR_BLOCKS and CONFIG_UML_TIME_TRAVEL_SUPPORT
are selected.
Fix it by removing the straying parenthesis.

Cc: stable@vger.kernel.org
Fixes: e3a33af812c6 ("um: fix and optimize xor select template for CONFIG64 and timetravel mode")
Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de>
[rw: Added commit message]
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agomtd: rawnand: arasan: Update NAND bus clock instead of system clock
Amit Kumar Mahapatra [Tue, 28 Jun 2022 15:48:23 +0000 (21:18 +0530)]
mtd: rawnand: arasan: Update NAND bus clock instead of system clock

commit 7499bfeedb47efc1ee4dc793b92c610d46e6d6a6 upstream.

In current implementation the Arasan NAND driver is updating the
system clock(i.e., anand->clk) in accordance to the timing modes
(i.e., SDR or NVDDR). But as per the Arasan NAND controller spec the
flash clock or the NAND bus clock(i.e., nfc->bus_clk), need to be
updated instead. This patch keeps the system clock unchanged and updates
the NAND bus clock as per the timing modes.

Fixes: 197b88fecc50 ("mtd: rawnand: arasan: Add new Arasan NAND controller")
CC: stable@vger.kernel.org # 5.8+
Signed-off-by: Amit Kumar Mahapatra <amit.kumar-mahapatra@xilinx.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20220628154824.12222-2-amit.kumar-mahapatra@xilinx.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agomtd: rawnand: arasan: Fix clock rate in NV-DDR
Olga Kitaina [Tue, 28 Jun 2022 15:48:24 +0000 (21:18 +0530)]
mtd: rawnand: arasan: Fix clock rate in NV-DDR

commit e16eceea863b417fd328588b1be1a79de0bc937f upstream.

According to the Arasan NAND controller spec, the flash clock rate for SDR
must be <= 100 MHz, while for NV-DDR it must be the same as the rate of the
CLK line for the mode. The driver previously always set 100 MHz for NV-DDR,
which would result in incorrect behavior for NV-DDR modes 0-4.

The appropriate clock rate can be calculated from the NV-DDR timing
parameters as 1/tCK, or for rates measured in picoseconds,
10^12 / nand_nvddr_timings->tCK_min.

Fixes: 197b88fecc50 ("mtd: rawnand: arasan: Add new Arasan NAND controller")
CC: stable@vger.kernel.org # 5.8+
Signed-off-by: Olga Kitaina <okitain@gmail.com>
Signed-off-by: Amit Kumar Mahapatra <amit.kumar-mahapatra@xilinx.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20220628154824.12222-3-amit.kumar-mahapatra@xilinx.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agobtrfs: reject log replay if there is unsupported RO compat flag
Qu Wenruo [Tue, 7 Jun 2022 11:48:24 +0000 (19:48 +0800)]
btrfs: reject log replay if there is unsupported RO compat flag

commit dc4d31684974d140250f3ee612c3f0cab13b3146 upstream.

[BUG]
If we have a btrfs image with dirty log, along with an unsupported RO
compatible flag:

log_root 30474240
...
compat_flags 0x0
compat_ro_flags 0x40000003
( FREE_SPACE_TREE |
  FREE_SPACE_TREE_VALID |
  unknown flag: 0x40000000 )

Then even if we can only mount it RO, we will still cause metadata
update for log replay:

  BTRFS info (device dm-1): flagging fs with big metadata feature
  BTRFS info (device dm-1): using free space tree
  BTRFS info (device dm-1): has skinny extents
  BTRFS info (device dm-1): start tree-log replay

This is definitely against RO compact flag requirement.

[CAUSE]
RO compact flag only forces us to do RO mount, but we will still do log
replay for plain RO mount.

Thus this will result us to do log replay and update metadata.

This can be very problematic for new RO compat flag, for example older
kernel can not understand v2 cache, and if we allow metadata update on
RO mount and invalidate/corrupt v2 cache.

[FIX]
Just reject the mount unless rescue=nologreplay is provided:

  BTRFS error (device dm-1): cannot replay dirty log with unsupport optional features (0x40000000), try rescue=nologreplay instead

We don't want to set rescue=nologreply directly, as this would make the
end user to read the old data, and cause confusion.

Since the such case is really rare, we're mostly fine to just reject the
mount with an error message, which also includes the proper workaround.

CC: stable@vger.kernel.org #4.9+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agobpf: Fix KASAN use-after-free Read in compute_effective_progs
Tadeusz Struk [Tue, 17 May 2022 18:04:20 +0000 (11:04 -0700)]
bpf: Fix KASAN use-after-free Read in compute_effective_progs

commit 4c46091ee985ae84c60c5e95055d779fcd291d87 upstream.

Syzbot found a Use After Free bug in compute_effective_progs().
The reproducer creates a number of BPF links, and causes a fault
injected alloc to fail, while calling bpf_link_detach on them.
Link detach triggers the link to be freed by bpf_link_free(),
which calls __cgroup_bpf_detach() and update_effective_progs().
If the memory allocation in this function fails, the function restores
the pointer to the bpf_cgroup_link on the cgroup list, but the memory
gets freed just after it returns. After this, every subsequent call to
update_effective_progs() causes this already deallocated pointer to be
dereferenced in prog_list_length(), and triggers KASAN UAF error.

To fix this issue don't preserve the pointer to the prog or link in the
list, but remove it and replace it with a dummy prog without shrinking
the table. The subsequent call to __cgroup_bpf_detach() or
__cgroup_bpf_detach() will correct it.

Fixes: af6eea57437a ("bpf: Implement bpf_link-based cgroup BPF program attachment")
Reported-by: <syzbot+f264bffdfbd5614f3bb2@syzkaller.appspotmail.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://syzkaller.appspot.com/bug?id=8ebf179a95c2a2670f7cf1ba62429ec044369db4
Link: https://lore.kernel.org/bpf/20220517180420.87954-1-tadeusz.struk@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agodrm/amdgpu: fix check in fbdev init
Alex Deucher [Tue, 19 Jul 2022 18:56:59 +0000 (14:56 -0400)]
drm/amdgpu: fix check in fbdev init

The new vkms virtual display code is atomic so there is
no need to call drm_helper_disable_unused_functions()
when it is enabled.  Doing so can result in a segfault.
When the driver switched from the old virtual display code
to the new atomic virtual display code, it was missed that
we enable virtual display unconditionally under SR-IOV
so the checks here missed that case.  Add the missing
check for SR-IOV.

There is no equivalent of this patch for Linus' tree
because the relevant code no longer exists.  This patch
is only relevant to kernels 5.15 and 5.16.

Fixes: 84ec374bd580 ("drm/amdgpu: create amdgpu_vkms (v4)")
Cc: stable@vger.kernel.org # 5.15.x
Cc: hgoffin@amazon.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agodrm/amdgpu: Check BO's requested pinning domains against its preferred_domains
Leo Li [Tue, 12 Jul 2022 16:30:29 +0000 (12:30 -0400)]
drm/amdgpu: Check BO's requested pinning domains against its preferred_domains

commit f5ba14043621f4afdf3ad5f92ee2d8dbebbe4340 upstream.

When pinning a buffer, we should check to see if there are any
additional restrictions imposed by bo->preferred_domains. This will
prevent the BO from being moved to an invalid domain when pinning.

For example, this can happen if the user requests to create a BO in GTT
domain for display scanout. amdgpu_dm will allow pinning to either VRAM
or GTT domains, since DCN can scanout from either or. However, in
amdgpu_bo_pin_restricted(), pinning to VRAM is preferred if there is
adequate carveout. This can lead to pinning to VRAM despite the user
requesting GTT placement for the BO.

v2: Allow the kernel to override the domain, which can happen when
    exporting a BO to a V4L camera (for example).

Signed-off-by: Leo Li <sunpeng.li@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agodrm/nouveau/kms: Fix failure path for creating DP connectors
Lyude Paul [Thu, 26 May 2022 20:43:13 +0000 (16:43 -0400)]
drm/nouveau/kms: Fix failure path for creating DP connectors

commit ca0367ca5d9216644b41f86348d6661f8d9e32d8 upstream.

It looks like that when we moved nouveau over to using drm_dp_aux_init()
and registering it's aux bus during late connector registration, we totally
forgot to fix the failure codepath in nouveau_connector_create() - as it
still seems to assume that drm_dp_aux_init() can fail (it can't).

So, let's fix that and also add a missing check to ensure that we've
properly allocated nv_connector->aux.name while we're at it.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: David Airlie <airlied@linux.ie>
Fixes: fd43ad9d47e7 ("drm/nouveau/kms/nv50-: Move AUX adapter reg to connector late register/early unregister")
Cc: <stable@vger.kernel.org> # v5.14+
Link: https://patchwork.freedesktop.org/patch/msgid/20220526204313.656473-1-lyude@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agodrm/nouveau/acpi: Don't print error when we get -EINPROGRESS from pm_runtime
Lyude Paul [Thu, 14 Jul 2022 17:42:33 +0000 (13:42 -0400)]
drm/nouveau/acpi: Don't print error when we get -EINPROGRESS from pm_runtime

commit 53c26181950ddc3c8ace3c0939c89e9c4d8deeb9 upstream.

Since this isn't actually a failure.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: David Airlie <airlied@linux.ie>
Fixes: 79e765ad665d ("drm/nouveau/drm/nouveau: Prevent handling ACPI HPD events too early")
Cc: <stable@vger.kernel.org> # v4.19+
Link: https://patchwork.freedesktop.org/patch/msgid/20220714174234.949259-2-lyude@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agodrm/nouveau: Don't pm_runtime_put_sync(), only pm_runtime_put_autosuspend()
Lyude Paul [Thu, 14 Jul 2022 17:42:34 +0000 (13:42 -0400)]
drm/nouveau: Don't pm_runtime_put_sync(), only pm_runtime_put_autosuspend()

commit c96cfaf8fc02d4bb70727dfa7ce7841a3cff9be2 upstream.

While trying to fix another issue, it occurred to me that I don't actually
think there is any situation where we want pm_runtime_put() in nouveau to
be synchronous. In fact, this kind of just seems like it would cause
issues where we may unexpectedly block a thread we don't expect to be
blocked.

So, let's only use pm_runtime_put_autosuspend().

Changes since v1:
* Use pm_runtime_put_autosuspend(), not pm_runtime_put()

Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: David Airlie <airlied@linux.ie>
Fixes: 3a6536c51d5d ("drm/nouveau: Intercept ACPI_VIDEO_NOTIFY_PROBE")
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: <stable@vger.kernel.org> # v4.10+
Link: https://patchwork.freedesktop.org/patch/msgid/20220714174234.949259-3-lyude@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agodrm/nouveau: fix another off-by-one in nvbios_addr
Timur Tabi [Wed, 11 May 2022 16:37:16 +0000 (11:37 -0500)]
drm/nouveau: fix another off-by-one in nvbios_addr

commit c441d28945fb113220d48d6c86ebc0b090a2b677 upstream.

This check determines whether a given address is part of
image 0 or image 1.  Image 1 starts at offset image0_size,
so that address should be included.

Fixes: 4d4e9907ff572 ("drm/nouveau/bios: guard against out-of-bounds accesses to image")
Cc: <stable@vger.kernel.org> # v4.8+
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220511163716.3520591-1-ttabi@nvidia.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agodrm/hyperv-drm: Include framebuffer and EDID headers
Thomas Zimmermann [Wed, 22 Jun 2022 08:34:13 +0000 (10:34 +0200)]
drm/hyperv-drm: Include framebuffer and EDID headers

commit 009a3a52791f31c57d755a73f6bc66fbdd8bd76c upstream.

Fix a number of compile errors by including the correct header
files. Examples are shown below.

  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c: In function 'hyperv_blit_to_vram_rect':
  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c:25:48: error: invalid use of undefined type 'struct drm_framebuffer'
   25 |         struct hyperv_drm_device *hv = to_hv(fb->dev);
      |                                                ^~

  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c: In function 'hyperv_connector_get_modes':
  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c:59:17: error: implicit declaration of function 'drm_add_modes_noedid' [-Werror=implicit-function-declaration]
   59 |         count = drm_add_modes_noedid(connector,
      |                 ^~~~~~~~~~~~~~~~~~~~

  ../drivers/gpu/drm/hyperv/hyperv_drm_modeset.c:62:9: error: implicit declaration of function 'drm_set_preferred_mode'; did you mean 'drm_mm_reserve_node'? [-Werror=implicit-function-declaration]
   62 |         drm_set_preferred_mode(connector, hv->preferred_width,
      |         ^~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 76c56a5affeb ("drm/hyperv: Add DRM driver for hyperv synthetic video device")
Fixes: 720cf96d8fec ("drm: Drop drm_framebuffer.h from drm_crtc.h")
Fixes: 255490f9150d ("drm: Drop drm_edid.h from drm_crtc.h")
Cc: Deepak Rawat <drawat.floss@gmail.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: linux-hyperv@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.14+
Acked-by: Maxime Ripard <maxime@cerno.tech>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220622083413.12573-1-tzimmermann@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agodrm/vc4: hdmi: Disable audio if dmas property is present but empty
Phil Elwell [Mon, 13 Jun 2022 14:47:44 +0000 (16:47 +0200)]
drm/vc4: hdmi: Disable audio if dmas property is present but empty

commit db2b927f8668adf3ac765e0921cd2720f5c04172 upstream.

The dmas property is used to hold the dmaengine channel used for audio
output.

Older device trees were missing that property, so if it's not there we
disable the audio output entirely.

However, some overlays have set an empty value to that property, mostly
to workaround the fact that overlays cannot remove a property. Let's add
a test for that case and if it's empty, let's disable it as well.

Cc: <stable@vger.kernel.org>
Signed-off-by: Phil Elwell <phil@raspberrypi.org>
Link: https://lore.kernel.org/r/20220613144800.326124-18-maxime@cerno.tech
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agodrm/shmem-helper: Add missing vunmap on error
Dmitry Osipenko [Thu, 30 Jun 2022 20:00:57 +0000 (23:00 +0300)]
drm/shmem-helper: Add missing vunmap on error

commit df4aaf015775221dde8a51ee09edb919981f091e upstream.

The vmapping of dma-buf may succeed, but DRM SHMEM rejects the IOMEM
mapping, and thus, drm_gem_shmem_vmap_locked() should unvmap the IOMEM
before erroring out.

Cc: stable@vger.kernel.org
Fixes: 49a3f51dfeee ("drm/gem: Use struct dma_buf_map in GEM vmap ops and convert GEM backends")
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220630200058.1883506-2-dmitry.osipenko@collabora.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agodrm/gem: Properly annotate WW context on drm_gem_lock_reservations() error
Dmitry Osipenko [Thu, 30 Jun 2022 20:04:04 +0000 (23:04 +0300)]
drm/gem: Properly annotate WW context on drm_gem_lock_reservations() error

commit 2939deac1fa220bc82b89235f146df1d9b52e876 upstream.

Use ww_acquire_fini() in the error code paths. Otherwise lockdep
thinks that lock is held when lock's memory is freed after the
drm_gem_lock_reservations() error. The ww_acquire_context needs to be
annotated as "released", which fixes the noisy "WARNING: held lock freed!"
splat of VirtIO-GPU driver with CONFIG_DEBUG_MUTEXES=y and enabled lockdep.

Cc: stable@vger.kernel.org
Fixes: 7edc3e3b975b5 ("drm: Add helpers for locking an array of BO reservations.")
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20220630200405.1883897-2-dmitry.osipenko@collabora.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agortc: rx8025: fix 12/24 hour mode detection on RX-8035
Mathew McBride [Wed, 6 Jul 2022 07:42:36 +0000 (07:42 +0000)]
rtc: rx8025: fix 12/24 hour mode detection on RX-8035

commit 71af91565052214ad86f288e0d8ffb165f790995 upstream.

The 12/24hr flag in the RX-8035 can be found in the hour register,
instead of the CTRL1 on the RX-8025. This was overlooked when
support for the RX-8035 was added, and was causing read errors when
the hour register 'overflowed'.

To deal with the relevant register not always being visible in
the relevant functions, determine the 12/24 mode at startup and
store it in the driver state.

Signed-off-by: Mathew McBride <matt@traverse.com.au>
Fixes: f120e2e33ac8 ("rtc: rx8025: implement RX-8035 support")
Cc: stable@vger.kernel.org
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://lore.kernel.org/r/20220706074236.24011-1-matt@traverse.com.au
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoRISC-V: Add modules to virtual kernel memory layout dump
Xianting Tian [Thu, 11 Aug 2022 07:41:48 +0000 (15:41 +0800)]
RISC-V: Add modules to virtual kernel memory layout dump

commit f9293ad46d8ba9909187a37b7215324420ad4596 upstream.

Modules always live before the kernel, MODULES_END is fixed but
MODULES_VADDR isn't fixed, it depends on the kernel size.
Let's add it to virtual kernel memory layout dump.

As MODULES is only defined for CONFIG_64BIT, so we dump it when
CONFIG_64BIT=y.

eg,
MODULES_VADDR - MODULES_END
0xffffffff01133000 - 0xffffffff80000000

Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220811074150.3020189-5-xianting.tian@linux.alibaba.com
Cc: stable@vger.kernel.org
Fixes: 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoRISC-V: Fixup schedule out issue in machine_crash_shutdown()
Xianting Tian [Thu, 11 Aug 2022 07:41:47 +0000 (15:41 +0800)]
RISC-V: Fixup schedule out issue in machine_crash_shutdown()

commit ad943893d5f1d0aeea892bf7b781cf8062b36d58 upstream.

Current task of executing crash kexec will be schedule out when panic is
triggered by RCU Stall, as it needs to wait rcu completion. It lead to
inability to enter the crash system.

The implementation of machine_crash_shutdown() is non-standard for RISC-V
according to other Arch's implementation(eg, x86, arm64), we need to send
IPI to stop secondary harts.

[224521.877268] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[224521.883471] rcu:  0-...0: (3 GPs behind) idle=cfa/0/0x1 softirq=3968793/3968793 fqs=2495
[224521.891742]  (detected by 2, t=5255 jiffies, g=60855593, q=328)
[224521.897754] Task dump for CPU 0:
[224521.901074] task:swapper/0     state:R  running task   stack:  0 pid:  0 ppid:   0 flags:0x00000008
[224521.911090] Call Trace:
[224521.913638] [<ffffffe000c432de>] __schedule+0x208/0x5ea
[224521.918957] Kernel panic - not syncing: RCU Stall
[224521.923773] bad: scheduling from the idle thread!
[224521.928571] CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Tainted: G   O  5.10.113-yocto-standard #1
[224521.938658] Call Trace:
[224521.941200] [<ffffffe00020395c>] walk_stackframe+0x0/0xaa
[224521.946689] [<ffffffe000c34f8e>] show_stack+0x32/0x3e
[224521.951830] [<ffffffe000c39020>] dump_stack_lvl+0x7e/0xa2
[224521.957317] [<ffffffe000c39058>] dump_stack+0x14/0x1c
[224521.962459] [<ffffffe000243884>] dequeue_task_idle+0x2c/0x40
[224521.968207] [<ffffffe000c434f4>] __schedule+0x41e/0x5ea
[224521.973520] [<ffffffe000c43826>] schedule+0x34/0xe4
[224521.978487] [<ffffffe000c46cae>] schedule_timeout+0xc6/0x170
[224521.984234] [<ffffffe000c4491e>] wait_for_completion+0x98/0xf2
[224521.990157] [<ffffffe00026d9e2>] __wait_rcu_gp+0x148/0x14a
[224521.995733] [<ffffffe0002761c4>] synchronize_rcu+0x5c/0x66
[224522.001307] [<ffffffe00026f1a6>] rcu_sync_enter+0x54/0xe6
[224522.006795] [<ffffffe00025a436>] percpu_down_write+0x32/0x11c
[224522.012629] [<ffffffe000c4266a>] _cpu_down+0x92/0x21a
[224522.017771] [<ffffffe000219a0a>] smp_shutdown_nonboot_cpus+0x90/0x118
[224522.024299] [<ffffffe00020701e>] machine_crash_shutdown+0x30/0x4a
[224522.030483] [<ffffffe00029a3f8>] __crash_kexec+0x62/0xa6
[224522.035884] [<ffffffe000c3515e>] panic+0xfa/0x2b6
[224522.040678] [<ffffffe0002772be>] rcu_sched_clock_irq+0xc26/0xcb8
[224522.046774] [<ffffffe00027fc7a>] update_process_times+0x62/0x8a
[224522.052785] [<ffffffe00028d522>] tick_sched_timer+0x9e/0x102
[224522.058533] [<ffffffe000280c3a>] __hrtimer_run_queues+0x16a/0x318
[224522.064716] [<ffffffe0002812ec>] hrtimer_interrupt+0xd4/0x228
[224522.070551] [<ffffffe0009a69b6>] riscv_timer_interrupt+0x3c/0x48
[224522.076646] [<ffffffe000268f8c>] handle_percpu_devid_irq+0xb0/0x24c
[224522.083004] [<ffffffe00026428e>] __handle_domain_irq+0xa8/0x122
[224522.089014] [<ffffffe00062f954>] riscv_intc_irq+0x38/0x60
[224522.094501] [<ffffffe000201bd4>] ret_from_exception+0x0/0xc
[224522.100161] [<ffffffe000c42146>] rcu_eqs_enter.constprop.0+0x8c/0xb8

With the patch, it can enter crash system when RCU Stall occur.

Fixes: e53d28180d4d ("RISC-V: Add kdump support")
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220811074150.3020189-4-xianting.tian@linux.alibaba.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoRISC-V: Fixup get incorrect user mode PC for kernel mode regs
Xianting Tian [Thu, 11 Aug 2022 07:41:46 +0000 (15:41 +0800)]
RISC-V: Fixup get incorrect user mode PC for kernel mode regs

commit 59c026c359c30f116fef6ee958e24d04983efbb0 upstream.

When use 'echo c > /proc/sysrq-trigger' to trigger kdump, riscv_crash_save_regs()
will be called to save regs for vmcore, we found "epc" value 00ffffffa5537400
is not a valid kernel virtual address, but is a user virtual address. Other
regs(eg, ra, sp, gp...) are correct kernel virtual address.
Actually 0x00ffffffb0dd9400 is the user mode PC of 'PID: 113 Comm: sh', which
is saved in the task's stack.

[   21.201701] CPU: 0 PID: 113 Comm: sh Kdump: loaded Not tainted 5.18.9 #45
[   21.201979] Hardware name: riscv-virtio,qemu (DT)
[   21.202160] epc : 00ffffffa5537400 ra : ffffffff80088640 sp : ff20000010333b90
[   21.202435]  gp : ffffffff810dde38 tp : ff6000000226c200 t0 : ffffffff8032be7c
[   21.202707]  t1 : 0720072007200720 t2 : 30203a7375746174 s0 : ff20000010333cf0
[   21.202973]  s1 : 0000000000000000 a0 : ff20000010333b98 a1 : 0000000000000001
[   21.203243]  a2 : 0000000000000010 a3 : 0000000000000000 a4 : 28c8f0aeffea4e00
[   21.203519]  a5 : 28c8f0aeffea4e00 a6 : 0000000000000009 a7 : ffffffff8035c9b8
[   21.203794]  s2 : ffffffff810df0a8 s3 : ffffffff810df718 s4 : ff20000010333b98
[   21.204062]  s5 : 0000000000000000 s6 : 0000000000000007 s7 : ffffffff80c4a468
[   21.204331]  s8 : 00ffffffef451410 s9 : 0000000000000007 s10: 00aaaaaac0510700
[   21.204606]  s11: 0000000000000001 t3 : ff60000001218f00 t4 : ff60000001218f00
[   21.204876]  t5 : ff60000001218000 t6 : ff200000103338b8
[   21.205079] status: 0000000200000020 badaddr: 0000000000000000 cause: 0000000000000008

With the incorrect PC, the backtrace showed by crash tool as below, the first
stack frame is abnormal,

crash> bt
PID: 113      TASK: ff60000002269600  CPU: 0    COMMAND: "sh"
 #0 [ff2000001039bb90] __efistub_.Ldebug_info0 at 00ffffffa5537400 <-- Abnormal
 #1 [ff2000001039bcf0] panic at ffffffff806578ba
 #2 [ff2000001039bd50] sysrq_reset_seq_param_set at ffffffff8038c030
 #3 [ff2000001039bda0] __handle_sysrq at ffffffff8038c5f8
 #4 [ff2000001039be00] write_sysrq_trigger at ffffffff8038cad8
 #5 [ff2000001039be20] proc_reg_write at ffffffff801b7edc
 #6 [ff2000001039be40] vfs_write at ffffffff80152ba6
 #7 [ff2000001039be80] ksys_write at ffffffff80152ece
 #8 [ff2000001039bed0] sys_write at ffffffff80152f46

With the patch, we can get current kernel mode PC, the output as below,

[   17.607658] CPU: 0 PID: 113 Comm: sh Kdump: loaded Not tainted 5.18.9 #42
[   17.607937] Hardware name: riscv-virtio,qemu (DT)
[   17.608150] epc : ffffffff800078f8 ra : ffffffff8008862c sp : ff20000010333b90
[   17.608441]  gp : ffffffff810dde38 tp : ff6000000226c200 t0 : ffffffff8032be68
[   17.608741]  t1 : 0720072007200720 t2 : 666666666666663c s0 : ff20000010333cf0
[   17.609025]  s1 : 0000000000000000 a0 : ff20000010333b98 a1 : 0000000000000001
[   17.609320]  a2 : 0000000000000010 a3 : 0000000000000000 a4 : 0000000000000000
[   17.609601]  a5 : ff60000001c78000 a6 : 000000000000003c a7 : ffffffff8035c9a4
[   17.609894]  s2 : ffffffff810df0a8 s3 : ffffffff810df718 s4 : ff20000010333b98
[   17.610186]  s5 : 0000000000000000 s6 : 0000000000000007 s7 : ffffffff80c4a468
[   17.610469]  s8 : 00ffffffca281410 s9 : 0000000000000007 s10: 00aaaaaab5bb6700
[   17.610755]  s11: 0000000000000001 t3 : ff60000001218f00 t4 : ff60000001218f00
[   17.611041]  t5 : ff60000001218000 t6 : ff20000010333988
[   17.611255] status: 0000000200000020 badaddr: 0000000000000000 cause: 0000000000000008

With the correct PC, the backtrace showed by crash tool as below,

crash> bt
PID: 113      TASK: ff6000000226c200  CPU: 0    COMMAND: "sh"
 #0 [ff20000010333b90] riscv_crash_save_regs at ffffffff800078f8 <--- Normal
 #1 [ff20000010333cf0] panic at ffffffff806578c6
 #2 [ff20000010333d50] sysrq_reset_seq_param_set at ffffffff8038c03c
 #3 [ff20000010333da0] __handle_sysrq at ffffffff8038c604
 #4 [ff20000010333e00] write_sysrq_trigger at ffffffff8038cae4
 #5 [ff20000010333e20] proc_reg_write at ffffffff801b7ee8
 #6 [ff20000010333e40] vfs_write at ffffffff80152bb2
 #7 [ff20000010333e80] ksys_write at ffffffff80152eda
 #8 [ff20000010333ed0] sys_write at ffffffff80152f52

Fixes: e53d28180d4d ("RISC-V: Add kdump support")
Co-developed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220811074150.3020189-3-xianting.tian@linux.alibaba.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoRISC-V: kexec: Fixup use of smp_processor_id() in preemptible context
Xianting Tian [Thu, 11 Aug 2022 07:41:45 +0000 (15:41 +0800)]
RISC-V: kexec: Fixup use of smp_processor_id() in preemptible context

commit 357628e68f5c08ad578a718dc62a0031e06dbe91 upstream.

Use __smp_processor_id() to avoid check the preemption context when
CONFIG_DEBUG_PREEMPT enabled, as we will enter crash kernel and no
return.

Without the patch,
[  103.781044] sysrq: Trigger a crash
[  103.784625] Kernel panic - not syncing: sysrq triggered crash
[  103.837634] CPU1: off
[  103.889668] CPU2: off
[  103.933479] CPU3: off
[  103.939424] Starting crashdump kernel...
[  103.943442] BUG: using smp_processor_id() in preemptible [00000000] code: sh/346
[  103.950884] caller is debug_smp_processor_id+0x1c/0x26
[  103.956051] CPU: 0 PID: 346 Comm: sh Kdump: loaded Not tainted 5.10.113-00002-gce03f03bf4ec-dirty #149
[  103.965355] Call Trace:
[  103.967805] [<ffffffe00020372a>] walk_stackframe+0x0/0xa2
[  103.973206] [<ffffffe000bcf1f4>] show_stack+0x32/0x3e
[  103.978258] [<ffffffe000bd382a>] dump_stack_lvl+0x72/0x8e
[  103.983655] [<ffffffe000bd385a>] dump_stack+0x14/0x1c
[  103.988705] [<ffffffe000bdc8fe>] check_preemption_disabled+0x9e/0xaa
[  103.995057] [<ffffffe000bdc926>] debug_smp_processor_id+0x1c/0x26
[  104.001150] [<ffffffe000206c64>] machine_kexec+0x22/0xd0
[  104.006463] [<ffffffe000291a7e>] __crash_kexec+0x6a/0xa4
[  104.011774] [<ffffffe000bcf3fa>] panic+0xfc/0x2b0
[  104.016480] [<ffffffe000656ca4>] sysrq_reset_seq_param_set+0x0/0x70
[  104.022745] [<ffffffe000657310>] __handle_sysrq+0x8c/0x154
[  104.028229] [<ffffffe0006577e8>] write_sysrq_trigger+0x5a/0x6a
[  104.034061] [<ffffffe0003d90e0>] proc_reg_write+0x58/0xd4
[  104.039459] [<ffffffe00036cff4>] vfs_write+0x7e/0x254
[  104.044509] [<ffffffe00036d2f6>] ksys_write+0x58/0xbe
[  104.049558] [<ffffffe00036d36a>] sys_write+0xe/0x16
[  104.054434] [<ffffffe000201b9a>] ret_from_syscall+0x0/0x2
[  104.067863] Will call new kernel at ecc00000 from hart id 0
[  104.074939] FDT image at fc5ee000
[  104.079523] Bye...

With the patch we can got clear output,
[   67.740553] sysrq: Trigger a crash
[   67.744166] Kernel panic - not syncing: sysrq triggered crash
[   67.809123] CPU1: off
[   67.865210] CPU2: off
[   67.909075] CPU3: off
[   67.919123] Starting crashdump kernel...
[   67.924900] Will call new kernel at ecc00000 from hart id 0
[   67.932045] FDT image at fc5ee000
[   67.935560] Bye...

Fixes: 0e105f1d0037 ("riscv: use hart id instead of cpu id on machine_kexec")
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220811074150.3020189-2-xianting.tian@linux.alibaba.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agodt-bindings: riscv: fix SiFive l2-cache's cache-sets
Conor Dooley [Wed, 3 Aug 2022 18:54:00 +0000 (19:54 +0100)]
dt-bindings: riscv: fix SiFive l2-cache's cache-sets

commit b60cf8e59e61133b6c9514ff8d8c8d7049d040ef upstream.

Fix device tree schema validation error messages for the SiFive
Unmatched: ' cache-sets:0:0: 1024 was expected'.

The existing bindings allow for just 1024 cache-sets but the fu740 on
Unmatched the has 2048 cache-sets. The ISA itself permits any arbitrary
power of two, however this is not supported by dt-schema. The RTL for
the IP, to which the number of cache-sets is a tunable parameter, has
been released publicly so speculatively adding a small number of
"reasonable" values seems unwise also.

Instead, as the binding only supports two distinct controllers: add 2048
and explicitly lock it to the fu740's l2 cache while limiting 1024 to
the l2 cache on the fu540.

Fixes: af951c3a113b ("dt-bindings: riscv: Update l2 cache DT documentation to add support for SiFive FU740")
Reported-by: Atul Khare <atulkhare@rivosinc.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220803185359.942928-1-mail@conchuod.ie
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoriscv:uprobe fix SR_SPIE set/clear handling
Yipeng Zou [Thu, 21 Jul 2022 06:58:20 +0000 (14:58 +0800)]
riscv:uprobe fix SR_SPIE set/clear handling

commit 3dbe5829408bc1586f75b4667ef60e5aab0209c7 upstream.

In riscv the process of uprobe going to clear spie before exec
the origin insn,and set spie after that.But When access the page
which origin insn has been placed a page fault may happen and
irq was disabled in arch_uprobe_pre_xol function,It cause a WARN
as follows.
There is no need to clear/set spie in arch_uprobe_pre/post/abort_xol.
We can just remove it.

[   31.684157] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1488
[   31.684677] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 76, name: work
[   31.684929] preempt_count: 0, expected: 0
[   31.685969] CPU: 2 PID: 76 Comm: work Tainted: G
[   31.686542] Hardware name: riscv-virtio,qemu (DT)
[   31.686797] Call Trace:
[   31.687053] [<ffffffff80006442>] dump_backtrace+0x30/0x38
[   31.687699] [<ffffffff80812118>] show_stack+0x40/0x4c
[   31.688141] [<ffffffff8081817a>] dump_stack_lvl+0x44/0x5c
[   31.688396] [<ffffffff808181aa>] dump_stack+0x18/0x20
[   31.688653] [<ffffffff8003e454>] __might_resched+0x114/0x122
[   31.688948] [<ffffffff8003e4b2>] __might_sleep+0x50/0x7a
[   31.689435] [<ffffffff80822676>] down_read+0x30/0x130
[   31.689728] [<ffffffff8000b650>] do_page_fault+0x166/x446
[   31.689997] [<ffffffff80003c0c>] ret_from_exception+0x0/0xc

Fixes: 74784081aac8 ("riscv: Add uprobes supported")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220721065820.245755-1-zouyipeng@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoparisc: io_pgetevents_time64() needs compat syscall in 32-bit compat mode
Helge Deller [Mon, 1 Aug 2022 15:36:15 +0000 (17:36 +0200)]
parisc: io_pgetevents_time64() needs compat syscall in 32-bit compat mode

commit 6431e92fc827bdd2d28f79150d90415ba9ce0d21 upstream.

For all syscalls in 32-bit compat mode on 64-bit kernels the upper
32-bits of the 64-bit registers are zeroed out, so a negative 32-bit
signed value will show up as positive 64-bit signed value.

This behaviour breaks the io_pgetevents_time64() syscall which expects
signed 64-bit values for the "min_nr" and "nr" parameters.
Fix this by switching to the compat_sys_io_pgetevents_time64() syscall,
which uses "compat_long_t" types for those parameters.

Cc: <stable@vger.kernel.org> # v5.1+
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoparisc: Check the return value of ioremap() in lba_driver_probe()
William Dean [Fri, 22 Jul 2022 02:57:09 +0000 (10:57 +0800)]
parisc: Check the return value of ioremap() in lba_driver_probe()

commit cf59f34d7f978d14d6520fd80a78a5ad5cb8abf8 upstream.

The function ioremap() in lba_driver_probe() can fail, so
its return value should be checked.

Fixes: 4bdc0d676a643 ("remove ioremap_nocache and devm_ioremap_nocache")
Reported-by: Hacash Robot <hacashRobot@santino.com>
Signed-off-by: William Dean <williamsukatube@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v5.6+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoparisc: Drop pa_swapper_pg_lock spinlock
Helge Deller [Tue, 19 Jul 2022 04:19:41 +0000 (06:19 +0200)]
parisc: Drop pa_swapper_pg_lock spinlock

commit 3fbc9a7de0564c55d8a9584c9cd2c9dfe6bd6d43 upstream.

This spinlock was dropped with commit b7795074a046 ("parisc: Optimize
per-pagetable spinlocks") in kernel v5.12.

Remove it to silence a sparse warning.

Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: kernel test robot <lkp@intel.com>
Cc: <stable@vger.kernel.org> # v5.12+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoparisc: Fix device names in /proc/iomem
Helge Deller [Mon, 18 Jul 2022 15:06:47 +0000 (17:06 +0200)]
parisc: Fix device names in /proc/iomem

commit cab56b51ec0e69128909cef4650e1907248d821b upstream.

Fix the output of /proc/iomem to show the real hardware device name
including the pa_pathname, e.g. "Merlin 160 Core Centronics [8:16:0]".
Up to now only the pa_pathname ("[8:16.0]") was shown.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v4.9+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agoovl: drop WARN_ON() dentry is NULL in ovl_encode_fh()
Jiachen Zhang [Thu, 28 Jul 2022 11:49:15 +0000 (19:49 +0800)]
ovl: drop WARN_ON() dentry is NULL in ovl_encode_fh()

commit dd524b7f317de8d31d638cbfdc7be4cf9b770e42 upstream.

Some code paths cannot guarantee the inode have any dentry alias. So
WARN_ON() all !dentry may flood the kernel logs.

For example, when an overlayfs inode is watched by inotifywait (1), and
someone is trying to read the /proc/$(pidof inotifywait)/fdinfo/INOTIFY_FD,
at that time if the dentry has been reclaimed by kernel (such as
echo 2 > /proc/sys/vm/drop_caches), there will be a WARN_ON(). The
printed call stack would be like:

    ? show_mark_fhandle+0xf0/0xf0
    show_mark_fhandle+0x4a/0xf0
    ? show_mark_fhandle+0xf0/0xf0
    ? seq_vprintf+0x30/0x50
    ? seq_printf+0x53/0x70
    ? show_mark_fhandle+0xf0/0xf0
    inotify_fdinfo+0x70/0x90
    show_fdinfo.isra.4+0x53/0x70
    seq_show+0x130/0x170
    seq_read+0x153/0x440
    vfs_read+0x94/0x150
    ksys_read+0x5f/0xe0
    do_syscall_64+0x59/0x1e0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

So let's drop WARN_ON() to avoid kernel log flooding.

Reported-by: Hongbo Yin <yinhongbo@bytedance.com>
Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Signed-off-by: Tianci Zhang <zhangtianci.1997@bytedance.com>
Fixes: 8ed5eec9d6c4 ("ovl: encode pure upper file handles")
Cc: <stable@vger.kernel.org> # v4.16
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agocrypto: ccp - Use kzalloc for sev ioctl interfaces to prevent kernel memory leak
John Allen [Wed, 18 May 2022 15:31:26 +0000 (15:31 +0000)]
crypto: ccp - Use kzalloc for sev ioctl interfaces to prevent kernel memory leak

commit 13dc15a3f5fd7f884e4bfa8c011a0ae868df12ae upstream.

For some sev ioctl interfaces, input may be passed that is less than or
equal to SEV_FW_BLOB_MAX_SIZE, but larger than the data that PSP
firmware returns. In this case, kmalloc will allocate memory that is the
size of the input rather than the size of the data. Since PSP firmware
doesn't fully overwrite the buffer, the sev ioctl interfaces with the
issue may return uninitialized slab memory.

Currently, all of the ioctl interfaces in the ccp driver are safe, but
to prevent future problems, change all ioctl interfaces that allocate
memory with kmalloc to use kzalloc and memset the data buffer to zero
in sev_ioctl_do_platform_status.

Fixes: 38103671aad3 ("crypto: ccp: Use the stack and common buffer for status commands")
Fixes: e799035609e15 ("crypto: ccp: Implement SEV_PEK_CSR ioctl command")
Fixes: 76a2b524a4b1d ("crypto: ccp: Implement SEV_PDH_CERT_EXPORT ioctl command")
Fixes: d6112ea0cb344 ("crypto: ccp - introduce SEV_GET_ID2 command")
Cc: stable@vger.kernel.org
Reported-by: Andy Nguyen <theflow@google.com>
Suggested-by: David Rientjes <rientjes@google.com>
Suggested-by: Peter Gonda <pgonda@google.com>
Signed-off-by: John Allen <john.allen@amd.com>
Reviewed-by: Peter Gonda <pgonda@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agofix short copy handling in copy_mc_pipe_to_iter()
Al Viro [Sun, 12 Jun 2022 23:50:29 +0000 (19:50 -0400)]
fix short copy handling in copy_mc_pipe_to_iter()

commit c3497fd009ef2c59eea60d21c3ac22de3585ed7d upstream.

Unlike other copying operations on ITER_PIPE, copy_mc_to_iter() can
result in a short copy.  In that case we need to trim the unused
buffers, as well as the length of partially filled one - it's not
enough to set ->head, ->iov_offset and ->count to reflect how
much had we copied.  Not hard to fix, fortunately...

I'd put a helper (pipe_discard_from(pipe, head)) into pipe_fs_i.h,
rather than iov_iter.c - it has nothing to do with iov_iter and
having it will allow us to avoid an ugly kludge in fs/splice.c.
We could put it into lib/iov_iter.c for now and move it later,
but I don't see the point going that way...

Cc: stable@kernel.org # 4.19+
Fixes: ca146f6f091e "lib/iov_iter: Fix pipe handling in _copy_to_iter_mcsafe()"
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agousbnet: Fix linkwatch use-after-free on disconnect
Lukas Wunner [Thu, 23 Jun 2022 12:50:59 +0000 (14:50 +0200)]
usbnet: Fix linkwatch use-after-free on disconnect

commit a69e617e533edddf3fa3123149900f36e0a6dc74 upstream.

usbnet uses the work usbnet_deferred_kevent() to perform tasks which may
sleep.  On disconnect, completion of the work was originally awaited in
->ndo_stop().  But in 2003, that was moved to ->disconnect() by historic
commit "[PATCH] USB: usbnet, prevent exotic rtnl deadlock":

  https://git.kernel.org/tglx/history/c/0f138bbfd83c

The change was made because back then, the kernel's workqueue
implementation did not allow waiting for a single work.  One had to wait
for completion of *all* work by calling flush_scheduled_work(), and that
could deadlock when waiting for usbnet_deferred_kevent() with rtnl_mutex
held in ->ndo_stop().

The commit solved one problem but created another:  It causes a
use-after-free in USB Ethernet drivers aqc111.c, asix_devices.c,
ax88179_178a.c, ch9200.c and smsc75xx.c:

* If the drivers receive a link change interrupt immediately before
  disconnect, they raise EVENT_LINK_RESET in their (non-sleepable)
  ->status() callback and schedule usbnet_deferred_kevent().
* usbnet_deferred_kevent() invokes the driver's ->link_reset() callback,
  which calls netif_carrier_{on,off}().
* That in turn schedules the work linkwatch_event().

Because usbnet_deferred_kevent() is awaited after unregister_netdev(),
netif_carrier_{on,off}() may operate on an unregistered netdev and
linkwatch_event() may run after free_netdev(), causing a use-after-free.

In 2010, usbnet was changed to only wait for a single instance of
usbnet_deferred_kevent() instead of *all* work by commit 23f333a2bfaf
("drivers/net: don't use flush_scheduled_work()").

Unfortunately the commit neglected to move the wait back to
->ndo_stop().  Rectify that omission at long last.

Reported-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/netdev/CAG48ez0MHBbENX5gCdHAUXZ7h7s20LnepBF-pa5M=7Bi-jZrEA@mail.gmail.com/
Reported-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/netdev/20220315113841.GA22337@pengutronix.de/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org
Acked-by: Oliver Neukum <oneukum@suse.com>
Link: https://lore.kernel.org/r/d1c87ebe9fc502bffcd1576e238d685ad08321e4.1655987888.git.lukas@wunner.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agofbcon: Fix accelerated fbdev scrolling while logo is still shown
Helge Deller [Thu, 2 Jun 2022 20:08:38 +0000 (22:08 +0200)]
fbcon: Fix accelerated fbdev scrolling while logo is still shown

commit 3866cba87dcd0162fb41e9b3b653d0af68fad5ec upstream.

There is no need to directly skip over to the SCROLL_REDRAW case while
the logo is still shown.

When using DRM, this change has no effect because the code will reach
the SCROLL_REDRAW case immediately anyway.

But if you run an accelerated fbdev driver and have
FRAMEBUFFER_CONSOLE_LEGACY_ACCELERATION enabled, console scrolling is
slowed down by factors so that it feels as if you use a 9600 baud
terminal.

So, drop those unnecessary checks and speed up fbdev console
acceleration during bootup.

Cc: stable@vger.kernel.org # v5.10+
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Helge Deller <deller@gmx.de>
Link: https://patchwork.freedesktop.org/patch/msgid/YpkYxk7wsBPx3po+@p100
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agofbcon: Fix boundary checks for fbcon=vc:n1-n2 parameters
Helge Deller [Thu, 2 Jun 2022 20:06:28 +0000 (22:06 +0200)]
fbcon: Fix boundary checks for fbcon=vc:n1-n2 parameters

commit cad564ca557f8d3bb3b1fa965d9a2b3f6490ec69 upstream.

The user may use the fbcon=vc:<n1>-<n2> option to tell fbcon to take
over the given range (n1...n2) of consoles. The value for n1 and n2
needs to be a positive number and up to (MAX_NR_CONSOLES - 1).
The given values were not fully checked against those boundaries yet.

To fix the issue, convert first_fb_vc and last_fb_vc to unsigned
integers and check them against the upper boundary, and make sure that
first_fb_vc is smaller than last_fb_vc.

Cc: stable@vger.kernel.org # v4.19+
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Helge Deller <deller@gmx.de>
Link: https://patchwork.freedesktop.org/patch/msgid/YpkYRMojilrtZIgM@p100
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agothermal: sysfs: Fix cooling_device_stats_setup() error code path
Rafael J. Wysocki [Fri, 29 Jul 2022 15:39:07 +0000 (17:39 +0200)]
thermal: sysfs: Fix cooling_device_stats_setup() error code path

commit d5a8aa5d7d80d21ab6b266f1bed4194b61746199 upstream.

If cooling_device_stats_setup() fails to create the stats object, it
must clear the last slot in cooling_device_attr_groups that was
initially empty (so as to make it possible to add stats attributes to
the cooling device attribute groups).

Failing to do so may cause the stats attributes to be created by
mistake for a device that doesn't have a stats object, because the
slot in question might be populated previously during the registration
of another cooling device.

Fixes: 8ea229511e06 ("thermal: Add cooling device's statistics in sysfs")
Reported-by: Di Shen <di.shen@unisoc.com>
Tested-by: Di Shen <di.shen@unisoc.com>
Cc: 4.17+ <stable@vger.kernel.org> # 4.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agofs: Add missing umask strip in vfs_tmpfile
Yang Xu [Thu, 14 Jul 2022 06:11:26 +0000 (14:11 +0800)]
fs: Add missing umask strip in vfs_tmpfile

commit ac6800e279a22b28f4fc21439843025a0d5bf03e upstream.

All creation paths except for O_TMPFILE handle umask in the vfs directly
if the filesystem doesn't support or enable POSIX ACLs. If the filesystem
does then umask handling is deferred until posix_acl_create().
Because, O_TMPFILE misses umask handling in the vfs it will not honor
umask settings. Fix this by adding the missing umask handling.

Link: https://lore.kernel.org/r/1657779088-2242-2-git-send-email-xuyang2018.jy@fujitsu.com
Fixes: 60545d0d4610 ("[O_TMPFILE] it's still short a few helpers, but infrastructure should be OK now...")
Cc: <stable@vger.kernel.org> # 4.19+
Reported-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-and-Tested-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agovfs: Check the truncate maximum size in inode_newsize_ok()
David Howells [Mon, 8 Aug 2022 08:52:35 +0000 (09:52 +0100)]
vfs: Check the truncate maximum size in inode_newsize_ok()

commit e2ebff9c57fe4eb104ce4768f6ebcccf76bef849 upstream.

If something manages to set the maximum file size to MAX_OFFSET+1, this
can cause the xfs and ext4 filesystems at least to become corrupt.

Ordinarily, the kernel protects against userspace trying this by
checking the value early in the truncate() and ftruncate() system calls
calls - but there are at least two places that this check is bypassed:

 (1) Cachefiles will round up the EOF of the backing file to DIO block
     size so as to allow DIO on the final block - but this might push
     the offset negative. It then calls notify_change(), but this
     inadvertently bypasses the checking. This can be triggered if
     someone puts an 8EiB-1 file on a server for someone else to try and
     access by, say, nfs.

 (2) ksmbd doesn't check the value it is given in set_end_of_file_info()
     and then calls vfs_truncate() directly - which also bypasses the
     check.

In both cases, it is potentially possible for a network filesystem to
cause a disk filesystem to be corrupted: cachefiles in the client's
cache filesystem; ksmbd in the server's filesystem.

nfsd is okay as it checks the value, but we can then remove this check
too.

Fix this by adding a check to inode_newsize_ok(), as called from
setattr_prepare(), thereby catching the issue as filesystems set up to
perform the truncate with minimal opportunity for bypassing the new
check.

Fixes: 1f08c925e7a3 ("cachefiles: Implement backing file wrangling")
Fixes: f44158485826 ("cifsd: add file operations")
Signed-off-by: David Howells <dhowells@redhat.com>
Reported-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Cc: stable@kernel.org
Acked-by: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Steve French <sfrench@samba.org>
cc: Hyunchul Lee <hyc.lee@gmail.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
23 months agotty: vt: initialize unicode screen buffer
Tetsuo Handa [Tue, 19 Jul 2022 05:49:39 +0000 (14:49 +0900)]
tty: vt: initialize unicode screen buffer

commit af77c56aa35325daa2bc2bed5c2ebf169be61b86 upstream.

syzbot reports kernel infoleak at vcs_read() [1], for buffer can be read
immediately after resize operation. Initialize buffer using kzalloc().

  ----------
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/fb.h>

  int main(int argc, char *argv[])
  {
    struct fb_var_screeninfo var = { };
    const int fb_fd = open("/dev/fb0", 3);
    ioctl(fb_fd, FBIOGET_VSCREENINFO, &var);
    var.yres = 0x21;
    ioctl(fb_fd, FBIOPUT_VSCREENINFO, &var);
    return read(open("/dev/vcsu", O_RDONLY), &var, sizeof(var)) == -1;
  }
  ----------

Link: https://syzkaller.appspot.com/bug?extid=31a641689d43387f05d3
Cc: stable <stable@vger.kernel.org>
Reported-by: syzbot <syzbot+31a641689d43387f05d3@syzkaller.appspotmail.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/4ef053cf-e796-fb5e-58b7-3ae58242a4ad@I-love.SAKURA.ne.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>