platform/kernel/linux-starfive.git
11 months agoice: Fix PF with enabled XDP going no-carrier after reset
Larysa Zaremba [Tue, 12 Dec 2023 09:29:01 +0000 (10:29 +0100)]
ice: Fix PF with enabled XDP going no-carrier after reset

[ Upstream commit f5728a418945ba53e2fdf38a6e5c5a2670965e85 ]

Commit 6624e780a577fc596788 ("ice: split ice_vsi_setup into smaller
functions") has refactored a bunch of code involved in PFR. In this
process, TC queue number adjustment for XDP was lost. Bring it back.

Lack of such adjustment causes interface to go into no-carrier after a
reset, if XDP program is attached, with the following message:

ice 0000:b1:00.0: Failed to set LAN Tx queue context, error: -22
ice 0000:b1:00.0 ens801f0np0: Failed to open VSI 0x0006 on switch 0x0001
ice 0000:b1:00.0: enable VSI failed, err -22, VSI index 0, type ICE_VSI_PF
ice 0000:b1:00.0: PF VSI rebuild failed: -22
ice 0000:b1:00.0: Rebuild failed, unload and reload driver

Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agoice: alter feature support check for SRIOV and LAG
Dave Ertman [Mon, 11 Dec 2023 21:19:28 +0000 (13:19 -0800)]
ice: alter feature support check for SRIOV and LAG

[ Upstream commit 4d50fcdc2476eef94c14c6761073af5667bb43b6 ]

Previously, the ice driver had support for using a handler for bonding
netdev events to ensure that conflicting features were not allowed to be
activated at the same time.  While this was still in place, additional
support was added to specifically support SRIOV and LAG together.  These
both utilized the netdev event handler, but the SRIOV and LAG feature was
behind a capabilities feature check to make sure the current NVM has
support.

The exclusion part of the event handler should be removed since there are
users who have custom made solutions that depend on the non-exclusion of
features.

Wrap the creation/registration and cleanup of the event handler and
associated structs in the probe flow with a feature check so that the
only systems that support the full implementation of LAG features will
initialize support.  This will leave other systems unhindered with
functionality as it existed before any LAG code was added.

Fixes: bb52f42acef6 ("ice: Add driver support for firmware changes for LAG")
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agoice: stop trashing VF VSI aggregator node ID information
Jacob Keller [Wed, 6 Dec 2023 20:19:05 +0000 (12:19 -0800)]
ice: stop trashing VF VSI aggregator node ID information

[ Upstream commit 7d881346121a97756f34e00e6296a5d63f001f7f ]

When creating new VSIs, they are assigned into an aggregator node in the
scheduler tree. Information about which aggregator node a VSI is assigned
into is maintained by the vsi->agg_node structure. In ice_vsi_decfg(), this
information is being destroyed, by overwriting the valid flag and the
agg_id field to zero.

For VF VSIs, this breaks the aggregator node configuration replay, which
depends on this information. This results in VFs being inserted into the
default aggregator node. The resulting configuration will have unexpected
Tx bandwidth sharing behavior.

This was broken by commit 6624e780a577 ("ice: split ice_vsi_setup into
smaller functions"), which added the block to reset the agg_node data.

The vsi->agg_node structure is not managed by the scheduler code, but is
instead a wrapper around an aggregator node ID that is tracked at the VSI
layer. Its been around for a long time, and its primary purpose was for
handling VFs. The SR-IOV VF reset flow does not make use of the standard VSI
rebuild/replay logic, and uses vsi->agg_node as part of its handling to
rebuild the aggregator node configuration.

The logic for aggregator nodes stretches  back to early ice driver code from
commit b126bd6bcd67 ("ice: create scheduler aggregator node config and move
VSIs")

The logic in ice_vsi_decfg() which trashes the ice_agg_node data is clearly
wrong. It destroys information that is necessary for handling VF reset,. It
is also not the correct way to actually remove a VSI from an aggregator
node. For that, we need to implement logic in the scheduler code. Further,
non-VF VSIs properly replay their aggregator configuration using existing
scheduler replay logic.

To fix the VF replay logic, remove this broken aggregator node cleanup
logic. This is the simplest way to immediately fix this.

This ensures that VFs will have proper aggregate configuration after a
reset. This is especially important since VFs often perform resets as part
of their reconfiguration flows. Without fixing this, VFs will be placed in
the default aggregator node and Tx bandwidth will not be shared in the
expected and configured manner.

Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agonet: phy: skip LED triggers on PHYs on SFP modules
Daniel Golle [Tue, 12 Dec 2023 00:05:35 +0000 (00:05 +0000)]
net: phy: skip LED triggers on PHYs on SFP modules

[ Upstream commit b1dfc0f76231bbf395c59d20a2070684620d5d0f ]

Calling led_trigger_register() when attaching a PHY located on an SFP
module potentially (and practically) leads into a deadlock.
Fix this by not calling led_trigger_register() for PHYs localted on SFP
modules as such modules actually never got any LEDs.

======================================================
WARNING: possible circular locking dependency detected
6.7.0-rc4-next-20231208+ #0 Tainted: G           O
------------------------------------------------------
kworker/u8:2/43 is trying to acquire lock:
ffffffc08108c4e8 (triggers_list_lock){++++}-{3:3}, at: led_trigger_register+0x4c/0x1a8

but task is already holding lock:
ffffff80c5c6f318 (&sfp->sm_mutex){+.+.}-{3:3}, at: cleanup_module+0x2ba8/0x3120 [sfp]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&sfp->sm_mutex){+.+.}-{3:3}:
       __mutex_lock+0x88/0x7a0
       mutex_lock_nested+0x20/0x28
       cleanup_module+0x2ae0/0x3120 [sfp]
       sfp_register_bus+0x5c/0x9c
       sfp_register_socket+0x48/0xd4
       cleanup_module+0x271c/0x3120 [sfp]
       platform_probe+0x64/0xb8
       really_probe+0x17c/0x3c0
       __driver_probe_device+0x78/0x164
       driver_probe_device+0x3c/0xd4
       __driver_attach+0xec/0x1f0
       bus_for_each_dev+0x60/0xa0
       driver_attach+0x20/0x28
       bus_add_driver+0x108/0x208
       driver_register+0x5c/0x118
       __platform_driver_register+0x24/0x2c
       init_module+0x28/0xa7c [sfp]
       do_one_initcall+0x70/0x2ec
       do_init_module+0x54/0x1e4
       load_module+0x1b78/0x1c8c
       __do_sys_init_module+0x1bc/0x2cc
       __arm64_sys_init_module+0x18/0x20
       invoke_syscall.constprop.0+0x4c/0xdc
       do_el0_svc+0x3c/0xbc
       el0_svc+0x34/0x80
       el0t_64_sync_handler+0xf8/0x124
       el0t_64_sync+0x150/0x154

-> #2 (rtnl_mutex){+.+.}-{3:3}:
       __mutex_lock+0x88/0x7a0
       mutex_lock_nested+0x20/0x28
       rtnl_lock+0x18/0x20
       set_device_name+0x30/0x130
       netdev_trig_activate+0x13c/0x1ac
       led_trigger_set+0x118/0x234
       led_trigger_write+0x104/0x17c
       sysfs_kf_bin_write+0x64/0x80
       kernfs_fop_write_iter+0x128/0x1b4
       vfs_write+0x178/0x2a4
       ksys_write+0x58/0xd4
       __arm64_sys_write+0x18/0x20
       invoke_syscall.constprop.0+0x4c/0xdc
       do_el0_svc+0x3c/0xbc
       el0_svc+0x34/0x80
       el0t_64_sync_handler+0xf8/0x124
       el0t_64_sync+0x150/0x154

-> #1 (&led_cdev->trigger_lock){++++}-{3:3}:
       down_write+0x4c/0x13c
       led_trigger_write+0xf8/0x17c
       sysfs_kf_bin_write+0x64/0x80
       kernfs_fop_write_iter+0x128/0x1b4
       vfs_write+0x178/0x2a4
       ksys_write+0x58/0xd4
       __arm64_sys_write+0x18/0x20
       invoke_syscall.constprop.0+0x4c/0xdc
       do_el0_svc+0x3c/0xbc
       el0_svc+0x34/0x80
       el0t_64_sync_handler+0xf8/0x124
       el0t_64_sync+0x150/0x154

-> #0 (triggers_list_lock){++++}-{3:3}:
       __lock_acquire+0x12a0/0x2014
       lock_acquire+0x100/0x2ac
       down_write+0x4c/0x13c
       led_trigger_register+0x4c/0x1a8
       phy_led_triggers_register+0x9c/0x214
       phy_attach_direct+0x154/0x36c
       phylink_attach_phy+0x30/0x60
       phylink_sfp_connect_phy+0x140/0x510
       sfp_add_phy+0x34/0x50
       init_module+0x15c/0xa7c [sfp]
       cleanup_module+0x1d94/0x3120 [sfp]
       cleanup_module+0x2bb4/0x3120 [sfp]
       process_one_work+0x1f8/0x4ec
       worker_thread+0x1e8/0x3d8
       kthread+0x104/0x110
       ret_from_fork+0x10/0x20

other info that might help us debug this:

Chain exists of:
  triggers_list_lock --> rtnl_mutex --> &sfp->sm_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&sfp->sm_mutex);
                               lock(rtnl_mutex);
                               lock(&sfp->sm_mutex);
  lock(triggers_list_lock);

 *** DEADLOCK ***

4 locks held by kworker/u8:2/43:
 #0: ffffff80c000f938 ((wq_completion)events_power_efficient){+.+.}-{0:0}, at: process_one_work+0x150/0x4ec
 #1: ffffffc08214bde8 ((work_completion)(&(&sfp->timeout)->work)){+.+.}-{0:0}, at: process_one_work+0x150/0x4ec
 #2: ffffffc0810902f8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x18/0x20
 #3: ffffff80c5c6f318 (&sfp->sm_mutex){+.+.}-{3:3}, at: cleanup_module+0x2ba8/0x3120 [sfp]

stack backtrace:
CPU: 0 PID: 43 Comm: kworker/u8:2 Tainted: G           O       6.7.0-rc4-next-20231208+ #0
Hardware name: Bananapi BPI-R4 (DT)
Workqueue: events_power_efficient cleanup_module [sfp]
Call trace:
 dump_backtrace+0xa8/0x10c
 show_stack+0x14/0x1c
 dump_stack_lvl+0x5c/0xa0
 dump_stack+0x14/0x1c
 print_circular_bug+0x328/0x430
 check_noncircular+0x124/0x134
 __lock_acquire+0x12a0/0x2014
 lock_acquire+0x100/0x2ac
 down_write+0x4c/0x13c
 led_trigger_register+0x4c/0x1a8
 phy_led_triggers_register+0x9c/0x214
 phy_attach_direct+0x154/0x36c
 phylink_attach_phy+0x30/0x60
 phylink_sfp_connect_phy+0x140/0x510
 sfp_add_phy+0x34/0x50
 init_module+0x15c/0xa7c [sfp]
 cleanup_module+0x1d94/0x3120 [sfp]
 cleanup_module+0x2bb4/0x3120 [sfp]
 process_one_work+0x1f8/0x4ec
 worker_thread+0x1e8/0x3d8
 kthread+0x104/0x110
 ret_from_fork+0x10/0x20

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Fixes: 01e5b728e9e4 ("net: phy: Add a binding for PHY LEDs")
Link: https://lore.kernel.org/r/102a9dce38bdf00215735d04cd4704458273ad9c.1702339354.git.daniel@makrotopia.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agobnxt_en: do not map packet buffers twice
Andy Gospodarek [Thu, 14 Dec 2023 21:31:38 +0000 (13:31 -0800)]
bnxt_en: do not map packet buffers twice

[ Upstream commit 23c93c3b6275a59f2a685f4a693944b53c31df4e ]

Remove double-mapping of DMA buffers as it can prevent page pool entries
from being freed.  Mapping is managed by page pool infrastructure and
was previously managed by the driver in __bnxt_alloc_rx_page before
allowing the page pool infrastructure to manage it.

Fixes: 578fcfd26e2a ("bnxt_en: Let the page pool manage the DMA mapping")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: David Wei <dw@davidwei.uk>
Link: https://lore.kernel.org/r/20231214213138.98095-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agoBluetooth: hci_core: Fix hci_conn_hash_lookup_cis
Luiz Augusto von Dentz [Fri, 8 Dec 2023 22:22:29 +0000 (17:22 -0500)]
Bluetooth: hci_core: Fix hci_conn_hash_lookup_cis

[ Upstream commit 50efc63d1a7a7b9a6ed21adae1b9a7123ec8abc0 ]

hci_conn_hash_lookup_cis shall always match the requested CIG and CIS
ids even when they are unset as otherwise it result in not being able
to bind/connect different sockets to the same address as that would
result in having multiple sockets mapping to the same hci_conn which
doesn't really work and prevents BAP audio configuration such as
AC 6(i) when CIG and CIS are left unset.

Fixes: c14516faede3 ("Bluetooth: hci_conn: Fix not matching by CIS ID")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agoBluetooth: hci_event: shut up a false-positive warning
Arnd Bergmann [Wed, 22 Nov 2023 22:17:44 +0000 (23:17 +0100)]
Bluetooth: hci_event: shut up a false-positive warning

[ Upstream commit a5812c68d849505ea657f653446512b85887f813 ]

Turning on -Wstringop-overflow globally exposed a misleading compiler
warning in bluetooth:

net/bluetooth/hci_event.c: In function 'hci_cc_read_class_of_dev':
net/bluetooth/hci_event.c:524:9: error: 'memcpy' writing 3 bytes into a
region of size 0 overflows the destination [-Werror=stringop-overflow=]
  524 |         memcpy(hdev->dev_class, rp->dev_class, 3);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The problem here is the check for hdev being NULL in bt_dev_dbg() that
leads the compiler to conclude that hdev->dev_class might be an invalid
pointer access.

Add another explicit check for the same condition to make sure gcc sees
this cannot happen.

Fixes: a9de9248064b ("[Bluetooth] Switch from OGF+OCF to using only opcodes")
Fixes: 1b56c90018f0 ("Makefile: Enable -Wstringop-overflow globally")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agoBluetooth: Fix deadlock in vhci_send_frame
Ying Hsu [Fri, 10 Nov 2023 01:46:05 +0000 (01:46 +0000)]
Bluetooth: Fix deadlock in vhci_send_frame

[ Upstream commit 769bf60e17ee1a56a81e7c031192c3928312c52e ]

syzbot found a potential circular dependency leading to a deadlock:
    -> #3 (&hdev->req_lock){+.+.}-{3:3}:
    __mutex_lock_common+0x1b6/0x1bc2 kernel/locking/mutex.c:599
    __mutex_lock kernel/locking/mutex.c:732 [inline]
    mutex_lock_nested+0x17/0x1c kernel/locking/mutex.c:784
    hci_dev_do_close+0x3f/0x9f net/bluetooth/hci_core.c:551
    hci_rfkill_set_block+0x130/0x1ac net/bluetooth/hci_core.c:935
    rfkill_set_block+0x1e6/0x3b8 net/rfkill/core.c:345
    rfkill_fop_write+0x2d8/0x672 net/rfkill/core.c:1274
    vfs_write+0x277/0xcf5 fs/read_write.c:594
    ksys_write+0x19b/0x2bd fs/read_write.c:650
    do_syscall_x64 arch/x86/entry/common.c:55 [inline]
    do_syscall_64+0x51/0xba arch/x86/entry/common.c:93
    entry_SYSCALL_64_after_hwframe+0x61/0xcb

    -> #2 (rfkill_global_mutex){+.+.}-{3:3}:
    __mutex_lock_common+0x1b6/0x1bc2 kernel/locking/mutex.c:599
    __mutex_lock kernel/locking/mutex.c:732 [inline]
    mutex_lock_nested+0x17/0x1c kernel/locking/mutex.c:784
    rfkill_register+0x30/0x7e3 net/rfkill/core.c:1045
    hci_register_dev+0x48f/0x96d net/bluetooth/hci_core.c:2622
    __vhci_create_device drivers/bluetooth/hci_vhci.c:341 [inline]
    vhci_create_device+0x3ad/0x68f drivers/bluetooth/hci_vhci.c:374
    vhci_get_user drivers/bluetooth/hci_vhci.c:431 [inline]
    vhci_write+0x37b/0x429 drivers/bluetooth/hci_vhci.c:511
    call_write_iter include/linux/fs.h:2109 [inline]
    new_sync_write fs/read_write.c:509 [inline]
    vfs_write+0xaa8/0xcf5 fs/read_write.c:596
    ksys_write+0x19b/0x2bd fs/read_write.c:650
    do_syscall_x64 arch/x86/entry/common.c:55 [inline]
    do_syscall_64+0x51/0xba arch/x86/entry/common.c:93
    entry_SYSCALL_64_after_hwframe+0x61/0xcb

    -> #1 (&data->open_mutex){+.+.}-{3:3}:
    __mutex_lock_common+0x1b6/0x1bc2 kernel/locking/mutex.c:599
    __mutex_lock kernel/locking/mutex.c:732 [inline]
    mutex_lock_nested+0x17/0x1c kernel/locking/mutex.c:784
    vhci_send_frame+0x68/0x9c drivers/bluetooth/hci_vhci.c:75
    hci_send_frame+0x1cc/0x2ff net/bluetooth/hci_core.c:2989
    hci_sched_acl_pkt net/bluetooth/hci_core.c:3498 [inline]
    hci_sched_acl net/bluetooth/hci_core.c:3583 [inline]
    hci_tx_work+0xb94/0x1a60 net/bluetooth/hci_core.c:3654
    process_one_work+0x901/0xfb8 kernel/workqueue.c:2310
    worker_thread+0xa67/0x1003 kernel/workqueue.c:2457
    kthread+0x36a/0x430 kernel/kthread.c:319
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298

    -> #0 ((work_completion)(&hdev->tx_work)){+.+.}-{0:0}:
    check_prev_add kernel/locking/lockdep.c:3053 [inline]
    check_prevs_add kernel/locking/lockdep.c:3172 [inline]
    validate_chain kernel/locking/lockdep.c:3787 [inline]
    __lock_acquire+0x2d32/0x77fa kernel/locking/lockdep.c:5011
    lock_acquire+0x273/0x4d5 kernel/locking/lockdep.c:5622
    __flush_work+0xee/0x19f kernel/workqueue.c:3090
    hci_dev_close_sync+0x32f/0x1113 net/bluetooth/hci_sync.c:4352
    hci_dev_do_close+0x47/0x9f net/bluetooth/hci_core.c:553
    hci_rfkill_set_block+0x130/0x1ac net/bluetooth/hci_core.c:935
    rfkill_set_block+0x1e6/0x3b8 net/rfkill/core.c:345
    rfkill_fop_write+0x2d8/0x672 net/rfkill/core.c:1274
    vfs_write+0x277/0xcf5 fs/read_write.c:594
    ksys_write+0x19b/0x2bd fs/read_write.c:650
    do_syscall_x64 arch/x86/entry/common.c:55 [inline]
    do_syscall_64+0x51/0xba arch/x86/entry/common.c:93
    entry_SYSCALL_64_after_hwframe+0x61/0xcb

This change removes the need for acquiring the open_mutex in
vhci_send_frame, thus eliminating the potential deadlock while
maintaining the required packet ordering.

Fixes: 92d4abd66f70 ("Bluetooth: vhci: Fix race when opening vhci device")
Signed-off-by: Ying Hsu <yinghsu@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agoBluetooth: Fix not notifying when connection encryption changes
Luiz Augusto von Dentz [Mon, 23 Oct 2023 23:26:23 +0000 (16:26 -0700)]
Bluetooth: Fix not notifying when connection encryption changes

[ Upstream commit f67eabffb57d0bee379994a18ec5f462b2cbdf86 ]

Some layers such as SMP depend on getting notified about encryption
changes immediately as they only allow certain PDU to be transmitted
over an encrypted link which may cause SMP implementation to reject
valid PDUs received thus causing pairing to fail when it shouldn't.

Fixes: 7aca0ac4792e ("Bluetooth: Wait for HCI_OP_WRITE_AUTH_PAYLOAD_TO to complete")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agonet/rose: fix races in rose_kill_by_device()
Eric Dumazet [Thu, 14 Dec 2023 15:27:47 +0000 (15:27 +0000)]
net/rose: fix races in rose_kill_by_device()

[ Upstream commit 64b8bc7d5f1434c636a40bdcfcd42b278d1714be ]

syzbot found an interesting netdev refcounting issue in
net/rose/af_rose.c, thanks to CONFIG_NET_DEV_REFCNT_TRACKER=y [1]

Problem is that rose_kill_by_device() can change rose->device
while other threads do not expect the pointer to be changed.

We have to first collect sockets in a temporary array,
then perform the changes while holding the socket
lock and rose_list_lock spinlock (in this order)

Change rose_release() to also acquire rose_list_lock
before releasing the netdev refcount.

[1]

[ 1185.055088][ T7889] ref_tracker: reference already released.
[ 1185.061476][ T7889] ref_tracker: allocated in:
[ 1185.066081][ T7889]  rose_bind+0x4ab/0xd10
[ 1185.070446][ T7889]  __sys_bind+0x1ec/0x220
[ 1185.074818][ T7889]  __x64_sys_bind+0x72/0xb0
[ 1185.079356][ T7889]  do_syscall_64+0x40/0x110
[ 1185.083897][ T7889]  entry_SYSCALL_64_after_hwframe+0x63/0x6b
[ 1185.089835][ T7889] ref_tracker: freed in:
[ 1185.094088][ T7889]  rose_release+0x2f5/0x570
[ 1185.098629][ T7889]  __sock_release+0xae/0x260
[ 1185.103262][ T7889]  sock_close+0x1c/0x20
[ 1185.107453][ T7889]  __fput+0x270/0xbb0
[ 1185.111467][ T7889]  task_work_run+0x14d/0x240
[ 1185.116085][ T7889]  get_signal+0x106f/0x2790
[ 1185.120622][ T7889]  arch_do_signal_or_restart+0x90/0x7f0
[ 1185.126205][ T7889]  exit_to_user_mode_prepare+0x121/0x240
[ 1185.131846][ T7889]  syscall_exit_to_user_mode+0x1e/0x60
[ 1185.137293][ T7889]  do_syscall_64+0x4d/0x110
[ 1185.141783][ T7889]  entry_SYSCALL_64_after_hwframe+0x63/0x6b
[ 1185.148085][ T7889] ------------[ cut here ]------------

WARNING: CPU: 1 PID: 7889 at lib/ref_tracker.c:255 ref_tracker_free+0x61a/0x810 lib/ref_tracker.c:255
Modules linked in:
CPU: 1 PID: 7889 Comm: syz-executor.2 Not tainted 6.7.0-rc4-syzkaller-00162-g65c95f78917e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
RIP: 0010:ref_tracker_free+0x61a/0x810 lib/ref_tracker.c:255
Code: 00 44 8b 6b 18 31 ff 44 89 ee e8 21 62 f5 fc 45 85 ed 0f 85 a6 00 00 00 e8 a3 66 f5 fc 48 8b 34 24 48 89 ef e8 27 5f f1 05 90 <0f> 0b 90 bb ea ff ff ff e9 52 fd ff ff e8 84 66 f5 fc 4c 8d 6d 44
RSP: 0018:ffffc90004917850 EFLAGS: 00010202
RAX: 0000000000000201 RBX: ffff88802618f4c0 RCX: 0000000000000000
RDX: 0000000000000202 RSI: ffffffff8accb920 RDI: 0000000000000001
RBP: ffff8880269ea5b8 R08: 0000000000000001 R09: fffffbfff23e35f6
R10: ffffffff91f1afb7 R11: 0000000000000001 R12: 1ffff92000922f0c
R13: 0000000005a2039b R14: ffff88802618f4d8 R15: 00000000ffffffff
FS: 00007f0a720ef6c0(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f43a819d988 CR3: 0000000076c64000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
netdev_tracker_free include/linux/netdevice.h:4127 [inline]
netdev_put include/linux/netdevice.h:4144 [inline]
netdev_put include/linux/netdevice.h:4140 [inline]
rose_kill_by_device net/rose/af_rose.c:195 [inline]
rose_device_event+0x25d/0x330 net/rose/af_rose.c:218
notifier_call_chain+0xb6/0x3b0 kernel/notifier.c:93
call_netdevice_notifiers_info+0xbe/0x130 net/core/dev.c:1967
call_netdevice_notifiers_extack net/core/dev.c:2005 [inline]
call_netdevice_notifiers net/core/dev.c:2019 [inline]
__dev_notify_flags+0x1f5/0x2e0 net/core/dev.c:8646
dev_change_flags+0x122/0x170 net/core/dev.c:8682
dev_ifsioc+0x9ad/0x1090 net/core/dev_ioctl.c:529
dev_ioctl+0x224/0x1090 net/core/dev_ioctl.c:786
sock_do_ioctl+0x198/0x270 net/socket.c:1234
sock_ioctl+0x22e/0x6b0 net/socket.c:1339
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:871 [inline]
__se_sys_ioctl fs/ioctl.c:857 [inline]
__x64_sys_ioctl+0x18f/0x210 fs/ioctl.c:857
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f0a7147cba9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f0a720ef0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f0a7159bf80 RCX: 00007f0a7147cba9
RDX: 0000000020000040 RSI: 0000000000008914 RDI: 0000000000000004
RBP: 00007f0a714c847a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f0a7159bf80 R15: 00007ffc8bb3a5f8
</TASK>

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Bernard Pidoux <f6bvp@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agoethernet: atheros: fix a memleak in atl1e_setup_ring_resources
Zhipeng Lu [Thu, 14 Dec 2023 13:04:04 +0000 (21:04 +0800)]
ethernet: atheros: fix a memleak in atl1e_setup_ring_resources

[ Upstream commit 309fdb1c33fe726d92d0030481346f24e1b01f07 ]

In the error handling of 'offset > adapter->ring_size', the
tx_ring->tx_buffer allocated by kzalloc should be freed,
instead of 'goto failed' instantly.

Fixes: a6a5325239c2 ("atl1e: Atheros L1E Gigabit Ethernet driver")
Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
Reviewed-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agonet: sched: ife: fix potential use-after-free
Eric Dumazet [Thu, 14 Dec 2023 11:30:38 +0000 (11:30 +0000)]
net: sched: ife: fix potential use-after-free

[ Upstream commit 19391a2ca98baa7b80279306cdf7dd43f81fa595 ]

ife_decode() calls pskb_may_pull() two times, we need to reload
ifehdr after the second one, or risk use-after-free as reported
by syzbot:

BUG: KASAN: slab-use-after-free in __ife_tlv_meta_valid net/ife/ife.c:108 [inline]
BUG: KASAN: slab-use-after-free in ife_tlv_meta_decode+0x1d1/0x210 net/ife/ife.c:131
Read of size 2 at addr ffff88802d7300a4 by task syz-executor.5/22323

CPU: 0 PID: 22323 Comm: syz-executor.5 Not tainted 6.7.0-rc3-syzkaller-00804-g074ac38d5b95 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:364 [inline]
print_report+0xc4/0x620 mm/kasan/report.c:475
kasan_report+0xda/0x110 mm/kasan/report.c:588
__ife_tlv_meta_valid net/ife/ife.c:108 [inline]
ife_tlv_meta_decode+0x1d1/0x210 net/ife/ife.c:131
tcf_ife_decode net/sched/act_ife.c:739 [inline]
tcf_ife_act+0x4e3/0x1cd0 net/sched/act_ife.c:879
tc_act include/net/tc_wrapper.h:221 [inline]
tcf_action_exec+0x1ac/0x620 net/sched/act_api.c:1079
tcf_exts_exec include/net/pkt_cls.h:344 [inline]
mall_classify+0x201/0x310 net/sched/cls_matchall.c:42
tc_classify include/net/tc_wrapper.h:227 [inline]
__tcf_classify net/sched/cls_api.c:1703 [inline]
tcf_classify+0x82f/0x1260 net/sched/cls_api.c:1800
hfsc_classify net/sched/sch_hfsc.c:1147 [inline]
hfsc_enqueue+0x315/0x1060 net/sched/sch_hfsc.c:1546
dev_qdisc_enqueue+0x3f/0x230 net/core/dev.c:3739
__dev_xmit_skb net/core/dev.c:3828 [inline]
__dev_queue_xmit+0x1de1/0x3d30 net/core/dev.c:4311
dev_queue_xmit include/linux/netdevice.h:3165 [inline]
packet_xmit+0x237/0x350 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3081 [inline]
packet_sendmsg+0x24aa/0x5200 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0xd5/0x180 net/socket.c:745
__sys_sendto+0x255/0x340 net/socket.c:2190
__do_sys_sendto net/socket.c:2202 [inline]
__se_sys_sendto net/socket.c:2198 [inline]
__x64_sys_sendto+0xe0/0x1b0 net/socket.c:2198
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7fe9acc7cae9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fe9ada450c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007fe9acd9bf80 RCX: 00007fe9acc7cae9
RDX: 000000000000fce0 RSI: 00000000200002c0 RDI: 0000000000000003
RBP: 00007fe9accc847a R08: 0000000020000140 R09: 0000000000000014
R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007fe9acd9bf80 R15: 00007ffd5427ae78
</TASK>

Allocated by task 22323:
kasan_save_stack+0x33/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
____kasan_kmalloc mm/kasan/common.c:374 [inline]
__kasan_kmalloc+0xa2/0xb0 mm/kasan/common.c:383
kasan_kmalloc include/linux/kasan.h:198 [inline]
__do_kmalloc_node mm/slab_common.c:1007 [inline]
__kmalloc_node_track_caller+0x5a/0x90 mm/slab_common.c:1027
kmalloc_reserve+0xef/0x260 net/core/skbuff.c:582
__alloc_skb+0x12b/0x330 net/core/skbuff.c:651
alloc_skb include/linux/skbuff.h:1298 [inline]
alloc_skb_with_frags+0xe4/0x710 net/core/skbuff.c:6331
sock_alloc_send_pskb+0x7e4/0x970 net/core/sock.c:2780
packet_alloc_skb net/packet/af_packet.c:2930 [inline]
packet_snd net/packet/af_packet.c:3024 [inline]
packet_sendmsg+0x1e2a/0x5200 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0xd5/0x180 net/socket.c:745
__sys_sendto+0x255/0x340 net/socket.c:2190
__do_sys_sendto net/socket.c:2202 [inline]
__se_sys_sendto net/socket.c:2198 [inline]
__x64_sys_sendto+0xe0/0x1b0 net/socket.c:2198
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b

Freed by task 22323:
kasan_save_stack+0x33/0x50 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:522
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free+0x15b/0x1b0 mm/kasan/common.c:200
kasan_slab_free include/linux/kasan.h:164 [inline]
slab_free_hook mm/slub.c:1800 [inline]
slab_free_freelist_hook+0x114/0x1e0 mm/slub.c:1826
slab_free mm/slub.c:3809 [inline]
__kmem_cache_free+0xc0/0x180 mm/slub.c:3822
skb_kfree_head net/core/skbuff.c:950 [inline]
skb_free_head+0x110/0x1b0 net/core/skbuff.c:962
pskb_expand_head+0x3c5/0x1170 net/core/skbuff.c:2130
__pskb_pull_tail+0xe1/0x1830 net/core/skbuff.c:2655
pskb_may_pull_reason include/linux/skbuff.h:2685 [inline]
pskb_may_pull include/linux/skbuff.h:2693 [inline]
ife_decode+0x394/0x4f0 net/ife/ife.c:82
tcf_ife_decode net/sched/act_ife.c:727 [inline]
tcf_ife_act+0x43b/0x1cd0 net/sched/act_ife.c:879
tc_act include/net/tc_wrapper.h:221 [inline]
tcf_action_exec+0x1ac/0x620 net/sched/act_api.c:1079
tcf_exts_exec include/net/pkt_cls.h:344 [inline]
mall_classify+0x201/0x310 net/sched/cls_matchall.c:42
tc_classify include/net/tc_wrapper.h:227 [inline]
__tcf_classify net/sched/cls_api.c:1703 [inline]
tcf_classify+0x82f/0x1260 net/sched/cls_api.c:1800
hfsc_classify net/sched/sch_hfsc.c:1147 [inline]
hfsc_enqueue+0x315/0x1060 net/sched/sch_hfsc.c:1546
dev_qdisc_enqueue+0x3f/0x230 net/core/dev.c:3739
__dev_xmit_skb net/core/dev.c:3828 [inline]
__dev_queue_xmit+0x1de1/0x3d30 net/core/dev.c:4311
dev_queue_xmit include/linux/netdevice.h:3165 [inline]
packet_xmit+0x237/0x350 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3081 [inline]
packet_sendmsg+0x24aa/0x5200 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0xd5/0x180 net/socket.c:745
__sys_sendto+0x255/0x340 net/socket.c:2190
__do_sys_sendto net/socket.c:2202 [inline]
__se_sys_sendto net/socket.c:2198 [inline]
__x64_sys_sendto+0xe0/0x1b0 net/socket.c:2198
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b

The buggy address belongs to the object at ffff88802d730000
which belongs to the cache kmalloc-8k of size 8192
The buggy address is located 164 bytes inside of
freed 8192-byte region [ffff88802d730000ffff88802d732000)

The buggy address belongs to the physical page:
page:ffffea0000b5cc00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2d730
head:ffffea0000b5cc00 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0xfff00000000840(slab|head|node=0|zone=1|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 00fff00000000840 ffff888013042280 dead000000000122 0000000000000000
raw: 0000000000000000 0000000080020002 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0x1d20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL), pid 22323, tgid 22320 (syz-executor.5), ts 950317230369, free_ts 950233467461
set_page_owner include/linux/page_owner.h:31 [inline]
post_alloc_hook+0x2d0/0x350 mm/page_alloc.c:1544
prep_new_page mm/page_alloc.c:1551 [inline]
get_page_from_freelist+0xa28/0x3730 mm/page_alloc.c:3319
__alloc_pages+0x22e/0x2420 mm/page_alloc.c:4575
alloc_pages_mpol+0x258/0x5f0 mm/mempolicy.c:2133
alloc_slab_page mm/slub.c:1870 [inline]
allocate_slab mm/slub.c:2017 [inline]
new_slab+0x283/0x3c0 mm/slub.c:2070
___slab_alloc+0x979/0x1500 mm/slub.c:3223
__slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3322
__slab_alloc_node mm/slub.c:3375 [inline]
slab_alloc_node mm/slub.c:3468 [inline]
__kmem_cache_alloc_node+0x131/0x310 mm/slub.c:3517
__do_kmalloc_node mm/slab_common.c:1006 [inline]
__kmalloc_node_track_caller+0x4a/0x90 mm/slab_common.c:1027
kmalloc_reserve+0xef/0x260 net/core/skbuff.c:582
__alloc_skb+0x12b/0x330 net/core/skbuff.c:651
alloc_skb include/linux/skbuff.h:1298 [inline]
alloc_skb_with_frags+0xe4/0x710 net/core/skbuff.c:6331
sock_alloc_send_pskb+0x7e4/0x970 net/core/sock.c:2780
packet_alloc_skb net/packet/af_packet.c:2930 [inline]
packet_snd net/packet/af_packet.c:3024 [inline]
packet_sendmsg+0x1e2a/0x5200 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0xd5/0x180 net/socket.c:745
__sys_sendto+0x255/0x340 net/socket.c:2190
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1144 [inline]
free_unref_page_prepare+0x53c/0xb80 mm/page_alloc.c:2354
free_unref_page+0x33/0x3b0 mm/page_alloc.c:2494
__unfreeze_partials+0x226/0x240 mm/slub.c:2655
qlink_free mm/kasan/quarantine.c:168 [inline]
qlist_free_all+0x6a/0x170 mm/kasan/quarantine.c:187
kasan_quarantine_reduce+0x18e/0x1d0 mm/kasan/quarantine.c:294
__kasan_slab_alloc+0x65/0x90 mm/kasan/common.c:305
kasan_slab_alloc include/linux/kasan.h:188 [inline]
slab_post_alloc_hook mm/slab.h:763 [inline]
slab_alloc_node mm/slub.c:3478 [inline]
slab_alloc mm/slub.c:3486 [inline]
__kmem_cache_alloc_lru mm/slub.c:3493 [inline]
kmem_cache_alloc_lru+0x219/0x6f0 mm/slub.c:3509
alloc_inode_sb include/linux/fs.h:2937 [inline]
ext4_alloc_inode+0x28/0x650 fs/ext4/super.c:1408
alloc_inode+0x5d/0x220 fs/inode.c:261
new_inode_pseudo fs/inode.c:1006 [inline]
new_inode+0x22/0x260 fs/inode.c:1032
__ext4_new_inode+0x333/0x5200 fs/ext4/ialloc.c:958
ext4_symlink+0x5d7/0xa20 fs/ext4/namei.c:3398
vfs_symlink fs/namei.c:4464 [inline]
vfs_symlink+0x3e5/0x620 fs/namei.c:4448
do_symlinkat+0x25f/0x310 fs/namei.c:4490
__do_sys_symlinkat fs/namei.c:4506 [inline]
__se_sys_symlinkat fs/namei.c:4503 [inline]
__x64_sys_symlinkat+0x97/0xc0 fs/namei.c:4503
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82

Fixes: d57493d6d1be ("net: sched: ife: check on metadata length")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Alexander Aring <aahringo@redhat.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agonet: Return error from sk_stream_wait_connect() if sk_wait_event() fails
Shigeru Yoshida [Thu, 14 Dec 2023 05:09:22 +0000 (14:09 +0900)]
net: Return error from sk_stream_wait_connect() if sk_wait_event() fails

[ Upstream commit cac23b7d7627915d967ce25436d7aae26e88ed06 ]

The following NULL pointer dereference issue occurred:

BUG: kernel NULL pointer dereference, address: 0000000000000000
<...>
RIP: 0010:ccid_hc_tx_send_packet net/dccp/ccid.h:166 [inline]
RIP: 0010:dccp_write_xmit+0x49/0x140 net/dccp/output.c:356
<...>
Call Trace:
 <TASK>
 dccp_sendmsg+0x642/0x7e0 net/dccp/proto.c:801
 inet_sendmsg+0x63/0x90 net/ipv4/af_inet.c:846
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0x83/0xe0 net/socket.c:745
 ____sys_sendmsg+0x443/0x510 net/socket.c:2558
 ___sys_sendmsg+0xe5/0x150 net/socket.c:2612
 __sys_sendmsg+0xa6/0x120 net/socket.c:2641
 __do_sys_sendmsg net/socket.c:2650 [inline]
 __se_sys_sendmsg net/socket.c:2648 [inline]
 __x64_sys_sendmsg+0x45/0x50 net/socket.c:2648
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x43/0x110 arch/x86/entry/common.c:82
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

sk_wait_event() returns an error (-EPIPE) if disconnect() is called on the
socket waiting for the event. However, sk_stream_wait_connect() returns
success, i.e. zero, even if sk_wait_event() returns -EPIPE, so a function
that waits for a connection with sk_stream_wait_connect() may misbehave.

In the case of the above DCCP issue, dccp_sendmsg() is waiting for the
connection. If disconnect() is called in concurrently, the above issue
occurs.

This patch fixes the issue by returning error from sk_stream_wait_connect()
if sk_wait_event() fails.

Fixes: 419ce133ab92 ("tcp: allow again tcp_disconnect() when threads are waiting")
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reported-by: syzbot+c71bc336c5061153b502@syzkaller.appspotmail.com
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agoocteontx2-pf: Fix graceful exit during PFC configuration failure
Suman Ghosh [Wed, 13 Dec 2023 18:10:44 +0000 (23:40 +0530)]
octeontx2-pf: Fix graceful exit during PFC configuration failure

[ Upstream commit 8c97ab5448f2096daba11edf8d18a44e1eb6f31d ]

During PFC configuration failure the code was not handling a graceful
exit. This patch fixes the same and add proper code for a graceful exit.

Fixes: 99c969a83d82 ("octeontx2-pf: Add egress PFC support")
Signed-off-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agonet: mscc: ocelot: fix pMAC TX RMON stats for bucket 256-511 and above
Vladimir Oltean [Thu, 14 Dec 2023 00:09:02 +0000 (02:09 +0200)]
net: mscc: ocelot: fix pMAC TX RMON stats for bucket 256-511 and above

[ Upstream commit 70f010da00f90415296f93fb47a561977eae41cb ]

The typo from ocelot_port_rmon_stats_cb() was also carried over to
ocelot_port_pmac_rmon_stats_cb() as well, leading to incorrect TX RMON
stats for the pMAC too.

Fixes: ab3f97a9610a ("net: mscc: ocelot: export ethtool MAC Merge stats for Felix VSC9959")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20231214000902.545625-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agonet: mscc: ocelot: fix eMAC TX RMON stats for bucket 256-511 and above
Vladimir Oltean [Thu, 14 Dec 2023 00:09:01 +0000 (02:09 +0200)]
net: mscc: ocelot: fix eMAC TX RMON stats for bucket 256-511 and above

[ Upstream commit 52eda4641d041667fa059f4855c5f88dcebd8afe ]

There is a typo in the driver due to which we report incorrect TX RMON
counters for the 256-511 octet bucket and all the other buckets larger
than that.

Bug found with the selftest at
https://patchwork.kernel.org/project/netdevbpf/patch/20231211223346.2497157-9-tobias@waldekranz.com/

Fixes: e32036e1ae7b ("net: mscc: ocelot: add support for all sorts of standardized counters present in DSA")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20231214000902.545625-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agonet/mlx5e: Correct snprintf truncation handling for fw_version buffer used by represe...
Rahul Rameshbabu [Tue, 21 Nov 2023 23:00:22 +0000 (15:00 -0800)]
net/mlx5e: Correct snprintf truncation handling for fw_version buffer used by representors

[ Upstream commit b13559b76157de9d74f04d3ca0e49d69de3b5675 ]

snprintf returns the length of the formatted string, excluding the trailing
null, without accounting for truncation. This means that is the return
value is greater than or equal to the size parameter, the fw_version string
was truncated.

Link: https://docs.kernel.org/core-api/kernel-api.html#c.snprintf
Fixes: 1b2bd0c0264f ("net/mlx5e: Check return value of snprintf writing to fw_version buffer for representors")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agonet/mlx5e: Correct snprintf truncation handling for fw_version buffer
Rahul Rameshbabu [Tue, 21 Nov 2023 23:00:21 +0000 (15:00 -0800)]
net/mlx5e: Correct snprintf truncation handling for fw_version buffer

[ Upstream commit ad436b9c1270c40554e274f067f1b78fcc06a004 ]

snprintf returns the length of the formatted string, excluding the trailing
null, without accounting for truncation. This means that is the return
value is greater than or equal to the size parameter, the fw_version string
was truncated.

Reported-by: David Laight <David.Laight@ACULAB.COM>
Closes: https://lore.kernel.org/netdev/81cae734ee1b4cde9b380a9a31006c1a@AcuMS.aculab.com/
Link: https://docs.kernel.org/core-api/kernel-api.html#c.snprintf
Fixes: 41e63c2baa11 ("net/mlx5e: Check return value of snprintf writing to fw_version buffer")
Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agonet/mlx5e: Fix error codes in alloc_branch_attr()
Dan Carpenter [Wed, 13 Dec 2023 14:08:57 +0000 (17:08 +0300)]
net/mlx5e: Fix error codes in alloc_branch_attr()

[ Upstream commit d792e5f7f19b95f5ce41ac49df5ead4d280238f4 ]

Set the error code if set_branch_dest_ft() fails.

Fixes: ccbe33003b10 ("net/mlx5e: TC, Don't offload post action rule if not supported")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agonet/mlx5e: Fix error code in mlx5e_tc_action_miss_mapping_get()
Dan Carpenter [Wed, 13 Dec 2023 14:08:17 +0000 (17:08 +0300)]
net/mlx5e: Fix error code in mlx5e_tc_action_miss_mapping_get()

[ Upstream commit 86d5922679f3b6d02a64df66cdd777fdd4ea5c0d ]

Preserve the error code if esw_add_restore_rule() fails.  Don't return
success.

Fixes: 6702782845a5 ("net/mlx5e: TC, Set CT miss to the specific ct action instance")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agonet/mlx5: Refactor mlx5_flow_destination->rep pointer to vport num
Vlad Buslov [Fri, 6 Oct 2023 13:22:22 +0000 (15:22 +0200)]
net/mlx5: Refactor mlx5_flow_destination->rep pointer to vport num

[ Upstream commit 04ad04e4fdd10f92ef4f2b3f6227ec9824682197 ]

Currently the destination rep pointer is only used for comparisons or to
obtain vport number from it. Since it is used both during flow creation and
deletion it may point to representor of another eswitch instance which can
be deallocated during driver unload even when there are rules pointing to
it[0]. Refactor the code to store vport number and 'valid' flag instead of
the representor pointer.

[0]:
[176805.886303] ==================================================================
[176805.889433] BUG: KASAN: slab-use-after-free in esw_cleanup_dests+0x390/0x440 [mlx5_core]
[176805.892981] Read of size 2 at addr ffff888155090aa0 by task modprobe/27280

[176805.895462] CPU: 3 PID: 27280 Comm: modprobe Tainted: G    B              6.6.0-rc3+ #1
[176805.896771] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[176805.898514] Call Trace:
[176805.899026]  <TASK>
[176805.899519]  dump_stack_lvl+0x33/0x50
[176805.900221]  print_report+0xc2/0x610
[176805.900893]  ? mlx5_chains_put_table+0x33d/0x8d0 [mlx5_core]
[176805.901897]  ? esw_cleanup_dests+0x390/0x440 [mlx5_core]
[176805.902852]  kasan_report+0xac/0xe0
[176805.903509]  ? esw_cleanup_dests+0x390/0x440 [mlx5_core]
[176805.904461]  esw_cleanup_dests+0x390/0x440 [mlx5_core]
[176805.905223]  __mlx5_eswitch_del_rule+0x1ae/0x460 [mlx5_core]
[176805.906044]  ? esw_cleanup_dests+0x440/0x440 [mlx5_core]
[176805.906822]  ? xas_find_conflict+0x420/0x420
[176805.907496]  ? down_read+0x11e/0x200
[176805.908046]  mlx5e_tc_rule_unoffload+0xc4/0x2a0 [mlx5_core]
[176805.908844]  mlx5e_tc_del_fdb_flow+0x7da/0xb10 [mlx5_core]
[176805.909597]  mlx5e_flow_put+0x4b/0x80 [mlx5_core]
[176805.910275]  mlx5e_delete_flower+0x5b4/0xb70 [mlx5_core]
[176805.911010]  tc_setup_cb_reoffload+0x27/0xb0
[176805.911648]  fl_reoffload+0x62d/0x900 [cls_flower]
[176805.912313]  ? mlx5e_rep_indr_block_unbind+0xd0/0xd0 [mlx5_core]
[176805.913151]  ? __fl_put+0x230/0x230 [cls_flower]
[176805.913768]  ? filter_irq_stacks+0x90/0x90
[176805.914335]  ? kasan_save_stack+0x1e/0x40
[176805.914893]  ? kasan_set_track+0x21/0x30
[176805.915484]  ? kasan_save_free_info+0x27/0x40
[176805.916105]  tcf_block_playback_offloads+0x79/0x1f0
[176805.916773]  ? mlx5e_rep_indr_block_unbind+0xd0/0xd0 [mlx5_core]
[176805.917647]  tcf_block_unbind+0x12d/0x330
[176805.918239]  tcf_block_offload_cmd.isra.0+0x24e/0x320
[176805.918953]  ? tcf_block_bind+0x770/0x770
[176805.919551]  ? _raw_read_unlock_irqrestore+0x30/0x30
[176805.920236]  ? mutex_lock+0x7d/0xd0
[176805.920735]  ? mutex_unlock+0x80/0xd0
[176805.921255]  tcf_block_offload_unbind+0xa5/0x120
[176805.921909]  __tcf_block_put+0xc2/0x2d0
[176805.922467]  ingress_destroy+0xf4/0x3d0 [sch_ingress]
[176805.923178]  __qdisc_destroy+0x9d/0x280
[176805.923741]  dev_shutdown+0x1c6/0x330
[176805.924295]  unregister_netdevice_many_notify+0x6ef/0x1500
[176805.925034]  ? netdev_freemem+0x50/0x50
[176805.925610]  ? _raw_spin_lock_irq+0x7b/0xd0
[176805.926235]  ? _raw_spin_lock_bh+0xe0/0xe0
[176805.926849]  unregister_netdevice_queue+0x1e0/0x280
[176805.927592]  ? unregister_netdevice_many+0x10/0x10
[176805.928275]  unregister_netdev+0x18/0x20
[176805.928835]  mlx5e_vport_rep_unload+0xc0/0x200 [mlx5_core]
[176805.929608]  mlx5_esw_offloads_unload_rep+0x9d/0xc0 [mlx5_core]
[176805.930492]  mlx5_eswitch_unload_vf_vports+0x108/0x1a0 [mlx5_core]
[176805.931422]  ? mlx5_eswitch_unload_sf_vport+0x50/0x50 [mlx5_core]
[176805.932304]  ? rwsem_down_write_slowpath+0x11f0/0x11f0
[176805.932987]  mlx5_eswitch_disable_sriov+0x6f9/0xa60 [mlx5_core]
[176805.933807]  ? mlx5_core_disable_hca+0xe1/0x130 [mlx5_core]
[176805.934576]  ? mlx5_eswitch_disable_locked+0x580/0x580 [mlx5_core]
[176805.935463]  mlx5_device_disable_sriov+0x138/0x490 [mlx5_core]
[176805.936308]  mlx5_sriov_disable+0x8c/0xb0 [mlx5_core]
[176805.937063]  remove_one+0x7f/0x210 [mlx5_core]
[176805.937711]  pci_device_remove+0x96/0x1c0
[176805.938289]  device_release_driver_internal+0x361/0x520
[176805.938981]  ? kobject_put+0x5c/0x330
[176805.939553]  driver_detach+0xd7/0x1d0
[176805.940101]  bus_remove_driver+0x11f/0x290
[176805.943847]  pci_unregister_driver+0x23/0x1f0
[176805.944505]  mlx5_cleanup+0xc/0x20 [mlx5_core]
[176805.945189]  __x64_sys_delete_module+0x2b3/0x450
[176805.945837]  ? module_flags+0x300/0x300
[176805.946377]  ? dput+0xc2/0x830
[176805.946848]  ? __kasan_record_aux_stack+0x9c/0xb0
[176805.947555]  ? __call_rcu_common.constprop.0+0x46c/0xb50
[176805.948338]  ? fpregs_assert_state_consistent+0x1d/0xa0
[176805.949055]  ? exit_to_user_mode_prepare+0x30/0x120
[176805.949713]  do_syscall_64+0x3d/0x90
[176805.950226]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[176805.950904] RIP: 0033:0x7f7f42c3f5ab
[176805.951462] Code: 73 01 c3 48 8b 0d 75 a8 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 45 a8 1b 00 f7 d8 64 89 01 48
[176805.953710] RSP: 002b:00007fff07dc9d08 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[176805.954691] RAX: ffffffffffffffda RBX: 000055b6e91c01e0 RCX: 00007f7f42c3f5ab
[176805.955691] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055b6e91c0248
[176805.956662] RBP: 000055b6e91c01e0 R08: 0000000000000000 R09: 0000000000000000
[176805.957601] R10: 00007f7f42d9eac0 R11: 0000000000000206 R12: 000055b6e91c0248
[176805.958593] R13: 0000000000000000 R14: 000055b6e91bfb38 R15: 0000000000000000
[176805.959599]  </TASK>

[176805.960324] Allocated by task 20490:
[176805.960893]  kasan_save_stack+0x1e/0x40
[176805.961463]  kasan_set_track+0x21/0x30
[176805.962019]  __kasan_kmalloc+0x77/0x90
[176805.962554]  esw_offloads_init+0x1bb/0x480 [mlx5_core]
[176805.963318]  mlx5_eswitch_init+0xc70/0x15c0 [mlx5_core]
[176805.964092]  mlx5_init_one_devl_locked+0x366/0x1230 [mlx5_core]
[176805.964902]  probe_one+0x6f7/0xc90 [mlx5_core]
[176805.965541]  local_pci_probe+0xd7/0x180
[176805.966075]  pci_device_probe+0x231/0x6f0
[176805.966631]  really_probe+0x1d4/0xb50
[176805.967179]  __driver_probe_device+0x18d/0x450
[176805.967810]  driver_probe_device+0x49/0x120
[176805.968431]  __driver_attach+0x1fb/0x490
[176805.968976]  bus_for_each_dev+0xed/0x170
[176805.969560]  bus_add_driver+0x21a/0x570
[176805.970124]  driver_register+0x133/0x460
[176805.970684]  0xffffffffa0678065
[176805.971180]  do_one_initcall+0x92/0x2b0
[176805.971744]  do_init_module+0x22d/0x720
[176805.972318]  load_module+0x58c3/0x63b0
[176805.972847]  init_module_from_file+0xd2/0x130
[176805.973441]  __x64_sys_finit_module+0x389/0x7c0
[176805.974045]  do_syscall_64+0x3d/0x90
[176805.974556]  entry_SYSCALL_64_after_hwframe+0x46/0xb0

[176805.975566] Freed by task 27280:
[176805.976077]  kasan_save_stack+0x1e/0x40
[176805.976655]  kasan_set_track+0x21/0x30
[176805.977221]  kasan_save_free_info+0x27/0x40
[176805.977834]  ____kasan_slab_free+0x11a/0x1b0
[176805.978505]  __kmem_cache_free+0x163/0x2d0
[176805.979113]  esw_offloads_cleanup_reps+0xb8/0x120 [mlx5_core]
[176805.979963]  mlx5_eswitch_cleanup+0x182/0x270 [mlx5_core]
[176805.980763]  mlx5_cleanup_once+0x9a/0x1e0 [mlx5_core]
[176805.981477]  mlx5_uninit_one+0xa9/0x180 [mlx5_core]
[176805.982196]  remove_one+0x8f/0x210 [mlx5_core]
[176805.982868]  pci_device_remove+0x96/0x1c0
[176805.983461]  device_release_driver_internal+0x361/0x520
[176805.984169]  driver_detach+0xd7/0x1d0
[176805.984702]  bus_remove_driver+0x11f/0x290
[176805.985261]  pci_unregister_driver+0x23/0x1f0
[176805.985847]  mlx5_cleanup+0xc/0x20 [mlx5_core]
[176805.986483]  __x64_sys_delete_module+0x2b3/0x450
[176805.987126]  do_syscall_64+0x3d/0x90
[176805.987665]  entry_SYSCALL_64_after_hwframe+0x46/0xb0

[176805.988667] Last potentially related work creation:
[176805.989305]  kasan_save_stack+0x1e/0x40
[176805.989839]  __kasan_record_aux_stack+0x9c/0xb0
[176805.990443]  kvfree_call_rcu+0x84/0xa30
[176805.990973]  clean_xps_maps+0x265/0x6e0
[176805.991547]  netif_reset_xps_queues.part.0+0x3f/0x80
[176805.992226]  unregister_netdevice_many_notify+0xfcf/0x1500
[176805.992966]  unregister_netdevice_queue+0x1e0/0x280
[176805.993638]  unregister_netdev+0x18/0x20
[176805.994205]  mlx5e_remove+0xba/0x1e0 [mlx5_core]
[176805.994872]  auxiliary_bus_remove+0x52/0x70
[176805.995490]  device_release_driver_internal+0x361/0x520
[176805.996196]  bus_remove_device+0x1e1/0x3d0
[176805.996767]  device_del+0x390/0x980
[176805.997270]  mlx5_rescan_drivers_locked.part.0+0x130/0x540 [mlx5_core]
[176805.998195]  mlx5_unregister_device+0x77/0xc0 [mlx5_core]
[176805.998989]  mlx5_uninit_one+0x41/0x180 [mlx5_core]
[176805.999719]  remove_one+0x8f/0x210 [mlx5_core]
[176806.000387]  pci_device_remove+0x96/0x1c0
[176806.000938]  device_release_driver_internal+0x361/0x520
[176806.001612]  unbind_store+0xd8/0xf0
[176806.002108]  kernfs_fop_write_iter+0x2c0/0x440
[176806.002748]  vfs_write+0x725/0xba0
[176806.003294]  ksys_write+0xed/0x1c0
[176806.003823]  do_syscall_64+0x3d/0x90
[176806.004357]  entry_SYSCALL_64_after_hwframe+0x46/0xb0

[176806.005317] The buggy address belongs to the object at ffff888155090a80
                 which belongs to the cache kmalloc-64 of size 64
[176806.006774] The buggy address is located 32 bytes inside of
                 freed 64-byte region [ffff888155090a80ffff888155090ac0)

[176806.008773] The buggy address belongs to the physical page:
[176806.009480] page:00000000a407e0e6 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x155090
[176806.010633] flags: 0x200000000000800(slab|node=0|zone=2)
[176806.011352] page_type: 0xffffffff()
[176806.011905] raw: 0200000000000800 ffff888100042640 ffffea000422b1c0 dead000000000004
[176806.012949] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
[176806.013933] page dumped because: kasan: bad access detected

[176806.014935] Memory state around the buggy address:
[176806.015601]  ffff888155090980: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.016568]  ffff888155090a00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.017497] >ffff888155090a80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.018438]                                ^
[176806.019007]  ffff888155090b00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.020001]  ffff888155090b80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[176806.020996] ==================================================================

Fixes: a508728a4c8b ("net/mlx5e: VF tunnel RX traffic offloading")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agonet/mlx5: Fix fw tracer first block check
Moshe Shemesh [Thu, 30 Nov 2023 09:30:34 +0000 (11:30 +0200)]
net/mlx5: Fix fw tracer first block check

[ Upstream commit 4261edf11cb7c9224af713a102e5616329306932 ]

While handling new traces, to verify it is not the first block being
written, last_timestamp is checked. But instead of checking it is non
zero it is verified to be zero. Fix to verify last_timestamp is not
zero.

Fixes: c71ad41ccb0c ("net/mlx5: FW tracer, events handling")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Feras Daoud <ferasda@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agonet/mlx5e: XDP, Drop fragmented packets larger than MTU size
Carolina Jubran [Thu, 23 Nov 2023 14:11:20 +0000 (16:11 +0200)]
net/mlx5e: XDP, Drop fragmented packets larger than MTU size

[ Upstream commit bcaf109f794744c14da0e9123b31d1f4571b0a35 ]

XDP transmits fragmented packets that are larger than MTU size instead of
dropping those packets. The drop check that checks whether a packet is larger
than MTU is comparing MTU size against the linear part length only.

Adjust the drop check to compare MTU size against both linear and non-linear
part lengths to avoid transmitting fragmented packets larger than MTU size.

Fixes: 39a1665d16a2 ("net/mlx5e: Implement sending multi buffer XDP frames")
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agonet/mlx5e: Decrease num_block_tc when unblock tc offload
Chris Mi [Wed, 29 Nov 2023 02:53:32 +0000 (04:53 +0200)]
net/mlx5e: Decrease num_block_tc when unblock tc offload

[ Upstream commit be86106fd74a145f24c56c9bc18d658e8fe6d4f4 ]

The cited commit increases num_block_tc when unblock tc offload.
Actually should decrease it.

Fixes: c8e350e62fc5 ("net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agonet/mlx5e: Fix overrun reported by coverity
Jianbo Liu [Tue, 14 Nov 2023 01:25:21 +0000 (01:25 +0000)]
net/mlx5e: Fix overrun reported by coverity

[ Upstream commit da75fa542873e5f7d7f615566c0b00042d8a0437 ]

Coverity Scan reports the following issue. But it's impossible that
mlx5_get_dev_index returns 7 for PF, even if the index is calculated
from PCI FUNC ID. So add the checking to make coverity slience.

CID 610894 (#2 of 2): Out-of-bounds write (OVERRUN)
Overrunning array esw->fdb_table.offloads.peer_miss_rules of 4 8-byte
elements at element index 7 (byte offset 63) using index
mlx5_get_dev_index(peer_dev) (which evaluates to 7).

Fixes: 9bee385a6e39 ("net/mlx5: E-switch, refactor FDB miss rule add/remove")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agonet/mlx5e: fix a potential double-free in fs_udp_create_groups
Dinghao Liu [Tue, 28 Nov 2023 09:40:53 +0000 (17:40 +0800)]
net/mlx5e: fix a potential double-free in fs_udp_create_groups

[ Upstream commit e75efc6466ae289e599fb12a5a86545dff245c65 ]

When kcalloc() for ft->g succeeds but kvzalloc() for in fails,
fs_udp_create_groups() will free ft->g. However, its caller
fs_udp_create_table() will free ft->g again through calling
mlx5e_destroy_flow_table(), which will lead to a double-free.
Fix this by setting ft->g to NULL in fs_udp_create_groups().

Fixes: 1c80bd684388 ("net/mlx5e: Introduce Flow Steering UDP API")
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agonet/mlx5e: Fix a race in command alloc flow
Shifeng Li [Sat, 2 Dec 2023 08:01:26 +0000 (00:01 -0800)]
net/mlx5e: Fix a race in command alloc flow

[ Upstream commit 8f5100da56b3980276234e812ce98d8f075194cd ]

Fix a cmd->ent use after free due to a race on command entry.
Such race occurs when one of the commands releases its last refcount and
frees its index and entry while another process running command flush
flow takes refcount to this command entry. The process which handles
commands flush may see this command as needed to be flushed if the other
process allocated a ent->idx but didn't set ent to cmd->ent_arr in
cmd_work_handler(). Fix it by moving the assignment of cmd->ent_arr into
the spin lock.

[70013.081955] BUG: KASAN: use-after-free in mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
[70013.081967] Write of size 4 at addr ffff88880b1510b4 by task kworker/26:1/1433361
[70013.081968]
[70013.082028] Workqueue: events aer_isr
[70013.082053] Call Trace:
[70013.082067]  dump_stack+0x8b/0xbb
[70013.082086]  print_address_description+0x6a/0x270
[70013.082102]  kasan_report+0x179/0x2c0
[70013.082173]  mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
[70013.082267]  mlx5_cmd_flush+0x80/0x180 [mlx5_core]
[70013.082304]  mlx5_enter_error_state+0x106/0x1d0 [mlx5_core]
[70013.082338]  mlx5_try_fast_unload+0x2ea/0x4d0 [mlx5_core]
[70013.082377]  remove_one+0x200/0x2b0 [mlx5_core]
[70013.082409]  pci_device_remove+0xf3/0x280
[70013.082439]  device_release_driver_internal+0x1c3/0x470
[70013.082453]  pci_stop_bus_device+0x109/0x160
[70013.082468]  pci_stop_and_remove_bus_device+0xe/0x20
[70013.082485]  pcie_do_fatal_recovery+0x167/0x550
[70013.082493]  aer_isr+0x7d2/0x960
[70013.082543]  process_one_work+0x65f/0x12d0
[70013.082556]  worker_thread+0x87/0xb50
[70013.082571]  kthread+0x2e9/0x3a0
[70013.082592]  ret_from_fork+0x1f/0x40

The logical relationship of this error is as follows:

             aer_recover_work              |          ent->work
-------------------------------------------+------------------------------
aer_recover_work_func                      |
|- pcie_do_recovery                        |
  |- report_error_detected                 |
    |- mlx5_pci_err_detected               |cmd_work_handler
      |- mlx5_enter_error_state            |  |- cmd_alloc_index
        |- enter_error_state               |    |- lock cmd->alloc_lock
          |- mlx5_cmd_flush                |    |- clear_bit
            |- mlx5_cmd_trigger_completions|    |- unlock cmd->alloc_lock
              |- lock cmd->alloc_lock      |
              |- vector = ~dev->cmd.vars.bitmask
              |- for_each_set_bit          |
                |- cmd_ent_get(cmd->ent_arr[i]) (UAF)
              |- unlock cmd->alloc_lock    |  |- cmd->ent_arr[ent->idx]=ent

The cmd->ent_arr[ent->idx] assignment and the bit clearing are not
protected by the cmd->alloc_lock in cmd_work_handler().

Fixes: 50b2412b7e78 ("net/mlx5: Avoid possible free of command entry while timeout comp handler")
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agonet/mlx5e: Fix slab-out-of-bounds in mlx5_query_nic_vport_mac_list()
Shifeng Li [Thu, 30 Nov 2023 09:46:56 +0000 (01:46 -0800)]
net/mlx5e: Fix slab-out-of-bounds in mlx5_query_nic_vport_mac_list()

[ Upstream commit ddb38ddff9c71026bad481b791a94d446ee37603 ]

Out_sz that the size of out buffer is calculated using query_nic_vport
_context_in structure when driver query the MAC list. However query_nic
_vport_context_in structure is smaller than query_nic_vport_context_out.
When allowed_list_size is greater than 96, calling ether_addr_copy() will
trigger an slab-out-of-bounds.

[ 1170.055866] BUG: KASAN: slab-out-of-bounds in mlx5_query_nic_vport_mac_list+0x481/0x4d0 [mlx5_core]
[ 1170.055869] Read of size 4 at addr ffff88bdbc57d912 by task kworker/u128:1/461
[ 1170.055870]
[ 1170.055932] Workqueue: mlx5_esw_wq esw_vport_change_handler [mlx5_core]
[ 1170.055936] Call Trace:
[ 1170.055949]  dump_stack+0x8b/0xbb
[ 1170.055958]  print_address_description+0x6a/0x270
[ 1170.055961]  kasan_report+0x179/0x2c0
[ 1170.056061]  mlx5_query_nic_vport_mac_list+0x481/0x4d0 [mlx5_core]
[ 1170.056162]  esw_update_vport_addr_list+0x2c5/0xcd0 [mlx5_core]
[ 1170.056257]  esw_vport_change_handle_locked+0xd08/0x1a20 [mlx5_core]
[ 1170.056377]  esw_vport_change_handler+0x6b/0x90 [mlx5_core]
[ 1170.056381]  process_one_work+0x65f/0x12d0
[ 1170.056383]  worker_thread+0x87/0xb50
[ 1170.056390]  kthread+0x2e9/0x3a0
[ 1170.056394]  ret_from_fork+0x1f/0x40

Fixes: e16aea2744ab ("net/mlx5: Introduce access functions to modify/query vport mac lists")
Cc: Ding Hui <dinghui@sangfor.com.cn>
Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agoRevert "net/mlx5e: fix double free of encap_header"
Vlad Buslov [Tue, 21 Nov 2023 12:52:28 +0000 (13:52 +0100)]
Revert "net/mlx5e: fix double free of encap_header"

[ Upstream commit 5d089684dc434a31e08d32f0530066d0025c52e4 ]

This reverts commit 6f9b1a0731662648949a1c0587f6acb3b7f8acf1.

This patch is causing a null ptr issue, the proper fix is in the next
patch.

Fixes: 6f9b1a073166 ("net/mlx5e: fix double free of encap_header")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agoRevert "net/mlx5e: fix double free of encap_header in update funcs"
Vlad Buslov [Tue, 21 Nov 2023 12:51:52 +0000 (13:51 +0100)]
Revert "net/mlx5e: fix double free of encap_header in update funcs"

[ Upstream commit 66ca8d4deca09bce3fc7bcf8ea7997fa1a51c33c ]

This reverts commit 3a4aa3cb83563df942be49d145ee3b7ddf17d6bb.

This patch is causing a null ptr issue, the proper fix is in the next
patch.

Fixes: 3a4aa3cb8356 ("net/mlx5e: fix double free of encap_header in update funcs")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agobpf: syzkaller found null ptr deref in unix_bpf proto add
John Fastabend [Fri, 1 Dec 2023 18:01:38 +0000 (10:01 -0800)]
bpf: syzkaller found null ptr deref in unix_bpf proto add

[ Upstream commit 8d6650646ce49e9a5b8c5c23eb94f74b1749f70f ]

I added logic to track the sock pair for stream_unix sockets so that we
ensure lifetime of the sock matches the time a sockmap could reference
the sock (see fixes tag). I forgot though that we allow af_unix unconnected
sockets into a sock{map|hash} map.

This is problematic because previous fixed expected sk_pair() to exist
and did not NULL check it. Because unconnected sockets have a NULL
sk_pair this resulted in the NULL ptr dereference found by syzkaller.

BUG: KASAN: null-ptr-deref in unix_stream_bpf_update_proto+0x72/0x430 net/unix/unix_bpf.c:171
Write of size 4 at addr 0000000000000080 by task syz-executor360/5073
Call Trace:
 <TASK>
 ...
 sock_hold include/net/sock.h:777 [inline]
 unix_stream_bpf_update_proto+0x72/0x430 net/unix/unix_bpf.c:171
 sock_map_init_proto net/core/sock_map.c:190 [inline]
 sock_map_link+0xb87/0x1100 net/core/sock_map.c:294
 sock_map_update_common+0xf6/0x870 net/core/sock_map.c:483
 sock_map_update_elem_sys+0x5b6/0x640 net/core/sock_map.c:577
 bpf_map_update_value+0x3af/0x820 kernel/bpf/syscall.c:167

We considered just checking for the null ptr and skipping taking a ref
on the NULL peer sock. But, if the socket is then connected() after
being added to the sockmap we can cause the original issue again. So
instead this patch blocks adding af_unix sockets that are not in the
ESTABLISHED state.

Reported-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+e8030702aefd3444fb9e@syzkaller.appspotmail.com
Fixes: 8866730aed51 ("bpf, sockmap: af_unix stream sockets need to hold ref for pair sock")
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20231201180139.328529-2-john.fastabend@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agoice: fix theoretical out-of-bounds access in ethtool link modes
Michal Schmidt [Thu, 30 Nov 2023 16:58:06 +0000 (17:58 +0100)]
ice: fix theoretical out-of-bounds access in ethtool link modes

[ Upstream commit 91f9181c738101a276d9da333e0ab665ad806e6d ]

To map phy types reported by the hardware to ethtool link mode bits,
ice uses two lookup tables (phy_type_low_lkup, phy_type_high_lkup).
The "low" table has 64 elements to cover every possible bit the hardware
may report, but the "high" table has only 13. If the hardware reports a
higher bit in phy_types_high, the driver would access memory beyond the
lookup table's end.

Instead of iterating through all 64 bits of phy_types_{low,high}, use
the sizes of the respective lookup tables.

Fixes: 9136e1f1e5c3 ("ice: refactor PHY type to ethtool link mode")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agowifi: mac80211: mesh_plink: fix matches_local logic
Johannes Berg [Mon, 11 Dec 2023 07:05:31 +0000 (09:05 +0200)]
wifi: mac80211: mesh_plink: fix matches_local logic

[ Upstream commit 8c386b166e2517cf3a123018e77941ec22625d0f ]

During refactoring the "else" here got lost, add it back.

Fixes: c99a89edb106 ("mac80211: factor out plink event gathering")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20231211085121.795480fa0e0b.I017d501196a5bbdcd9afd33338d342d6fe1edd79@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agowifi: mac80211: mesh: check element parsing succeeded
Johannes Berg [Mon, 11 Dec 2023 07:05:32 +0000 (09:05 +0200)]
wifi: mac80211: mesh: check element parsing succeeded

[ Upstream commit 1fc4a3eec50d726f4663ad3c0bb0158354d6647a ]

ieee802_11_parse_elems() can return NULL, so we must
check for the return value.

Fixes: 5d24828d05f3 ("mac80211: always allocate struct ieee802_11_elems")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20231211085121.93dea364f3d3.Ie87781c6c48979fb25a744b90af4a33dc2d83a28@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agowifi: mac80211: check defragmentation succeeded
Johannes Berg [Mon, 11 Dec 2023 07:05:30 +0000 (09:05 +0200)]
wifi: mac80211: check defragmentation succeeded

[ Upstream commit 98849ba2aa9db46e62720fb686a9d63ed9887806 ]

We need to check that cfg80211_defragment_element()
didn't return an error, since it can fail due to bad
input, and we didn't catch that before.

Fixes: 8eb8dd2ffbbb ("wifi: mac80211: Support link removal using Reconfiguration ML element")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20231211085121.8595a6b67fc0.I1225edd8f98355e007f96502e358e476c7971d8c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agowifi: mac80211: don't re-add debugfs during reconfig
Johannes Berg [Mon, 11 Dec 2023 07:05:19 +0000 (09:05 +0200)]
wifi: mac80211: don't re-add debugfs during reconfig

[ Upstream commit 63bafd9d5421959b2124dd940ed8d7462d99f449 ]

If we're doing reconfig, then we cannot add the debugfs
files that are already there from before the reconfig.
Skip that in drv_change_sta_links() during reconfig.

Fixes: d2caad527c19 ("wifi: mac80211: add API to show the link STAs in debugfs")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Gregory Greenman <gregory.greenman@intel.com>
Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20231211085121.88a950f43e16.Id71181780994649219685887c0fcad33d387cc78@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agowifi: mac80211: check if the existing link config remains unchanged
Edward Adam Davis [Wed, 29 Nov 2023 12:17:47 +0000 (20:17 +0800)]
wifi: mac80211: check if the existing link config remains unchanged

[ Upstream commit c1393c132b906fbdf91f6d1c9eb2ef7a00cce64e ]

[Syz report]
WARNING: CPU: 1 PID: 5067 at net/mac80211/rate.c:48 rate_control_rate_init+0x540/0x690 net/mac80211/rate.c:48
Modules linked in:
CPU: 1 PID: 5067 Comm: syz-executor413 Not tainted 6.7.0-rc3-syzkaller-00014-gdf60cee26a2e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
RIP: 0010:rate_control_rate_init+0x540/0x690 net/mac80211/rate.c:48
Code: 48 c7 c2 00 46 0c 8c be 08 03 00 00 48 c7 c7 c0 45 0c 8c c6 05 70 79 0b 05 01 e8 1b a0 6f f7 e9 e0 fd ff ff e8 61 b3 8f f7 90 <0f> 0b 90 e9 36 ff ff ff e8 53 b3 8f f7 e8 5e 0b 78 f7 31 ff 89 c3
RSP: 0018:ffffc90003c57248 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888016bc4000 RCX: ffffffff89f7d519
RDX: ffff888076d43b80 RSI: ffffffff89f7d6df RDI: 0000000000000005
RBP: ffff88801daaae20 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000001
R13: 0000000000000000 R14: ffff888020030e20 R15: ffff888078f08000
FS:  0000555556b94380(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000005fdeb8 CR3: 0000000076d22000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 sta_apply_auth_flags.constprop.0+0x4b7/0x510 net/mac80211/cfg.c:1674
 sta_apply_parameters+0xaf1/0x16c0 net/mac80211/cfg.c:2002
 ieee80211_add_station+0x3fa/0x6c0 net/mac80211/cfg.c:2068
 rdev_add_station net/wireless/rdev-ops.h:201 [inline]
 nl80211_new_station+0x13ba/0x1a70 net/wireless/nl80211.c:7603
 genl_family_rcv_msg_doit+0x1fc/0x2e0 net/netlink/genetlink.c:972
 genl_family_rcv_msg net/netlink/genetlink.c:1052 [inline]
 genl_rcv_msg+0x561/0x800 net/netlink/genetlink.c:1067
 netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2545
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
 netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline]
 netlink_unicast+0x53b/0x810 net/netlink/af_netlink.c:1368
 netlink_sendmsg+0x93c/0xe40 net/netlink/af_netlink.c:1910
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0xd5/0x180 net/socket.c:745
 ____sys_sendmsg+0x6ac/0x940 net/socket.c:2584
 ___sys_sendmsg+0x135/0x1d0 net/socket.c:2638
 __sys_sendmsg+0x117/0x1e0 net/socket.c:2667
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

[Analysis]
It is inappropriate to make a link configuration change judgment on an
non-existent and non new link.

[Fix]
Quickly exit when there is a existent link and the link configuration has not
changed.

Fixes: b303835dabe0 ("wifi: mac80211: accept STA changes without link changes")
Reported-and-tested-by: syzbot+62d7eef57b09bfebcd84@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Link: https://msgid.link/tencent_DE67FF86DB92ED465489A36ECD2EDDCC8C06@qq.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agowifi: iwlwifi: pcie: add another missing bh-disable for rxq->lock
Johannes Berg [Fri, 8 Dec 2023 16:32:02 +0000 (18:32 +0200)]
wifi: iwlwifi: pcie: add another missing bh-disable for rxq->lock

[ Upstream commit a4754182dc936b97ec7e9f6b08cdf7ed97ef9069 ]

Evidently I had only looked at all the ones in rx.c, and missed this.
Add bh-disable to this use of the rxq->lock as well.

Fixes: 25edc8f259c7 ("iwlwifi: pcie: properly implement NAPI")
Reported-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://msgid.link/20231208183100.e79ad3dae649.I8f19713c4383707f8be7fc20ff5cc1ecf12429bb@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agowifi: ieee80211: don't require protected vendor action frames
Johannes Berg [Wed, 6 Dec 2023 21:37:57 +0000 (22:37 +0100)]
wifi: ieee80211: don't require protected vendor action frames

[ Upstream commit 98fb9b9680c9f3895ced02d6a73e27f5d7b5892b ]

For vendor action frames, whether a protected one should be
used or not is clearly up to the individual vendor and frame,
so even though a protected dual is defined, it may not get
used. Thus, don't require protection for vendor action frames
when they're used in a connection.

Since we obviously don't process frames unknown to the kernel
in the kernel, it may makes sense to invert this list to have
all the ones the kernel processes and knows to be requiring
protection, but that'd be a different change.

Fixes: 91535613b609 ("wifi: mac80211: don't drop all unprotected public action frames")
Reported-by: Jouni Malinen <j@w1.fi>
Link: https://msgid.link/20231206223801.f6a2cf4e67ec.Ifa6acc774bd67801d3dafb405278f297683187aa@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agoSUNRPC: Revert 5f7fc5d69f6e92ec0b38774c387f5cf7812c5806
Chuck Lever [Mon, 18 Dec 2023 22:05:40 +0000 (17:05 -0500)]
SUNRPC: Revert 5f7fc5d69f6e92ec0b38774c387f5cf7812c5806

[ Upstream commit bd018b98ba84ca0c80abac1ef23ce726a809e58c ]

Guillaume says:
> I believe commit 5f7fc5d69f6e ("SUNRPC: Resupply rq_pages from
> node-local memory") in Linux 6.5+ is incorrect. It passes
> unconditionally rq_pool->sp_id as the NUMA node.
>
> While the comment in the svc_pool declaration in sunrpc/svc.h says
> that sp_id is also the NUMA node id, it might not be the case if
> the svc is created using svc_create_pooled(). svc_created_pooled()
> can use the per-cpu pool mode therefore in this case sp_id would
> be the cpu id.

Fix this by reverting now. At a later point this minor optimization,
and the deceptive labeling of the sp_id field, can be revisited.

Reported-by: Guillaume Morin <guillaume@morinfr.org>
Closes: https://lore.kernel.org/linux-nfs/ZYC9rsno8qYggVt9@bender.morinfr.org/T/#u
Fixes: 5f7fc5d69f6e ("SUNRPC: Resupply rq_pages from node-local memory")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agoplatform/x86/intel/pmc: Fix hang in pmc_core_send_ltr_ignore()
Rajvi Jingar [Sat, 16 Dec 2023 01:16:50 +0000 (17:16 -0800)]
platform/x86/intel/pmc: Fix hang in pmc_core_send_ltr_ignore()

[ Upstream commit fbcf67ce5a9e2831c14bdfb895be05213e611724 ]

For input value 0, PMC stays unassigned which causes crash while trying
to access PMC for register read/write. Include LTR index 0 in pmc_index
and ltr_index calculation.

Fixes: 2bcef4529222 ("platform/x86:intel/pmc: Enable debugfs multiple PMC support")
Signed-off-by: Rajvi Jingar <rajvi.jingar@linux.intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20231216011650.1973941-1-rajvi.jingar@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agos390/vx: fix save/restore of fpu kernel context
Heiko Carstens [Fri, 8 Dec 2023 14:03:15 +0000 (15:03 +0100)]
s390/vx: fix save/restore of fpu kernel context

[ Upstream commit e6b2dab41888332bf83f592131e7ea07756770a4 ]

The KERNEL_FPR mask only contains a flag for the first eight vector
registers. However floating point registers overlay parts of the first
sixteen vector registers.

This could lead to vector register corruption if a kernel fpu context uses
any of the vector registers 8 to 15 and is interrupted or calls a
KERNEL_FPR context. If that context uses also vector registers 8 to 15,
their contents will be corrupted on return.

Luckily this is currently not a real bug, since the kernel has only one
KERNEL_FPR user with s390_adjust_jiffies() and it is only using floating
point registers 0 to 2.

Fix this by using the correct bits for KERNEL_FPR.

Fixes: 7f79695cc1b6 ("s390/fpu: improve kernel_fpu_[begin|end]")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agoreset: Fix crash when freeing non-existent optional resets
Geert Uytterhoeven [Wed, 29 Nov 2023 16:55:33 +0000 (17:55 +0100)]
reset: Fix crash when freeing non-existent optional resets

[ Upstream commit 4a6756f56bcf8e64c87144a626ce53aea4899c0e ]

When obtaining one or more optional resets, non-existent resets are
stored as NULL pointers, and all related error and cleanup paths need to
take this into account.

Currently only reset_control_put() and reset_control_bulk_put()
get this right.  All of __reset_control_bulk_get(),
of_reset_control_array_get(), and reset_control_array_put() lack the
proper checking, causing NULL pointer dereferences on failure or
release.

Fix this by moving the existing check from reset_control_bulk_put() to
__reset_control_put_internal(), so it applies to all callers.
The double check in reset_control_put() doesn't hurt.

Fixes: 17c82e206d2a3cd8 ("reset: Add APIs to manage array of resets")
Fixes: 48d71395896d54ee ("reset: Add reset_control_bulk API")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/2440edae7ca8534628cdbaf559ded288f2998178.1701276806.git.geert+renesas@glider.be
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agoARM: OMAP2+: Fix null pointer dereference and memory leak in omap_soc_device_init
Kunwu Chan [Thu, 23 Nov 2023 14:52:37 +0000 (22:52 +0800)]
ARM: OMAP2+: Fix null pointer dereference and memory leak in omap_soc_device_init

[ Upstream commit c72b9c33ef9695ad7ce7a6eb39a9df8a01b70796 ]

kasprintf() returns a pointer to dynamically allocated memory which can
be NULL upon failure. When 'soc_dev_attr->family' is NULL,it'll trigger
the null pointer dereference issue, such as in 'soc_info_show'.

And when 'soc_device_register' fails, it's necessary to release
'soc_dev_attr->family' to avoid memory leaks.

Fixes: 6770b2114325 ("ARM: OMAP2+: Export SoC information to userspace")
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Message-ID: <20231123145237.609442-1-chentao@kylinos.cn>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agoARM: dts: dra7: Fix DRA7 L3 NoC node register size
Andrew Davis [Mon, 13 Nov 2023 18:16:04 +0000 (12:16 -0600)]
ARM: dts: dra7: Fix DRA7 L3 NoC node register size

[ Upstream commit 1e5caee2ba8f1426e8098afb4ca38dc40a0ca71b ]

This node can access any part of the L3 configuration registers space,
including CLK1 and CLK2 which are 0x800000 offset. Restore this area
size to include these areas.

Fixes: 7f2659ce657e ("ARM: dts: Move dra7 l3 noc to a separate node")
Signed-off-by: Andrew Davis <afd@ti.com>
Message-ID: <20231113181604.546444-1-afd@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agoarm64: dts: allwinner: h616: update emac for Orange Pi Zero 3
Chukun Pan [Sun, 29 Oct 2023 07:40:09 +0000 (15:40 +0800)]
arm64: dts: allwinner: h616: update emac for Orange Pi Zero 3

[ Upstream commit b9622937d95809ef89904583191571a9fa326402 ]

The current emac setting is not suitable for Orange Pi Zero 3,
move it back to Orange Pi Zero 2 DT. Also update phy mode and
delay values for emac on Orange Pi Zero 3.
With these changes, Ethernet now looks stable.

Fixes: 322bf103204b ("arm64: dts: allwinner: h616: Split Orange Pi Zero 2 DT")
Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn>
Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://lore.kernel.org/r/20231029074009.7820-2-amadeus@jmu.edu.cn
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agospi: spi-imx: correctly configure burst length when using dma
Benjamin Bigler [Sat, 9 Dec 2023 22:23:26 +0000 (23:23 +0100)]
spi: spi-imx: correctly configure burst length when using dma

[ Upstream commit e9b220aeacf109684cce36a94fc24ed37be92b05 ]

If DMA is used, burst length should be set to the bus width of the DMA.
Otherwise, the SPI hardware will transmit/receive one word per DMA
request.
Since this issue affects both transmission and reception, it cannot be
detected with a loopback test.
Replace magic numbers 512 and 0xfff with MX51_ECSPI_CTRL_MAX_BURST.

Reported-by Stefan Bigler <linux@bigler.io>

Signed-off-by: Benjamin Bigler <benjamin@bigler.one>
Fixes: 15a6af94a277 ("spi: Increase imx51 ecspi burst length based on transfer length")
Link: https://lore.kernel.org/r/8a415902c751cdbb4b20ce76569216ed@mail.infomaniak.com
Link: https://lore.kernel.org/r/20231209222338.5564-1-benjamin@bigler.one
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agodrm: Fix FD ownership check in drm_master_check_perm()
Lingkai Dong [Wed, 6 Dec 2023 13:51:58 +0000 (13:51 +0000)]
drm: Fix FD ownership check in drm_master_check_perm()

[ Upstream commit 5a6c9a05e55cb2972396cc991af9d74c8c15029a ]

The DRM subsystem keeps a record of the owner of a DRM device file
descriptor using thread group ID (TGID) instead of process ID (PID), to
ensures all threads within the same userspace process are considered the
owner. However, the DRM master ownership check compares the current
thread's PID against the record, so the thread is incorrectly considered to
be not the FD owner if the PID is not equal to the TGID. This causes DRM
ioctls to be denied master privileges, even if the same thread that opened
the FD performs an ioctl. Fix this by checking TGID.

Fixes: 4230cea89cafb ("drm: Track clients by tgid and not tid")
Signed-off-by: Lingkai Dong <lingkai.dong@arm.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v6.4+
Link: https://patchwork.freedesktop.org/patch/msgid/PA6PR08MB107665920BE9A96658CDA04CE8884A@PA6PR08MB10766.eurprd08.prod.outlook.com
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agodrm: Update file owner during use
Tvrtko Ursulin [Wed, 21 Jun 2023 09:48:24 +0000 (10:48 +0100)]
drm: Update file owner during use

[ Upstream commit 1c7a387ffef894b1ab3942f0482dac7a6e0a909c ]

With the typical model where the display server opens the file descriptor
and then hands it over to the client(*), we were showing stale data in
debugfs.

Fix it by updating the drm_file->pid on ioctl access from a different
process.

The field is also made RCU protected to allow for lockless readers. Update
side is protected with dev->filelist_mutex.

Before:

$ cat /sys/kernel/debug/dri/0/clients
             command   pid dev master a   uid      magic
                Xorg  2344   0   y    y     0          0
                Xorg  2344   0   n    y     0          2
                Xorg  2344   0   n    y     0          3
                Xorg  2344   0   n    y     0          4

After:

$ cat /sys/kernel/debug/dri/0/clients
             command  tgid dev master a   uid      magic
                Xorg   830   0   y    y     0          0
       xfce4-session   880   0   n    y     0          1
               xfwm4   943   0   n    y     0          2
           neverball  1095   0   n    y     0          3

*)
More detailed and historically accurate description of various handover
implementation kindly provided by Emil Velikov:

"""
The traditional model, the server was the orchestrator managing the
primary device node. From the fd, to the master status and
authentication. But looking at the fd alone, this has varied across
the years.

IIRC in the DRI1 days, Xorg (libdrm really) would have a list of open
fd(s) and reuse those whenever needed, DRI2 the client was responsible
for open() themselves and with DRI3 the fd was passed to the client.

Around the inception of DRI3 and systemd-logind, the latter became
another possible orchestrator. Whereby Xorg and Wayland compositors
could ask it for the fd. For various reasons (hysterical and genuine
ones) Xorg has a fallback path going the open(), whereas Wayland
compositors are moving to solely relying on logind... some never had
fallback even.

Over the past few years, more projects have emerged which provide
functionality similar (be that on API level, Dbus, or otherwise) to
systemd-logind.
"""

v2:
 * Fixed typo in commit text and added a fine historical explanation
   from Emil.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230621094824.2348732-1-tvrtko.ursulin@linux.intel.com
Signed-off-by: Christian König <christian.koenig@amd.com>
Stable-dep-of: 5a6c9a05e55c ("drm: Fix FD ownership check in drm_master_check_perm()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agodrm/i915/edp: don't write to DP_LINK_BW_SET when using rate select
Jani Nikula [Tue, 5 Dec 2023 18:05:51 +0000 (20:05 +0200)]
drm/i915/edp: don't write to DP_LINK_BW_SET when using rate select

[ Upstream commit e6861d8264cd43c5eb20196e53df36fd71ec5698 ]

The eDP 1.5 spec adds a clarification for eDP 1.4x:

> For eDP v1.4x, if the Source device chooses the Main-Link rate by way
> of DPCD 00100h, the Sink device shall ignore DPCD 00115h[2:0].

We write 0 to DP_LINK_BW_SET (DPCD 100h) even when using
DP_LINK_RATE_SET (DPCD 114h). Stop doing that, as it can cause the panel
to ignore the rate set method.

Moreover, 0 is a reserved value for DP_LINK_BW_SET, and should not be
used.

v2: Improve the comments (Ville)

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/9081
Tested-by: Animesh Manna <animesh.manna@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231205180551.2476228-1-jani.nikula@intel.com
(cherry picked from commit 23b392b94acb0499f69706c5808c099f590ebcf4)
Cc: stable@vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agodrm/i915: Introduce crtc_state->enhanced_framing
Ville Syrjälä [Wed, 3 May 2023 11:36:59 +0000 (14:36 +0300)]
drm/i915: Introduce crtc_state->enhanced_framing

[ Upstream commit 3072a24c778a7102d70692af5556e47363114c67 ]

Track DP enhanced framing properly in the crtc state instead
of relying just on the cached DPCD everywhere, and hook it
up into the state check and dump.

v2: Actually set enhanced_framing in .compute_config()

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230503113659.16305-1-ville.syrjala@linux.intel.com
Reviewed-by: Luca Coelho <luciano.coelho@intel.com>
Stable-dep-of: e6861d8264cd ("drm/i915/edp: don't write to DP_LINK_BW_SET when using rate select")
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agodrm/i915: Fix FEC state dump
Ville Syrjälä [Tue, 2 May 2023 14:39:01 +0000 (17:39 +0300)]
drm/i915: Fix FEC state dump

[ Upstream commit 3dfeb80b308882cc6e1f5f6c36fd9a7f4cae5fc6 ]

Stop dumping state while reading it out. We have a proper
place for that stuff.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230502143906.2401-7-ville.syrjala@linux.intel.com
Reviewed-by: Luca Coelho <luciano.coelho@intel.com>
Stable-dep-of: e6861d8264cd ("drm/i915/edp: don't write to DP_LINK_BW_SET when using rate select")
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agodrm/amd/display: fix hw rotated modes when PSR-SU is enabled
Hamza Mahfooz [Tue, 5 Dec 2023 19:55:04 +0000 (14:55 -0500)]
drm/amd/display: fix hw rotated modes when PSR-SU is enabled

[ Upstream commit f528ee145bd0076cd0ed7e7b2d435893e6329e98 ]

We currently don't support dirty rectangles on hardware rotated modes.
So, if a user is using hardware rotated modes with PSR-SU enabled,
use PSR-SU FFU for all rotated planes (including cursor planes).

Cc: stable@vger.kernel.org
Fixes: 30ebe41582d1 ("drm/amd/display: add FB_DAMAGE_CLIPS support")
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2952
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Bin Li <binli@gnome.org>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agobtrfs: free qgroup pertrans reserve on transaction abort
Boris Burkov [Fri, 1 Dec 2023 21:00:11 +0000 (13:00 -0800)]
btrfs: free qgroup pertrans reserve on transaction abort

[ Upstream commit b321a52cce062ec7ed385333a33905d22159ce36 ]

If we abort a transaction, we never run the code that frees the pertrans
qgroup reservation. This results in warnings on unmount as that
reservation has been leaked. The leak isn't a huge issue since the fs is
read-only, but it's better to clean it up when we know we can/should. Do
it during the cleanup_transaction step of aborting.

CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agobtrfs: qgroup: use qgroup_iterator in qgroup_convert_meta()
Qu Wenruo [Sat, 2 Sep 2023 00:13:54 +0000 (08:13 +0800)]
btrfs: qgroup: use qgroup_iterator in qgroup_convert_meta()

[ Upstream commit 0913445082496c2b29668ee26521401b273838b8 ]

With the new qgroup_iterator_add() and qgroup_iterator_clean(), we can
get rid of the ulist and its GFP_ATOMIC memory allocation.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: b321a52cce06 ("btrfs: free qgroup pertrans reserve on transaction abort")
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agobtrfs: qgroup: iterate qgroups without memory allocation for qgroup_reserve()
Qu Wenruo [Sat, 2 Sep 2023 00:13:52 +0000 (08:13 +0800)]
btrfs: qgroup: iterate qgroups without memory allocation for qgroup_reserve()

[ Upstream commit 686c4a5a42635e0d2889e3eb461c554fd0b616b4 ]

Qgroup heavily relies on ulist to go through all the involved
qgroups, but since we're using ulist inside fs_info->qgroup_lock
spinlock, this means we're doing a lot of GFP_ATOMIC allocations.

This patch reduces the GFP_ATOMIC usage for qgroup_reserve() by
eliminating the memory allocation completely.

This is done by moving the needed memory to btrfs_qgroup::iterator
list_head, so that we can put all the involved qgroup into a on-stack
list, thus eliminating the need to allocate memory while holding
spinlock.

The only cost is the slightly higher memory usage, but considering the
reduce GFP_ATOMIC during a hot path, it should still be acceptable.

Function qgroup_reserve() is the perfect start point for this
conversion.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: b321a52cce06 ("btrfs: free qgroup pertrans reserve on transaction abort")
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agomm/damon/core: make damon_start() waits until kdamond_fn() starts
SeongJae Park [Fri, 8 Dec 2023 17:50:18 +0000 (17:50 +0000)]
mm/damon/core: make damon_start() waits until kdamond_fn() starts

[ Upstream commit 6376a824595607e99d032a39ba3394988b4fce96 ]

The cleanup tasks of kdamond threads including reset of corresponding
DAMON context's ->kdamond field and decrease of global nr_running_ctxs
counter is supposed to be executed by kdamond_fn().  However, commit
0f91d13366a4 ("mm/damon: simplify stop mechanism") made neither
damon_start() nor damon_stop() ensure the corresponding kdamond has
started the execution of kdamond_fn().

As a result, the cleanup can be skipped if damon_stop() is called fast
enough after the previous damon_start().  Especially the skipped reset
of ->kdamond could cause a use-after-free.

Fix it by waiting for start of kdamond_fn() execution from
damon_start().

Link: https://lkml.kernel.org/r/20231208175018.63880-1-sj@kernel.org
Fixes: 0f91d13366a4 ("mm/damon: simplify stop mechanism")
Signed-off-by: SeongJae Park <sj@kernel.org>
Reported-by: Jakub Acs <acsjakub@amazon.de>
Cc: Changbin Du <changbin.du@intel.com>
Cc: Jakub Acs <acsjakub@amazon.de>
Cc: <stable@vger.kernel.org> # 5.15.x
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agomm/damon/core: use number of passed access sampling as a timer
SeongJae Park [Thu, 14 Sep 2023 02:15:23 +0000 (02:15 +0000)]
mm/damon/core: use number of passed access sampling as a timer

[ Upstream commit 4472edf63d6630e6cf65e205b4fc8c3c94d0afe5 ]

DAMON sleeps for sampling interval after each sampling, and check if the
aggregation interval and the ops update interval have passed using
ktime_get_coarse_ts64() and baseline timestamps for the intervals.  That
design is for making the operations occur at deterministic timing
regardless of the time that spend for each work.  However, it turned out
it is not that useful, and incur not-that-intuitive results.

After all, timer functions, and especially sleep functions that DAMON uses
to wait for specific timing, are not necessarily strictly accurate.  It is
legal design, so no problem.  However, depending on such inaccuracies, the
nr_accesses can be larger than aggregation interval divided by sampling
interval.  For example, with the default setting (5 ms sampling interval
and 100 ms aggregation interval) we frequently show regions having
nr_accesses larger than 20.  Also, if the execution of a DAMOS scheme
takes a long time, next aggregation could happen before enough number of
samples are collected.  This is not what usual users would intuitively
expect.

Since access check sampling is the smallest unit work of DAMON, using the
number of passed sampling intervals as the DAMON-internal timer can easily
avoid these problems.  That is, convert aggregation and ops update
intervals to numbers of sampling intervals that need to be passed before
those operations be executed, count the number of passed sampling
intervals, and invoke the operations as soon as the specific amount of
sampling intervals passed.  Make the change.

Note that this could make a behavioral change to settings that using
intervals that not aligned by the sampling interval.  For example, if the
sampling interval is 5 ms and the aggregation interval is 12 ms, DAMON
effectively uses 15 ms as its aggregation interval, because it checks
whether the aggregation interval after sleeping the sampling interval.
This change will make DAMON to effectively use 10 ms as aggregation
interval, since it uses 'aggregation interval / sampling interval *
sampling interval' as the effective aggregation interval, and we don't use
floating point types.  Usual users would have used aligned intervals, so
this behavioral change is not expected to make any meaningful impact, so
just make this change.

Link: https://lkml.kernel.org/r/20230914021523.60649-1-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stable-dep-of: 6376a8245956 ("mm/damon/core: make damon_start() waits until kdamond_fn() starts")
Signed-off-by: Sasha Levin <sashal@kernel.org>
11 months agobpf: Fix prog_array_map_poke_run map poke update
Jiri Olsa [Wed, 6 Dec 2023 08:30:40 +0000 (09:30 +0100)]
bpf: Fix prog_array_map_poke_run map poke update

commit 4b7de801606e504e69689df71475d27e35336fb3 upstream.

Lee pointed out issue found by syscaller [0] hitting BUG in prog array
map poke update in prog_array_map_poke_run function due to error value
returned from bpf_arch_text_poke function.

There's race window where bpf_arch_text_poke can fail due to missing
bpf program kallsym symbols, which is accounted for with check for
-EINVAL in that BUG_ON call.

The problem is that in such case we won't update the tail call jump
and cause imbalance for the next tail call update check which will
fail with -EBUSY in bpf_arch_text_poke.

I'm hitting following race during the program load:

  CPU 0                             CPU 1

  bpf_prog_load
    bpf_check
      do_misc_fixups
        prog_array_map_poke_track

                                    map_update_elem
                                      bpf_fd_array_map_update_elem
                                        prog_array_map_poke_run

                                          bpf_arch_text_poke returns -EINVAL

    bpf_prog_kallsyms_add

After bpf_arch_text_poke (CPU 1) fails to update the tail call jump, the next
poke update fails on expected jump instruction check in bpf_arch_text_poke
with -EBUSY and triggers the BUG_ON in prog_array_map_poke_run.

Similar race exists on the program unload.

Fixing this by moving the update to bpf_arch_poke_desc_update function which
makes sure we call __bpf_arch_text_poke that skips the bpf address check.

Each architecture has slightly different approach wrt looking up bpf address
in bpf_arch_text_poke, so instead of splitting the function or adding new
'checkip' argument in previous version, it seems best to move the whole
map_poke_run update as arch specific code.

  [0] https://syzkaller.appspot.com/bug?extid=97a4fe20470e9bc30810

Fixes: ebf7d1f508a7 ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Reported-by: syzbot+97a4fe20470e9bc30810@syzkaller.appspotmail.com
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Cc: Lee Jones <lee@kernel.org>
Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20231206083041.1306660-2-jolsa@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agoLinux 6.6.8 v6.6.8
Greg Kroah-Hartman [Wed, 20 Dec 2023 16:02:07 +0000 (17:02 +0100)]
Linux 6.6.8

Link: https://lore.kernel.org/r/20231218135104.927894164@linuxfoundation.org
Tested-by: Ronald Warsow <rwarsow@gmx.de>
Tested-by: SeongJae Park <sj@kernel.org>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Ricardo B. Marliere <ricardo@marliere.net>
Tested-by: Salvatore Bonaccorso <carnil@debian.org>
Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com>
Tested-by: Justin M. Forbes <jforbes@fedoraproject.org>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Kelsey Steele <kelseysteele@linux.microsoft.com>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Ron Economos <re@w6rz.net>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agoRDMA/mlx5: Change the key being sent for MPV device affiliation
Patrisious Haddad [Thu, 19 Oct 2023 08:47:05 +0000 (11:47 +0300)]
RDMA/mlx5: Change the key being sent for MPV device affiliation

commit 02e7d139e5e24abb5fde91934fc9dc0344ac1926 upstream.

Change the key that we send from IB driver to EN driver regarding the
MPV device affiliation, since at that stage the IB device is not yet
initialized, so its index would be zero for different IB devices and
cause wrong associations between unrelated master and slave devices.

Instead use a unique value from inside the core device which is already
initialized at this stage.

Fixes: 0d293714ac32 ("RDMA/mlx5: Send events from IB driver about device affiliation state")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://lore.kernel.org/r/ac7e66357d963fc68d7a419515180212c96d137d.1697705185.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agox86/speculation, objtool: Use absolute relocations for annotations
Fangrui Song [Wed, 20 Sep 2023 00:17:28 +0000 (17:17 -0700)]
x86/speculation, objtool: Use absolute relocations for annotations

commit b8ec60e1186cdcfce41e7db4c827cb107e459002 upstream.

.discard.retpoline_safe sections do not have the SHF_ALLOC flag.  These
sections referencing text sections' STT_SECTION symbols with PC-relative
relocations like R_386_PC32 [0] is conceptually not suitable.  Newer
LLD will report warnings for REL relocations even for relocatable links [1]:

    ld.lld: warning: vmlinux.a(drivers/i2c/busses/i2c-i801.o):(.discard.retpoline_safe+0x120): has non-ABS relocation R_386_PC32 against symbol ''

Switch to absolute relocations instead, which indicate link-time
addresses.  In a relocatable link, these addresses are also output
section offsets, used by checks in tools/objtool/check.c.  When linking
vmlinux, these .discard.* sections will be discarded, therefore it is
not a problem that R_X86_64_32 cannot represent a kernel address.

Alternatively, we could set the SHF_ALLOC flag for .discard.* sections,
but I think non-SHF_ALLOC for sections to be discarded makes more sense.

Note: if we decide to never support REL architectures (e.g. arm, i386),
we can utilize R_*_NONE relocations (.reloc ., BFD_RELOC_NONE, sym),
making .discard.* sections zero-sized.  That said, the section content
waste is 4 bytes per entry, much smaller than sizeof(Elf{32,64}_Rel).

  [0] commit 1c0c1faf5692 ("objtool: Use relative pointers for annotations")
  [1] https://github.com/ClangBuiltLinux/linux/issues/1937

Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20230920001728.1439947-1-maskray@google.com
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agoring-buffer: Have rb_time_cmpxchg() set the msb counter too
Steven Rostedt (Google) [Fri, 15 Dec 2023 13:41:14 +0000 (08:41 -0500)]
ring-buffer: Have rb_time_cmpxchg() set the msb counter too

commit 0aa0e5289cfe984a8a9fdd79ccf46ccf080151f7 upstream.

The rb_time_cmpxchg() on 32-bit architectures requires setting three
32-bit words to represent the 64-bit timestamp, with some salt for
synchronization. Those are: msb, top, and bottom

The issue is, the rb_time_cmpxchg() did not properly salt the msb portion,
and the msb that was written was stale.

Link: https://lore.kernel.org/linux-trace-kernel/20231215084114.20899342@rorschach.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: f03f2abce4f39 ("ring-buffer: Have 32 bit time stamps use all 64 bits")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agoring-buffer: Do not try to put back write_stamp
Steven Rostedt (Google) [Fri, 15 Dec 2023 03:29:21 +0000 (22:29 -0500)]
ring-buffer: Do not try to put back write_stamp

commit dd939425707898da992e59ab0fcfae4652546910 upstream.

If an update to an event is interrupted by another event between the time
the initial event allocated its buffer and where it wrote to the
write_stamp, the code try to reset the write stamp back to the what it had
just overwritten. It knows that it was overwritten via checking the
before_stamp, and if it didn't match what it wrote to the before_stamp
before it allocated its space, it knows it was overwritten.

To put back the write_stamp, it uses the before_stamp it read. The problem
here is that by writing the before_stamp to the write_stamp it makes the
two equal again, which means that the write_stamp can be considered valid
as the last timestamp written to the ring buffer. But this is not
necessarily true. The event that interrupted the event could have been
interrupted in a way that it was interrupted as well, and can end up
leaving with an invalid write_stamp. But if this happens and returns to
this context that uses the before_stamp to update the write_stamp again,
it can possibly incorrectly make it valid, causing later events to have in
correct time stamps.

As it is OK to leave this function with an invalid write_stamp (one that
doesn't match the before_stamp), there's no reason to try to make it valid
again in this case. If this race happens, then just leave with the invalid
write_stamp and the next event to come along will just add a absolute
timestamp and validate everything again.

Bonus points: This gets rid of another cmpxchg64!

Link: https://lore.kernel.org/linux-trace-kernel/20231214222921.193037a7@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Fixes: a389d86f7fd09 ("ring-buffer: Have nested events still record running time stamp")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agoring-buffer: Fix a race in rb_time_cmpxchg() for 32 bit archs
Steven Rostedt (Google) [Tue, 12 Dec 2023 16:53:01 +0000 (11:53 -0500)]
ring-buffer: Fix a race in rb_time_cmpxchg() for 32 bit archs

commit fff88fa0fbc7067ba46dde570912d63da42c59a9 upstream.

Mathieu Desnoyers pointed out an issue in the rb_time_cmpxchg() for 32 bit
architectures. That is:

 static bool rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set)
 {
unsigned long cnt, top, bottom, msb;
unsigned long cnt2, top2, bottom2, msb2;
u64 val;

/* The cmpxchg always fails if it interrupted an update */
 if (!__rb_time_read(t, &val, &cnt2))
 return false;

 if (val != expect)
 return false;

<<<< interrupted here!

 cnt = local_read(&t->cnt);

The problem is that the synchronization counter in the rb_time_t is read
*after* the value of the timestamp is read. That means if an interrupt
were to come in between the value being read and the counter being read,
it can change the value and the counter and the interrupted process would
be clueless about it!

The counter needs to be read first and then the value. That way it is easy
to tell if the value is stale or not. If the counter hasn't been updated,
then the value is still good.

Link: https://lore.kernel.org/linux-trace-kernel/20231211201324.652870-1-mathieu.desnoyers@efficios.com/
Link: https://lore.kernel.org/linux-trace-kernel/20231212115301.7a9c9a64@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Fixes: 10464b4aa605e ("ring-buffer: Add rb_time_t 64 bit operations for speeding up 32 bit")
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agoring-buffer: Fix writing to the buffer with max_data_size
Steven Rostedt (Google) [Tue, 12 Dec 2023 16:16:17 +0000 (11:16 -0500)]
ring-buffer: Fix writing to the buffer with max_data_size

commit b3ae7b67b87fed771fa5bf95389df06b0433603e upstream.

The maximum ring buffer data size is the maximum size of data that can be
recorded on the ring buffer. Events must be smaller than the sub buffer
data size minus any meta data. This size is checked before trying to
allocate from the ring buffer because the allocation assumes that the size
will fit on the sub buffer.

The maximum size was calculated as the size of a sub buffer page (which is
currently PAGE_SIZE minus the sub buffer header) minus the size of the
meta data of an individual event. But it missed the possible adding of a
time stamp for events that are added long enough apart that the event meta
data can't hold the time delta.

When an event is added that is greater than the current BUF_MAX_DATA_SIZE
minus the size of a time stamp, but still less than or equal to
BUF_MAX_DATA_SIZE, the ring buffer would go into an infinite loop, looking
for a page that can hold the event. Luckily, there's a check for this loop
and after 1000 iterations and a warning is emitted and the ring buffer is
disabled. But this should never happen.

This can happen when a large event is added first, or after a long period
where an absolute timestamp is prefixed to the event, increasing its size
by 8 bytes. This passes the check and then goes into the algorithm that
causes the infinite loop.

For events that are the first event on the sub-buffer, it does not need to
add a timestamp, because the sub-buffer itself contains an absolute
timestamp, and adding one is redundant.

The fix is to check if the event is to be the first event on the
sub-buffer, and if it is, then do not add a timestamp.

This also fixes 32 bit adding a timestamp when a read of before_stamp or
write_stamp is interrupted. There's still no need to add that timestamp if
the event is going to be the first event on the sub buffer.

Also, if the buffer has "time_stamp_abs" set, then also check if the
length plus the timestamp is greater than the BUF_MAX_DATA_SIZE.

Link: https://lore.kernel.org/all/20231212104549.58863438@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20231212071837.5fdd6c13@gandalf.local.home
Link: https://lore.kernel.org/linux-trace-kernel/20231212111617.39e02849@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: a4543a2fa9ef3 ("ring-buffer: Get timestamp after event is allocated")
Fixes: 58fbc3c63275c ("ring-buffer: Consolidate add_timestamp to remove some branches")
Reported-by: Kent Overstreet <kent.overstreet@linux.dev> # (on IRC)
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agoring-buffer: Have saved event hold the entire event
Steven Rostedt (Google) [Tue, 12 Dec 2023 12:25:58 +0000 (07:25 -0500)]
ring-buffer: Have saved event hold the entire event

commit b049525855fdd0024881c9b14b8fbec61c3f53d3 upstream.

For the ring buffer iterator (non-consuming read), the event needs to be
copied into the iterator buffer to make sure that a writer does not
overwrite it while the user is reading it. If a write happens during the
copy, the buffer is simply discarded.

But the temp buffer itself was not big enough. The allocation of the
buffer was only BUF_MAX_DATA_SIZE, which is the maximum data size that can
be passed into the ring buffer and saved. But the temp buffer needs to
hold the meta data as well. That would be BUF_PAGE_SIZE and not
BUF_MAX_DATA_SIZE.

Link: https://lore.kernel.org/linux-trace-kernel/20231212072558.61f76493@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 785888c544e04 ("ring-buffer: Have rb_iter_head_event() handle concurrent writer")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agoring-buffer: Do not update before stamp when switching sub-buffers
Steven Rostedt (Google) [Mon, 11 Dec 2023 16:44:20 +0000 (11:44 -0500)]
ring-buffer: Do not update before stamp when switching sub-buffers

commit 9e45e39dc249c970d99d2681f6bcb55736fd725c upstream.

The ring buffer timestamps are synchronized by two timestamp placeholders.
One is the "before_stamp" and the other is the "write_stamp" (sometimes
referred to as the "after stamp" but only in the comments. These two
stamps are key to knowing how to handle nested events coming in with a
lockless system.

When moving across sub-buffers, the before stamp is updated but the write
stamp is not. There's an effort to put back the before stamp to something
that seems logical in case there's nested events. But as the current event
is about to cross sub-buffers, and so will any new nested event that happens,
updating the before stamp is useless, and could even introduce new race
conditions.

The first event on a sub-buffer simply uses the sub-buffer's timestamp
and keeps a "delta" of zero. The "before_stamp" and "write_stamp" are not
used in the algorithm in this case. There's no reason to try to fix the
before_stamp when this happens.

As a bonus, it removes a cmpxchg() when crossing sub-buffers!

Link: https://lore.kernel.org/linux-trace-kernel/20231211114420.36dde01b@gandalf.local.home
Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: a389d86f7fd09 ("ring-buffer: Have nested events still record running time stamp")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agotracing: Update snapshot buffer on resize if it is allocated
Steven Rostedt (Google) [Mon, 11 Dec 2023 03:54:47 +0000 (22:54 -0500)]
tracing: Update snapshot buffer on resize if it is allocated

commit d06aff1cb13d2a0d52b48e605462518149c98c81 upstream.

The snapshot buffer is to mimic the main buffer so that when a snapshot is
needed, the snapshot and main buffer are swapped. When the snapshot buffer
is allocated, it is set to the minimal size that the ring buffer may be at
and still functional. When it is allocated it becomes the same size as the
main ring buffer, and when the main ring buffer changes in size, it should
do.

Currently, the resize only updates the snapshot buffer if it's used by the
current tracer (ie. the preemptirqsoff tracer). But it needs to be updated
anytime it is allocated.

When changing the size of the main buffer, instead of looking to see if
the current tracer is utilizing the snapshot buffer, just check if it is
allocated to know if it should be updated or not.

Also fix typo in comment just above the code change.

Link: https://lore.kernel.org/linux-trace-kernel/20231210225447.48476a6a@rorschach.local.home
Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: ad909e21bbe69 ("tracing: Add internal tracing_snapshot() functions")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agoring-buffer: Fix memory leak of free page
Steven Rostedt (Google) [Mon, 11 Dec 2023 03:12:50 +0000 (22:12 -0500)]
ring-buffer: Fix memory leak of free page

commit 17d801758157bec93f26faaf5ff1a8b9a552d67a upstream.

Reading the ring buffer does a swap of a sub-buffer within the ring buffer
with a empty sub-buffer. This allows the reader to have full access to the
content of the sub-buffer that was swapped out without having to worry
about contention with the writer.

The readers call ring_buffer_alloc_read_page() to allocate a page that
will be used to swap with the ring buffer. When the code is finished with
the reader page, it calls ring_buffer_free_read_page(). Instead of freeing
the page, it stores it as a spare. Then next call to
ring_buffer_alloc_read_page() will return this spare instead of calling
into the memory management system to allocate a new page.

Unfortunately, on freeing of the ring buffer, this spare page is not
freed, and causes a memory leak.

Link: https://lore.kernel.org/linux-trace-kernel/20231210221250.7b9cc83c@rorschach.local.home
Cc: stable@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agosmb: client: fix OOB in smb2_query_reparse_point()
Paulo Alcantara [Mon, 11 Dec 2023 13:26:43 +0000 (10:26 -0300)]
smb: client: fix OOB in smb2_query_reparse_point()

commit 3a42709fa909e22b0be4bb1e2795aa04ada732a3 upstream.

Validate @ioctl_rsp->OutputOffset and @ioctl_rsp->OutputCount so that
their sum does not wrap to a number that is smaller than @reparse_buf
and we end up with a wild pointer as follows:

  BUG: unable to handle page fault for address: ffff88809c5cd45f
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 4a01067 P4D 4a01067 PUD 0
  Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 2 PID: 1260 Comm: mount.cifs Not tainted 6.7.0-rc4 #2
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
  rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  RIP: 0010:smb2_query_reparse_point+0x3e0/0x4c0 [cifs]
  Code: ff ff e8 f3 51 fe ff 41 89 c6 58 5a 45 85 f6 0f 85 14 fe ff ff
  49 8b 57 48 8b 42 60 44 8b 42 64 42 8d 0c 00 49 39 4f 50 72 40 <8b>
  04 02 48 8b 9d f0 fe ff ff 49 8b 57 50 89 03 48 8b 9d e8 fe ff
  RSP: 0018:ffffc90000347a90 EFLAGS: 00010212
  RAX: 000000008000001f RBX: ffff88800ae11000 RCX: 00000000000000ec
  RDX: ffff88801c5cd440 RSI: 0000000000000000 RDI: ffffffff82004aa4
  RBP: ffffc90000347bb0 R08: 00000000800000cd R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000024 R12: ffff8880114d4100
  R13: ffff8880114d4198 R14: 0000000000000000 R15: ffff8880114d4000
  FS: 00007f02c07babc0(0000) GS:ffff88806ba00000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff88809c5cd45f CR3: 0000000011750000 CR4: 0000000000750ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   ? __die+0x23/0x70
   ? page_fault_oops+0x181/0x480
   ? search_module_extables+0x19/0x60
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? exc_page_fault+0x1b6/0x1c0
   ? asm_exc_page_fault+0x26/0x30
   ? _raw_spin_unlock_irqrestore+0x44/0x60
   ? smb2_query_reparse_point+0x3e0/0x4c0 [cifs]
   cifs_get_fattr+0x16e/0xa50 [cifs]
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? lock_acquire+0xbf/0x2b0
   cifs_root_iget+0x163/0x5f0 [cifs]
   cifs_smb3_do_mount+0x5bd/0x780 [cifs]
   smb3_get_tree+0xd9/0x290 [cifs]
   vfs_get_tree+0x2c/0x100
   ? capable+0x37/0x70
   path_mount+0x2d7/0xb80
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? _raw_spin_unlock_irqrestore+0x44/0x60
   __x64_sys_mount+0x11a/0x150
   do_syscall_64+0x47/0xf0
   entry_SYSCALL_64_after_hwframe+0x6f/0x77
  RIP: 0033:0x7f02c08d5b1e

Fixes: 2e4564b31b64 ("smb3: add support for stat of WSL reparse points for special file types")
Cc: stable@vger.kernel.org
Reported-by: Robert Morris <rtm@csail.mit.edu>
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agosmb: client: fix NULL deref in asn1_ber_decoder()
Paulo Alcantara [Mon, 11 Dec 2023 13:26:42 +0000 (10:26 -0300)]
smb: client: fix NULL deref in asn1_ber_decoder()

commit 90d025c2e953c11974e76637977c473200593a46 upstream.

If server replied SMB2_NEGOTIATE with a zero SecurityBufferOffset,
smb2_get_data_area() sets @len to non-zero but return NULL, so
decode_negTokeninit() ends up being called with a NULL @security_blob:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 2 PID: 871 Comm: mount.cifs Not tainted 6.7.0-rc4 #2
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  RIP: 0010:asn1_ber_decoder+0x173/0xc80
  Code: 01 4c 39 2c 24 75 09 45 84 c9 0f 85 2f 03 00 00 48 8b 14 24 4c 29 ea 48 83 fa 01 0f 86 1e 07 00 00 48 8b 74 24 28 4d 8d 5d 01 <42> 0f b6 3c 2e 89 fa 40 88 7c 24 5c f7 d2 83 e2 1f 0f 84 3d 07 00
  RSP: 0018:ffffc9000063f950 EFLAGS: 00010202
  RAX: 0000000000000002 RBX: 0000000000000000 RCX: 000000000000004a
  RDX: 000000000000004a RSI: 0000000000000000 RDI: 0000000000000000
  RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000002 R11: 0000000000000001 R12: 0000000000000000
  R13: 0000000000000000 R14: 000000000000004d R15: 0000000000000000
  FS:  00007fce52b0fbc0(0000) GS:ffff88806ba00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 000000001ae64000 CR4: 0000000000750ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   ? __die+0x23/0x70
   ? page_fault_oops+0x181/0x480
   ? __stack_depot_save+0x1e6/0x480
   ? exc_page_fault+0x6f/0x1c0
   ? asm_exc_page_fault+0x26/0x30
   ? asn1_ber_decoder+0x173/0xc80
   ? check_object+0x40/0x340
   decode_negTokenInit+0x1e/0x30 [cifs]
   SMB2_negotiate+0xc99/0x17c0 [cifs]
   ? smb2_negotiate+0x46/0x60 [cifs]
   ? srso_alias_return_thunk+0x5/0xfbef5
   smb2_negotiate+0x46/0x60 [cifs]
   cifs_negotiate_protocol+0xae/0x130 [cifs]
   cifs_get_smb_ses+0x517/0x1040 [cifs]
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? queue_delayed_work_on+0x5d/0x90
   cifs_mount_get_session+0x78/0x200 [cifs]
   dfs_mount_share+0x13a/0x9f0 [cifs]
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? lock_acquire+0xbf/0x2b0
   ? find_nls+0x16/0x80
   ? srso_alias_return_thunk+0x5/0xfbef5
   cifs_mount+0x7e/0x350 [cifs]
   cifs_smb3_do_mount+0x128/0x780 [cifs]
   smb3_get_tree+0xd9/0x290 [cifs]
   vfs_get_tree+0x2c/0x100
   ? capable+0x37/0x70
   path_mount+0x2d7/0xb80
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? _raw_spin_unlock_irqrestore+0x44/0x60
   __x64_sys_mount+0x11a/0x150
   do_syscall_64+0x47/0xf0
   entry_SYSCALL_64_after_hwframe+0x6f/0x77
  RIP: 0033:0x7fce52c2ab1e

Fix this by setting @len to zero when @off == 0 so callers won't
attempt to dereference non-existing data areas.

Reported-by: Robert Morris <rtm@csail.mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agosmb: client: fix potential OOBs in smb2_parse_contexts()
Paulo Alcantara [Mon, 11 Dec 2023 13:26:41 +0000 (10:26 -0300)]
smb: client: fix potential OOBs in smb2_parse_contexts()

commit af1689a9b7701d9907dfc84d2a4b57c4bc907144 upstream.

Validate offsets and lengths before dereferencing create contexts in
smb2_parse_contexts().

This fixes following oops when accessing invalid create contexts from
server:

  BUG: unable to handle page fault for address: ffff8881178d8cc3
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 4a01067 P4D 4a01067 PUD 0
  Oops: 0000 [#1] PREEMPT SMP NOPTI
  CPU: 3 PID: 1736 Comm: mount.cifs Not tainted 6.7.0-rc4 #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
  rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
  RIP: 0010:smb2_parse_contexts+0xa0/0x3a0 [cifs]
  Code: f8 10 75 13 48 b8 93 ad 25 50 9c b4 11 e7 49 39 06 0f 84 d2 00
  00 00 8b 45 00 85 c0 74 61 41 29 c5 48 01 c5 41 83 fd 0f 76 55 <0f> b7
  7d 04 0f b7 45 06 4c 8d 74 3d 00 66 83 f8 04 75 bc ba 04 00
  RSP: 0018:ffffc900007939e0 EFLAGS: 00010216
  RAX: ffffc90000793c78 RBX: ffff8880180cc000 RCX: ffffc90000793c90
  RDX: ffffc90000793cc0 RSI: ffff8880178d8cc0 RDI: ffff8880180cc000
  RBP: ffff8881178d8cbf R08: ffffc90000793c22 R09: 0000000000000000
  R10: ffff8880180cc000 R11: 0000000000000024 R12: 0000000000000000
  R13: 0000000000000020 R14: 0000000000000000 R15: ffffc90000793c22
  FS: 00007f873753cbc0(0000) GS:ffff88806bc00000(0000)
  knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff8881178d8cc3 CR3: 00000000181ca000 CR4: 0000000000750ef0
  PKRU: 55555554
  Call Trace:
   <TASK>
   ? __die+0x23/0x70
   ? page_fault_oops+0x181/0x480
   ? search_module_extables+0x19/0x60
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? exc_page_fault+0x1b6/0x1c0
   ? asm_exc_page_fault+0x26/0x30
   ? smb2_parse_contexts+0xa0/0x3a0 [cifs]
   SMB2_open+0x38d/0x5f0 [cifs]
   ? smb2_is_path_accessible+0x138/0x260 [cifs]
   smb2_is_path_accessible+0x138/0x260 [cifs]
   cifs_is_path_remote+0x8d/0x230 [cifs]
   cifs_mount+0x7e/0x350 [cifs]
   cifs_smb3_do_mount+0x128/0x780 [cifs]
   smb3_get_tree+0xd9/0x290 [cifs]
   vfs_get_tree+0x2c/0x100
   ? capable+0x37/0x70
   path_mount+0x2d7/0xb80
   ? srso_alias_return_thunk+0x5/0xfbef5
   ? _raw_spin_unlock_irqrestore+0x44/0x60
   __x64_sys_mount+0x11a/0x150
   do_syscall_64+0x47/0xf0
   entry_SYSCALL_64_after_hwframe+0x6f/0x77
  RIP: 0033:0x7f8737657b1e

Reported-by: Robert Morris <rtm@csail.mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agosmb: client: fix OOB in receive_encrypted_standard()
Paulo Alcantara [Mon, 11 Dec 2023 13:26:40 +0000 (10:26 -0300)]
smb: client: fix OOB in receive_encrypted_standard()

commit eec04ea119691e65227a97ce53c0da6b9b74b0b7 upstream.

Fix potential OOB in receive_encrypted_standard() if server returned a
large shdr->NextCommand that would end up writing off the end of
@next_buffer.

Fixes: b24df3e30cbf ("cifs: update receive_encrypted_standard to handle compounded responses")
Cc: stable@vger.kernel.org
Reported-by: Robert Morris <rtm@csail.mit.edu>
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agodrm/i915: Fix remapped stride with CCS on ADL+
Ville Syrjälä [Tue, 5 Dec 2023 18:03:08 +0000 (20:03 +0200)]
drm/i915: Fix remapped stride with CCS on ADL+

commit 0ccd963fe555451b1f84e6d14d2b3ef03dd5c947 upstream.

On ADL+ the hardware automagically calculates the CCS AUX surface
stride from the main surface stride, so when remapping we can't
really play a lot of tricks with the main surface stride, or else
the AUX surface stride would get miscalculated and no longer
match the actual data layout in memory.

Supposedly we could remap in 256 main surface tile units
(AUX page(4096)/cachline(64)*4(4x1 main surface tiles per
AUX cacheline)=256 main surface tiles), but the extra complexity
is probably not worth the hassle.

So let's just make sure our mapping stride is calculated from
the full framebuffer stride (instead of the framebuffer width).
This way the stride we program into PLANE_STRIDE will be the
original framebuffer stride, and thus there will be no change
to the AUX stride/layout.

Cc: stable@vger.kernel.org
Cc: Imre Deak <imre.deak@intel.com>
Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231205180308.7505-1-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
(cherry picked from commit 2c12eb36f849256f5eb00ffaee9bf99396fd3814)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agodrm/i915: Fix intel_atomic_setup_scalers() plane_state handling
Ville Syrjälä [Thu, 7 Dec 2023 19:34:34 +0000 (21:34 +0200)]
drm/i915: Fix intel_atomic_setup_scalers() plane_state handling

commit c3070f080f9ba18dea92eaa21730f7ab85b5c8f4 upstream.

Since the plane_state variable is declared outside the scaler_users
loop in intel_atomic_setup_scalers(), and it's never reset back to
NULL inside the loop we may end up calling intel_atomic_setup_scaler()
with a non-NULL plane state for the pipe scaling case. That is bad
because intel_atomic_setup_scaler() determines whether we are doing
plane scaling or pipe scaling based on plane_state!=NULL. The end
result is that we may miscalculate the scaler mode for pipe scaling.

The hardware becomes somewhat upset if we end up in this situation
when scanning out a planar format on a SDR plane. We end up
programming the pipe scaler into planar mode as well, and the
result is a screenfull of garbage.

Fix the situation by making sure we pass the correct plane_state==NULL
when calculating the scaler mode for pipe scaling.

Cc: stable@vger.kernel.org
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231207193441.20206-2-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit e81144106e21271c619f0c722a09e27ccb8c043d)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agodrm/i915: Fix ADL+ tiled plane stride when the POT stride is smaller than the original
Ville Syrjälä [Mon, 4 Dec 2023 20:24:43 +0000 (22:24 +0200)]
drm/i915: Fix ADL+ tiled plane stride when the POT stride is smaller than the original

commit 324b70e997aab0a7deab8cb90711faccda4e98c8 upstream.

plane_view_scanout_stride() currently assumes that we had to pad the
mapping stride with dummy pages in order to align it. But that is not
the case if the original fb stride exceeds the aligned stride used
to populate the remapped view, which is calculated from the user
specified framebuffer width rather than the user specified framebuffer
stride.

Ignore the original fb stride in this case and just stick to the POT
aligned stride. Getting this wrong will cause the plane to fetch the
wrong data, and can lead to fault errors if the page tables at the
bogus location aren't even populated.

TODO: figure out if this is OK for CCS, or if we should instead increase
the width of the view to cover the entire user specified fb stride
instead...

Cc: Imre Deak <imre.deak@intel.com>
Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231204202443.31247-1-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
(cherry picked from commit 01a39f1c4f1220a4e6a25729fae87ff5794cbc52)
Cc: stable@vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agodrm/amd/display: Disable PSR-SU on Parade 0803 TCON again
Mario Limonciello [Mon, 19 Jun 2023 20:04:24 +0000 (15:04 -0500)]
drm/amd/display: Disable PSR-SU on Parade 0803 TCON again

commit e7ab758741672acb21c5d841a9f0309d30e48a06 upstream.

When screen brightness is rapidly changed and PSR-SU is enabled the
display hangs on panels with this TCON even on the latest DCN 3.1.4
microcode (0x8002a81 at this time).

This was disabled previously as commit 072030b17830 ("drm/amd: Disable
PSR-SU on Parade 0803 TCON") but reverted as commit 1e66a17ce546 ("Revert
"drm/amd: Disable PSR-SU on Parade 0803 TCON"") in favor of testing for
a new enough microcode (commit cd2e31a9ab93 ("drm/amd/display: Set minimum
requirement for using PSR-SU on Phoenix")).

As hangs are still happening specifically with this TCON, disable PSR-SU
again for it until it can be root caused.

Cc: stable@vger.kernel.org
Cc: aaron.ma@canonical.com
Cc: binli@gnome.org
Cc: Marc Rossi <Marc.Rossi@amd.com>
Cc: Hamza Mahfooz <Hamza.Mahfooz@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2046131
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agodrm/amd/display: Restore guard against default backlight value < 1 nit
Mario Limonciello [Wed, 6 Dec 2023 18:08:26 +0000 (12:08 -0600)]
drm/amd/display: Restore guard against default backlight value < 1 nit

commit b96ab339ee50470d13a1faa6ad94d2218a7cd49f upstream.

Mark reports that brightness is not restored after Xorg dpms screen blank.

This behavior was introduced by commit d9e865826c20 ("drm/amd/display:
Simplify brightness initialization") which dropped the cached backlight
value in display code, but also removed code for when the default value
read back was less than 1 nit.

Restore this code so that the backlight brightness is restored to the
correct default value in this circumstance.

Reported-by: Mark Herbert <mark.herbert42@gmail.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3031
Cc: stable@vger.kernel.org
Cc: Camille Cho <camille.cho@amd.com>
Cc: Krunoslav Kovac <krunoslav.kovac@amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz@amd.com>
Fixes: d9e865826c20 ("drm/amd/display: Simplify brightness initialization")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agodrm/edid: also call add modes in EDID connector update fallback
Jani Nikula [Thu, 7 Dec 2023 09:38:21 +0000 (11:38 +0200)]
drm/edid: also call add modes in EDID connector update fallback

commit 759f14e20891de72e676d9d738eb2c573aa15f52 upstream.

When the separate add modes call was added back in commit c533b5167c7e
("drm/edid: add separate drm_edid_connector_add_modes()"), it failed to
address drm_edid_override_connector_update(). Also call add modes there.

Reported-by: bbaa <bbaa@bbaa.fun>
Closes: https://lore.kernel.org/r/930E9B4C7D91FDFF+29b34d89-8658-4910-966a-c772f320ea03@bbaa.fun
Fixes: c533b5167c7e ("drm/edid: add separate drm_edid_connector_add_modes()")
Cc: <stable@vger.kernel.org> # v6.3+
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231207093821.2654267-1-jani.nikula@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agodrm/amdgpu: fix tear down order in amdgpu_vm_pt_free
Christian König [Fri, 8 Dec 2023 12:43:09 +0000 (13:43 +0100)]
drm/amdgpu: fix tear down order in amdgpu_vm_pt_free

commit ceb9a321e7639700844aa3bf234a4e0884f13b77 upstream.

When freeing PD/PT with shadows it can happen that the shadow
destruction races with detaching the PD/PT from the VM causing a NULL
pointer dereference in the invalidation code.

Fix this by detaching the the PD/PT from the VM first and then
freeing the shadow instead.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: https://gitlab.freedesktop.org/drm/amd/-/issues/2867
Cc: <stable@vger.kernel.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agobtrfs: don't clear qgroup reserved bit in release_folio
Boris Burkov [Fri, 1 Dec 2023 21:00:12 +0000 (13:00 -0800)]
btrfs: don't clear qgroup reserved bit in release_folio

commit a86805504b88f636a6458520d85afdf0634e3c6b upstream.

The EXTENT_QGROUP_RESERVED bit is used to "lock" regions of the file for
duplicate reservations. That is two writes to that range in one
transaction shouldn't create two reservations, as the reservation will
only be freed once when the write finally goes down. Therefore, it is
never OK to clear that bit without freeing the associated qgroup
reserve. At this point, we don't want to be freeing the reserve, so mask
off the bit.

CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agobtrfs: fix qgroup_free_reserved_data int overflow
Boris Burkov [Fri, 1 Dec 2023 21:00:10 +0000 (13:00 -0800)]
btrfs: fix qgroup_free_reserved_data int overflow

commit 9e65bfca24cf1d77e4a5c7a170db5867377b3fe7 upstream.

The reserved data counter and input parameter is a u64, but we
inadvertently accumulate it in an int. Overflowing that int results in
freeing the wrong amount of data and breaking reserve accounting.

Unfortunately, this overflow rot spreads from there, as the qgroup
release/free functions rely on returning an int to take advantage of
negative values for error codes.

Therefore, the full fix is to return the "released" or "freed" amount by
a u64 argument and to return 0 or negative error code via the return
value.

Most of the call sites simply ignore the return value, though some
of them handle the error and count the returned bytes. Change all of
them accordingly.

CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agobtrfs: free qgroup reserve when ORDERED_IOERR is set
Boris Burkov [Fri, 1 Dec 2023 21:00:09 +0000 (13:00 -0800)]
btrfs: free qgroup reserve when ORDERED_IOERR is set

commit f63e1164b90b385cd832ff0fdfcfa76c3cc15436 upstream.

An ordered extent completing is a critical moment in qgroup reserve
handling, as the ownership of the reservation is handed off from the
ordered extent to the delayed ref. In the happy path we release (unlock)
but do not free (decrement counter) the reservation, and the delayed ref
drives the free. However, on an error, we don't create a delayed ref,
since there is no ref to add. Therefore, free on the error path.

CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agokexec: drop dependency on ARCH_SUPPORTS_KEXEC from CRASH_DUMP
Ignat Korchagin [Wed, 29 Nov 2023 22:04:09 +0000 (22:04 +0000)]
kexec: drop dependency on ARCH_SUPPORTS_KEXEC from CRASH_DUMP

commit c41bd2514184d75db087fe4c1221237fb7922875 upstream.

In commit f8ff23429c62 ("kernel/Kconfig.kexec: drop select of KEXEC for
CRASH_DUMP") we tried to fix a config regression, where CONFIG_CRASH_DUMP
required CONFIG_KEXEC.

However, it was not enough at least for arm64 platforms.  While further
testing the patch with our arm64 config I noticed that CONFIG_CRASH_DUMP
is unavailable in menuconfig.  This is because CONFIG_CRASH_DUMP still
depends on the new CONFIG_ARCH_SUPPORTS_KEXEC introduced in commit
91506f7e5d21 ("arm64/kexec: refactor for kernel/Kconfig.kexec") and on
arm64 CONFIG_ARCH_SUPPORTS_KEXEC requires CONFIG_PM_SLEEP_SMP=y, which in
turn requires either CONFIG_SUSPEND=y or CONFIG_HIBERNATION=y neither of
which are set in our config.

Given that we already established that CONFIG_KEXEC (which is a switch for
kexec system call itself) is not required for CONFIG_CRASH_DUMP drop
CONFIG_ARCH_SUPPORTS_KEXEC dependency as well.  The arm64 kernel builds
just fine with CONFIG_CRASH_DUMP=y and with both CONFIG_KEXEC=n and
CONFIG_KEXEC_FILE=n after f8ff23429c62 ("kernel/Kconfig.kexec: drop select
of KEXEC for CRASH_DUMP") and this patch are applied given that the
necessary shared bits are included via CONFIG_KEXEC_CORE dependency.

[bhe@redhat.com: don't export some symbols when CONFIG_MMU=n]
Link: https://lkml.kernel.org/r/ZW03ODUKGGhP1ZGU@MiWiFi-R3L-srv
[bhe@redhat.com: riscv, kexec: fix dependency of two items]
Link: https://lkml.kernel.org/r/ZW04G/SKnhbE5mnX@MiWiFi-R3L-srv
Link: https://lkml.kernel.org/r/20231129220409.55006-1-ignat@cloudflare.com
Fixes: 91506f7e5d21 ("arm64/kexec: refactor for kernel/Kconfig.kexec")
Signed-off-by: Ignat Korchagin <ignat@cloudflare.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: <stable@vger.kernel.org> # 6.6+: f8ff234: kernel/Kconfig.kexec: drop select of KEXEC for CRASH_DUMP
Cc: <stable@vger.kernel.org> # 6.6+
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agomm/shmem: fix race in shmem_undo_range w/THP
David Stevens [Tue, 18 Apr 2023 08:40:31 +0000 (17:40 +0900)]
mm/shmem: fix race in shmem_undo_range w/THP

commit 55ac8bbe358bdd2f3c044c12f249fd22d48fe015 upstream.

Split folios during the second loop of shmem_undo_range.  It's not
sufficient to only split folios when dealing with partial pages, since
it's possible for a THP to be faulted in after that point.  Calling
truncate_inode_folio in that situation can result in throwing away data
outside of the range being targeted.

[akpm@linux-foundation.org: tidy up comment layout]
Link: https://lkml.kernel.org/r/20230418084031.3439795-1-stevensd@google.com
Fixes: b9a8a4195c7d ("truncate,shmem: Handle truncates that split large folios")
Signed-off-by: David Stevens <stevensd@chromium.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agomm/mglru: reclaim offlined memcgs harder
Yu Zhao [Fri, 8 Dec 2023 06:14:07 +0000 (23:14 -0700)]
mm/mglru: reclaim offlined memcgs harder

commit 4376807bf2d5371c3e00080c972be568c3f8a7d1 upstream.

In the effort to reduce zombie memcgs [1], it was discovered that the
memcg LRU doesn't apply enough pressure on offlined memcgs.  Specifically,
instead of rotating them to the tail of the current generation
(MEMCG_LRU_TAIL) for a second attempt, it moves them to the next
generation (MEMCG_LRU_YOUNG) after the first attempt.

Not applying enough pressure on offlined memcgs can cause them to build
up, and this can be particularly harmful to memory-constrained systems.

On Pixel 8 Pro, launching apps for 50 cycles:
                 Before  After  Change
  Zombie memcgs  45      35     -22%

[1] https://lore.kernel.org/CABdmKX2M6koq4Q0Cmp_-=wbP0Qa190HdEGGaHfxNS05gAkUtPA@mail.gmail.com/

Link: https://lkml.kernel.org/r/20231208061407.2125867-4-yuzhao@google.com
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: T.J. Mercier <tjmercier@google.com>
Tested-by: T.J. Mercier <tjmercier@google.com>
Cc: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agomm/mglru: respect min_ttl_ms with memcgs
Yu Zhao [Fri, 8 Dec 2023 06:14:06 +0000 (23:14 -0700)]
mm/mglru: respect min_ttl_ms with memcgs

commit 8aa420617918d12d1f5d55030a503c9418e73c2c upstream.

While investigating kswapd "consuming 100% CPU" [1] (also see "mm/mglru:
try to stop at high watermarks"), it was discovered that the memcg LRU can
breach the thrashing protection imposed by min_ttl_ms.

Before the memcg LRU:
  kswapd()
    shrink_node_memcgs()
      mem_cgroup_iter()
        inc_max_seq()  // always hit a different memcg
    lru_gen_age_node()
      mem_cgroup_iter()
        check the timestamp of the oldest generation

After the memcg LRU:
  kswapd()
    shrink_many()
      restart:
        iterate the memcg LRU:
          inc_max_seq()  // occasionally hit the same memcg
          if raced with lru_gen_rotate_memcg():
            goto restart
    lru_gen_age_node()
      mem_cgroup_iter()
        check the timestamp of the oldest generation

Specifically, when the restart happens in shrink_many(), it needs to stick
with the (memcg LRU) generation it began with.  In other words, it should
neither re-read memcg_lru->seq nor age an lruvec of a different
generation.  Otherwise it can hit the same memcg multiple times without
giving lru_gen_age_node() a chance to check the timestamp of that memcg's
oldest generation (against min_ttl_ms).

[1] https://lore.kernel.org/CAK8fFZ4DY+GtBA40Pm7Nn5xCHy+51w3sfxPqkqpqakSXYyX+Wg@mail.gmail.com/

Link: https://lkml.kernel.org/r/20231208061407.2125867-3-yuzhao@google.com
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Tested-by: T.J. Mercier <tjmercier@google.com>
Cc: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agomm/mglru: try to stop at high watermarks
Yu Zhao [Fri, 8 Dec 2023 06:14:05 +0000 (23:14 -0700)]
mm/mglru: try to stop at high watermarks

commit 5095a2b23987d3c3c47dd16b3d4080e2733b8bb9 upstream.

The initial MGLRU patchset didn't include the memcg LRU support, and it
relied on should_abort_scan(), added by commit f76c83378851 ("mm:
multi-gen LRU: optimize multiple memcgs"), to "backoff to avoid
overshooting their aggregate reclaim target by too much".

Later on when the memcg LRU was added, should_abort_scan() was deemed
unnecessary, and the test results [1] showed no side effects after it was
removed by commit a579086c99ed ("mm: multi-gen LRU: remove eviction
fairness safeguard").

However, that test used memory.reclaim, which sets nr_to_reclaim to
SWAP_CLUSTER_MAX.  So it can overshoot only by SWAP_CLUSTER_MAX-1 pages,
i.e., from nr_reclaimed=nr_to_reclaim-1 to
nr_reclaimed=nr_to_reclaim+SWAP_CLUSTER_MAX-1.  Compared with the batch
size kswapd sets to nr_to_reclaim, SWAP_CLUSTER_MAX is tiny.  Therefore
that test isn't able to reproduce the worst case scenario, i.e., kswapd
overshooting GBs on large systems and "consuming 100% CPU" (see the Closes
tag).

Bring back a simplified version of should_abort_scan() on top of the memcg
LRU, so that kswapd stops when all eligible zones are above their
respective high watermarks plus a small delta to lower the chance of
KSWAPD_HIGH_WMARK_HIT_QUICKLY.  Note that this only applies to order-0
reclaim, meaning compaction-induced reclaim can still run wild (which is a
different problem).

On Android, launching 55 apps sequentially:
           Before     After      Change
  pgpgin   838377172  802955040  -4%
  pgpgout  38037080   34336300   -10%

[1] https://lore.kernel.org/20221222041905.2431096-1-yuzhao@google.com/

Link: https://lkml.kernel.org/r/20231208061407.2125867-2-yuzhao@google.com
Fixes: a579086c99ed ("mm: multi-gen LRU: remove eviction fairness safeguard")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Closes: https://lore.kernel.org/CAK8fFZ4DY+GtBA40Pm7Nn5xCHy+51w3sfxPqkqpqakSXYyX+Wg@mail.gmail.com/
Tested-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Tested-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: T.J. Mercier <tjmercier@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agomm/mglru: fix underprotected page cache
Yu Zhao [Fri, 8 Dec 2023 06:14:04 +0000 (23:14 -0700)]
mm/mglru: fix underprotected page cache

commit 081488051d28d32569ebb7c7a23572778b2e7d57 upstream.

Unmapped folios accessed through file descriptors can be underprotected.
Those folios are added to the oldest generation based on:

1. The fact that they are less costly to reclaim (no need to walk the
   rmap and flush the TLB) and have less impact on performance (don't
   cause major PFs and can be non-blocking if needed again).
2. The observation that they are likely to be single-use. E.g., for
   client use cases like Android, its apps parse configuration files
   and store the data in heap (anon); for server use cases like MySQL,
   it reads from InnoDB files and holds the cached data for tables in
   buffer pools (anon).

However, the oldest generation can be very short lived, and if so, it
doesn't provide the PID controller with enough time to respond to a surge
of refaults.  (Note that the PID controller uses weighted refaults and
those from evicted generations only take a half of the whole weight.) In
other words, for a short lived generation, the moving average smooths out
the spike quickly.

To fix the problem:
1. For folios that are already on LRU, if they can be beyond the
   tracking range of tiers, i.e., five accesses through file
   descriptors, move them to the second oldest generation to give them
   more time to age. (Note that tiers are used by the PID controller
   to statistically determine whether folios accessed multiple times
   through file descriptors are worth protecting.)
2. When adding unmapped folios to LRU, adjust the placement of them so
   that they are not too close to the tail. The effect of this is
   similar to the above.

On Android, launching 55 apps sequentially:
                           Before     After      Change
  workingset_refault_anon  25641024   25598972   0%
  workingset_refault_file  115016834  106178438  -8%

Link: https://lkml.kernel.org/r/20231208061407.2125867-1-yuzhao@google.com
Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
Tested-by: Kalesh Singh <kaleshsingh@google.com>
Cc: T.J. Mercier <tjmercier@google.com>
Cc: Kairui Song <ryncsn@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agodmaengine: fsl-edma: fix DMA channel leak in eDMAv4
Frank Li [Mon, 27 Nov 2023 21:43:25 +0000 (16:43 -0500)]
dmaengine: fsl-edma: fix DMA channel leak in eDMAv4

commit 4ee632c82d2dbb9e2dcc816890ef182a151cbd99 upstream.

Allocate channel count consistently increases due to a missing source ID
(srcid) cleanup in the fsl_edma_free_chan_resources() function at imx93
eDMAv4.

Reset 'srcid' at fsl_edma_free_chan_resources().

Cc: stable@vger.kernel.org
Fixes: 72f5801a4e2b ("dmaengine: fsl-edma: integrate v3 support")
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Link: https://lore.kernel.org/r/20231127214325.2477247-1-Frank.Li@nxp.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agodmaengine: stm32-dma: avoid bitfield overflow assertion
Amelie Delaunay [Mon, 6 Nov 2023 13:48:32 +0000 (14:48 +0100)]
dmaengine: stm32-dma: avoid bitfield overflow assertion

commit 54bed6bafa0f38daf9697af50e3aff5ff1354fe1 upstream.

stm32_dma_get_burst() returns a negative error for invalid input, which
gets turned into a large u32 value in stm32_dma_prep_dma_memcpy() that
in turn triggers an assertion because it does not fit into a two-bit field:
drivers/dma/stm32-dma.c: In function 'stm32_dma_prep_dma_memcpy':
include/linux/compiler_types.h:354:38: error: call to '__compiletime_assert_282' declared with attribute error: FIELD_PREP: value too large for the field
     _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
                                         ^
   include/linux/compiler_types.h:335:4: note: in definition of macro '__compiletime_assert'
       prefix ## suffix();    \
       ^~~~~~
   include/linux/compiler_types.h:354:2: note: in expansion of macro '_compiletime_assert'
     _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
     ^~~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
    #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                        ^~~~~~~~~~~~~~~~~~
   include/linux/bitfield.h:68:3: note: in expansion of macro 'BUILD_BUG_ON_MSG'
      BUILD_BUG_ON_MSG(__builtin_constant_p(_val) ?  \
      ^~~~~~~~~~~~~~~~
   include/linux/bitfield.h:114:3: note: in expansion of macro '__BF_FIELD_CHECK'
      __BF_FIELD_CHECK(_mask, 0ULL, _val, "FIELD_PREP: "); \
      ^~~~~~~~~~~~~~~~
   drivers/dma/stm32-dma.c:1237:4: note: in expansion of macro 'FIELD_PREP'
       FIELD_PREP(STM32_DMA_SCR_PBURST_MASK, dma_burst) |
       ^~~~~~~~~~

As an easy workaround, assume the error can happen, so try to handle this
by failing stm32_dma_prep_dma_memcpy() before the assertion. It replicates
what is done in stm32_dma_set_xfer_param() where stm32_dma_get_burst() is
also used.

Fixes: 1c32d6c37cc2 ("dmaengine: stm32-dma: use bitfield helpers")
Fixes: a2b6103b7a8a ("dmaengine: stm32-dma: Improve memory burst management")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Amelie Delaunay <amelie.delaunay@foss.st.com>
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202311060135.Q9eMnpCL-lkp@intel.com/
Link: https://lore.kernel.org/r/20231106134832.1470305-1-amelie.delaunay@foss.st.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agodrm/mediatek: Fix access violation in mtk_drm_crtc_dma_dev_get
Stuart Lee [Fri, 10 Nov 2023 01:29:14 +0000 (09:29 +0800)]
drm/mediatek: Fix access violation in mtk_drm_crtc_dma_dev_get

commit b6961d187fcd138981b8707dac87b9fcdbfe75d1 upstream.

Add error handling to check NULL input in
mtk_drm_crtc_dma_dev_get function.

While display path is not configured correctly, none of crtc is
established. So the caller of mtk_drm_crtc_dma_dev_get may pass
input parameter *crtc as NULL, Which may cause coredump when
we try to get the container of NULL pointer.

Fixes: cb1d6bcca542 ("drm/mediatek: Add dma dev get function")
Signed-off-by: Stuart Lee <stuart.lee@mediatek.com>
Cc: stable@vger.kernel.org
Reviewed-by: AngeloGioacchino DEl Regno <angelogioacchino.delregno@collabora.com>
Tested-by: Macpaul Lin <macpaul.lin@mediatek.com>
Link: https://patchwork.kernel.org/project/dri-devel/patch/20231110012914.14884-2-stuart.lee@mediatek.com/
Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agodrm/amdgpu/sdma5.2: add begin/end_use ring callbacks
Alex Deucher [Thu, 7 Dec 2023 15:14:41 +0000 (10:14 -0500)]
drm/amdgpu/sdma5.2: add begin/end_use ring callbacks

commit ab4750332dbe535243def5dcebc24ca00c1f98ac upstream.

Add begin/end_use ring callbacks to disallow GFXOFF when
SDMA work is submitted and allow it again afterward.

This should avoid corner cases where GFXOFF is erroneously
entered when SDMA is still active.  For now just allow/disallow
GFXOFF in the begin and end helpers until we root cause the
issue.  This should not impact power as SDMA usage is pretty
minimal and GFXOSS should not be active when SDMA is active
anyway, this just makes it explicit.

v2: move everything into sdma5.2 code.  No reason for this
to be generic at this point.
v3: Add comments in new code

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2220
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> (v1)
Tested-by: Mario Limonciello <mario.limonciello@amd.com> (v1)
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.15+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agoteam: Fix use-after-free when an option instance allocation fails
Florent Revest [Wed, 6 Dec 2023 12:37:18 +0000 (13:37 +0100)]
team: Fix use-after-free when an option instance allocation fails

commit c12296bbecc488623b7d1932080e394d08f3226b upstream.

In __team_options_register, team_options are allocated and appended to
the team's option_list.
If one option instance allocation fails, the "inst_rollback" cleanup
path frees the previously allocated options but doesn't remove them from
the team's option_list.
This leaves dangling pointers that can be dereferenced later by other
parts of the team driver that iterate over options.

This patch fixes the cleanup path to remove the dangling pointers from
the list.

As far as I can tell, this uaf doesn't have much security implications
since it would be fairly hard to exploit (an attacker would need to make
the allocation of that specific small object fail) but it's still nice
to fix.

Cc: stable@vger.kernel.org
Fixes: 80f7c6683fe0 ("team: add support for per-port options")
Signed-off-by: Florent Revest <revest@chromium.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20231206123719.1963153-1-revest@chromium.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agoarm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify
James Houghton [Mon, 4 Dec 2023 17:26:46 +0000 (17:26 +0000)]
arm64: mm: Always make sw-dirty PTEs hw-dirty in pte_modify

commit 3c0696076aad60a2f04c019761921954579e1b0e upstream.

It is currently possible for a userspace application to enter an
infinite page fault loop when using HugeTLB pages implemented with
contiguous PTEs when HAFDBS is not available. This happens because:

1. The kernel may sometimes write PTEs that are sw-dirty but hw-clean
   (PTE_DIRTY | PTE_RDONLY | PTE_WRITE).

2. If, during a write, the CPU uses a sw-dirty, hw-clean PTE in handling
   the memory access on a system without HAFDBS, we will get a page
   fault.

3. HugeTLB will check if it needs to update the dirty bits on the PTE.
   For contiguous PTEs, it will check to see if the pgprot bits need
   updating. In this case, HugeTLB wants to write a sequence of
   sw-dirty, hw-dirty PTEs, but it finds that all the PTEs it is about
   to overwrite are all pte_dirty() (pte_sw_dirty() => pte_dirty()),
   so it thinks no update is necessary.

We can get the kernel to write a sw-dirty, hw-clean PTE with the
following steps (showing the relevant VMA flags and pgprot bits):

i.   Create a valid, writable contiguous PTE.
       VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
       VMA pgprot bits: PTE_RDONLY | PTE_WRITE
       PTE pgprot bits: PTE_DIRTY | PTE_WRITE

ii.  mprotect the VMA to PROT_NONE.
       VMA vmflags:     VM_SHARED
       VMA pgprot bits: PTE_RDONLY
       PTE pgprot bits: PTE_DIRTY | PTE_RDONLY

iii. mprotect the VMA back to PROT_READ | PROT_WRITE.
       VMA vmflags:     VM_SHARED | VM_READ | VM_WRITE
       VMA pgprot bits: PTE_RDONLY | PTE_WRITE
       PTE pgprot bits: PTE_DIRTY | PTE_WRITE | PTE_RDONLY

Make it impossible to create a writeable sw-dirty, hw-clean PTE with
pte_modify(). Such a PTE should be impossible to create, and there may
be places that assume that pte_dirty() implies pte_hw_dirty().

Signed-off-by: James Houghton <jthoughton@google.com>
Fixes: 031e6e6b4e12 ("arm64: hugetlb: Avoid unnecessary clearing in huge_ptep_set_access_flags")
Cc: <stable@vger.kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20231204172646.2541916-3-jthoughton@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agoRevert "selftests: error out if kernel header files are not yet built"
John Hubbard [Sat, 9 Dec 2023 02:01:44 +0000 (18:01 -0800)]
Revert "selftests: error out if kernel header files are not yet built"

commit 43e8832fed08438e2a27afed9bac21acd0ceffe5 upstream.

This reverts commit 9fc96c7c19df ("selftests: error out if kernel header
files are not yet built").

It turns out that requiring the kernel headers to be built as a
prerequisite to building selftests, does not work in many cases. For
example, Peter Zijlstra writes:

"My biggest beef with the whole thing is that I simply do not want to use
'make headers', it doesn't work for me.

I have a ton of output directories and I don't care to build tools into
the output dirs, in fact some of them flat out refuse to work that way
(bpf comes to mind)." [1]

Therefore, stop erroring out on the selftests build. Additional patches
will be required in order to change over to not requiring the kernel
headers.

[1] https://lore.kernel.org/20231208221007.GO28727@noisy.programming.kicks-ass.net

Link: https://lkml.kernel.org/r/20231209020144.244759-1-jhubbard@nvidia.com
Fixes: 9fc96c7c19df ("selftests: error out if kernel header files are not yet built")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Marcos Paulo de Souza <mpdesouza@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agoext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS
Baokun Li [Mon, 27 Nov 2023 06:33:13 +0000 (14:33 +0800)]
ext4: prevent the normalized size from exceeding EXT_MAX_BLOCKS

commit 2dcf5fde6dffb312a4bfb8ef940cea2d1f402e32 upstream.

For files with logical blocks close to EXT_MAX_BLOCKS, the file size
predicted in ext4_mb_normalize_request() may exceed EXT_MAX_BLOCKS.
This can cause some blocks to be preallocated that will not be used.
And after [Fixes], the following issue may be triggered:

=========================================================
 kernel BUG at fs/ext4/mballoc.c:4653!
 Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
 CPU: 1 PID: 2357 Comm: xfs_io 6.7.0-rc2-00195-g0f5cc96c367f
 Hardware name: linux,dummy-virt (DT)
 pc : ext4_mb_use_inode_pa+0x148/0x208
 lr : ext4_mb_use_inode_pa+0x98/0x208
 Call trace:
  ext4_mb_use_inode_pa+0x148/0x208
  ext4_mb_new_inode_pa+0x240/0x4a8
  ext4_mb_use_best_found+0x1d4/0x208
  ext4_mb_try_best_found+0xc8/0x110
  ext4_mb_regular_allocator+0x11c/0xf48
  ext4_mb_new_blocks+0x790/0xaa8
  ext4_ext_map_blocks+0x7cc/0xd20
  ext4_map_blocks+0x170/0x600
  ext4_iomap_begin+0x1c0/0x348
=========================================================

Here is a calculation when adjusting ac_b_ex in ext4_mb_new_inode_pa():

ex.fe_logical = orig_goal_end - EXT4_C2B(sbi, ex.fe_len);
if (ac->ac_o_ex.fe_logical >= ex.fe_logical)
goto adjust_bex;

The problem is that when orig_goal_end is subtracted from ac_b_ex.fe_len
it is still greater than EXT_MAX_BLOCKS, which causes ex.fe_logical to
overflow to a very small value, which ultimately triggers a BUG_ON in
ext4_mb_new_inode_pa() because pa->pa_free < len.

The last logical block of an actual write request does not exceed
EXT_MAX_BLOCKS, so in ext4_mb_normalize_request() also avoids normalizing
the last logical block to exceed EXT_MAX_BLOCKS to avoid the above issue.

The test case in [Link] can reproduce the above issue with 64k block size.

Link: https://patchwork.kernel.org/project/fstests/list/?series=804003
Cc: <stable@kernel.org> # 6.4
Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20231127063313.3734294-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agosoundwire: stream: fix NULL pointer dereference for multi_link
Krzysztof Kozlowski [Fri, 24 Nov 2023 18:01:36 +0000 (19:01 +0100)]
soundwire: stream: fix NULL pointer dereference for multi_link

commit e199bf52ffda8f98f129728d57244a9cd9ad5623 upstream.

If bus is marked as multi_link, but number of masters in the stream is
not higher than bus->hw_sync_min_links (bus->multi_link && m_rt_count >=
bus->hw_sync_min_links), bank switching should not happen.  The first
part of do_bank_switch() code properly takes these conditions into
account, but second part (sdw_ml_sync_bank_switch()) relies purely on
bus->multi_link property.  This is not balanced and leads to NULL
pointer dereference:

  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
  ...
  Call trace:
   wait_for_completion_timeout+0x124/0x1f0
   do_bank_switch+0x370/0x6f8
   sdw_prepare_stream+0x2d0/0x438
   qcom_snd_sdw_prepare+0xa0/0x118
   sm8450_snd_prepare+0x128/0x148
   snd_soc_link_prepare+0x5c/0xe8
   __soc_pcm_prepare+0x28/0x1ec
   dpcm_be_dai_prepare+0x1e0/0x2c0
   dpcm_fe_dai_prepare+0x108/0x28c
   snd_pcm_do_prepare+0x44/0x68
   snd_pcm_action_single+0x54/0xc0
   snd_pcm_action_nonatomic+0xe4/0xec
   snd_pcm_prepare+0xc4/0x114
   snd_pcm_common_ioctl+0x1154/0x1cc0
   snd_pcm_ioctl+0x54/0x74

Fixes: ce6e74d008ff ("soundwire: Add support for multi link bank switch")
Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20231124180136.390621-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
12 months agocxl/hdm: Fix dpa translation locking
Dan Williams [Thu, 7 Dec 2023 03:11:14 +0000 (19:11 -0800)]
cxl/hdm: Fix dpa translation locking

commit 6f5c4eca48ffe18307b4e1d375817691c9005c87 upstream.

The helper, cxl_dpa_resource_start(), snapshots the dpa-address of an
endpoint-decoder after acquiring the cxl_dpa_rwsem. However, it is
sufficient to assert that cxl_dpa_rwsem is held rather than acquire it
in the helper. Otherwise, it triggers multiple lockdep reports:

1/ Tracing callbacks are in an atomic context that can not acquire sleeping
locks:

    BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1525
    in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1288, name: bash
    preempt_count: 2, expected: 0
    RCU nest depth: 0, expected: 0
    [..]
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023
    Call Trace:
     <TASK>
     dump_stack_lvl+0x71/0x90
     __might_resched+0x1b2/0x2c0
     down_read+0x1a/0x190
     cxl_dpa_resource_start+0x15/0x50 [cxl_core]
     cxl_trace_hpa+0x122/0x300 [cxl_core]
     trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]

2/ The rwsem is already held in the inject poison path:

    WARNING: possible recursive locking detected
    6.7.0-rc2+ #12 Tainted: G        W  OE    N
    --------------------------------------------
    bash/1288 is trying to acquire lock:
    ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_dpa_resource_start+0x15/0x50 [cxl_core]

    but task is already holding lock:
    ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_inject_poison+0x7d/0x1e0 [cxl_core]
    [..]
    Call Trace:
     <TASK>
     dump_stack_lvl+0x71/0x90
     __might_resched+0x1b2/0x2c0
     down_read+0x1a/0x190
     cxl_dpa_resource_start+0x15/0x50 [cxl_core]
     cxl_trace_hpa+0x122/0x300 [cxl_core]
     trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]
     __traceiter_cxl_poison+0x5c/0x80 [cxl_core]
     cxl_inject_poison+0x1bc/0x1e0 [cxl_core]

This appears to have been an issue since the initial implementation and
uncovered by the new cxl-poison.sh test [1]. That test is now passing with
these changes.

Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
Link: http://lore.kernel.org/r/e4f2716646918135ddbadf4146e92abb659de734.1700615159.git.alison.schofield@intel.com
Cc: <stable@vger.kernel.org>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>