Jakub Kicinski [Fri, 25 Feb 2022 05:54:56 +0000 (21:54 -0800)]
Merge branch 'mptcp-fixes-for-5-17'
Mat Martineau says:
====================
mptcp: Fixes for 5.17
Patch 1 fixes an issue with the SIOCOUTQ ioctl in MPTCP sockets that
have performed a fallback to TCP.
Patch 2 is a selftest fix to correctly remove temp files.
Patch 3 fixes a shift-out-of-bounds issue found by syzkaller.
====================
Link: https://lore.kernel.org/r/20220225005259.318898-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mat Martineau [Fri, 25 Feb 2022 00:52:59 +0000 (16:52 -0800)]
mptcp: Correctly set DATA_FIN timeout when number of retransmits is large
Syzkaller with UBSAN uncovered a scenario where a large number of
DATA_FIN retransmits caused a shift-out-of-bounds in the DATA_FIN
timeout calculation:
================================================================================
UBSAN: shift-out-of-bounds in net/mptcp/protocol.c:470:29
shift exponent 32 is too large for 32-bit type 'unsigned int'
CPU: 1 PID: 13059 Comm: kworker/1:0 Not tainted
5.17.0-rc2-00630-g5fbf21c90c60 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
ubsan_epilogue+0xb/0x5a lib/ubsan.c:151
__ubsan_handle_shift_out_of_bounds.cold+0xb2/0x20e lib/ubsan.c:330
mptcp_set_datafin_timeout net/mptcp/protocol.c:470 [inline]
__mptcp_retrans.cold+0x72/0x77 net/mptcp/protocol.c:2445
mptcp_worker+0x58a/0xa70 net/mptcp/protocol.c:2528
process_one_work+0x9df/0x16d0 kernel/workqueue.c:2307
worker_thread+0x95/0xe10 kernel/workqueue.c:2454
kthread+0x2f4/0x3b0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
</TASK>
================================================================================
This change limits the maximum timeout by limiting the size of the
shift, which keeps all intermediate values in-bounds.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/259
Fixes: 6477dd39e62c ("mptcp: Retransmit DATA_FIN")
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Fri, 25 Feb 2022 00:52:58 +0000 (16:52 -0800)]
selftests: mptcp: do complete cleanup at exit
After commit
05be5e273c84 ("selftests: mptcp: add disconnect tests")
the mptcp selftests leave behind a couple of tmp files after
each run. run_tests_disconnect() misnames a few variables used to
track them. Address the issue setting the appropriate global variables
Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Fri, 25 Feb 2022 00:52:57 +0000 (16:52 -0800)]
mptcp: accurate SIOCOUTQ for fallback socket
The MPTCP SIOCOUTQ implementation is not very accurate in
case of fallback: it only measures the data in the MPTCP-level
write queue, but it does not take in account the subflow
write queue utilization. In case of fallback the first can be
empty, while the latter is not.
The above produces sporadic self-tests issues and can foul
legit user-space application.
Fix the issue additionally querying the subflow in case of fallback.
Fixes: 644807e3e462 ("mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/260
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 25 Feb 2022 02:13:30 +0000 (18:13 -0800)]
Merge tag 'for-net-2022-02-24' of git://git./linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- Fix regression with RFCOMM
- Fix regression with LE devices using Privacy (RPA)
- Fix regression with LE devices not waiting proper timeout to
establish connections
- Fix race in smp
* tag 'for-net-2022-02-24' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: hci_sync: Fix not using conn_timeout
Bluetooth: hci_sync: Fix hci_update_accept_list_sync
Bluetooth: assign len after null check
Bluetooth: Fix bt_skb_sendmmsg not allocating partial chunks
Bluetooth: fix data races in smp_unregister(), smp_del_chan()
Bluetooth: hci_core: Fix leaking sent_cmd skb
====================
Link: https://lore.kernel.org/r/20220224210838.197787-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 24 Feb 2022 21:19:57 +0000 (13:19 -0800)]
Merge tag 'pci-v5.17-fixes-5' of git://git./linux/kernel/git/helgaas/pci
Pull pci fixes from Bjorn Helgaas:
- Fix a merge error that broke PCI device enumeration on mvebu
platforms, including Turris Omnia (Armada 385) (Pali Rohár)
- Avoid using ATS on all AMD Navi10 and Navi14 GPUs because some
VBIOSes don't account for "harvested" (disabled) parts of the chip
when initializing caches (Alex Deucher)
* tag 'pci-v5.17-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Mark all AMD Navi10 and Navi14 GPU ATS as broken
PCI: mvebu: Fix device enumeration regression
Linus Torvalds [Thu, 24 Feb 2022 20:45:32 +0000 (12:45 -0800)]
Merge tag 'net-5.17-rc6' of git://git./linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf and netfilter.
Current release - regressions:
- bpf: fix crash due to out of bounds access into reg2btf_ids
- mvpp2: always set port pcs ops, avoid null-deref
- eth: marvell: fix driver load from initrd
- eth: intel: revert "Fix reset bw limit when DCB enabled with 1 TC"
Current release - new code bugs:
- mptcp: fix race in overlapping signal events
Previous releases - regressions:
- xen-netback: revert hotplug-status changes causing devices to not
be configured
- dsa:
- avoid call to __dev_set_promiscuity() while rtnl_mutex isn't
held
- fix panic when removing unoffloaded port from bridge
- dsa: microchip: fix bridging with more than two member ports
Previous releases - always broken:
- bpf:
- fix crash due to incorrect copy_map_value when both spin lock
and timer are present in a single value
- fix a bpf_timer initialization issue with clang
- do not try bpf_msg_push_data with len 0
- add schedule points in batch ops
- nf_tables:
- unregister flowtable hooks on netns exit
- correct flow offload action array size
- fix a couple of memory leaks
- vsock: don't check owner in vhost_vsock_stop() while releasing
- gso: do not skip outer ip header in case of ipip and net_failover
- smc: use a mutex for locking "struct smc_pnettable"
- openvswitch: fix setting ipv6 fields causing hw csum failure
- mptcp: fix race in incoming ADD_ADDR option processing
- sysfs: add check for netdevice being present to speed_show
- sched: act_ct: fix flow table lookup after ct clear or switching
zones
- eth: intel: fixes for SR-IOV forwarding offloads
- eth: broadcom: fixes for selftests and error recovery
- eth: mellanox: flow steering and SR-IOV forwarding fixes
Misc:
- make __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor
friends not report freed skbs as drops
- force inlining of checksum functions in net/checksum.h"
* tag 'net-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
net: mv643xx_eth: process retval from of_get_mac_address
ping: remove pr_err from ping_lookup
Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC"
openvswitch: Fix setting ipv6 fields causing hw csum failure
ipv6: prevent a possible race condition with lifetimes
net/smc: Use a mutex for locking "struct smc_pnettable"
bnx2x: fix driver load from initrd
Revert "xen-netback: Check for hotplug-status existence before watching"
Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"
net/mlx5e: Fix VF min/max rate parameters interchange mistake
net/mlx5e: Add missing increment of count
net/mlx5e: MPLSoUDP decap, fix check for unsupported matches
net/mlx5e: Fix MPLSoUDP encap to use MPLS action information
net/mlx5e: Add feature check for set fec counters
net/mlx5e: TC, Skip redundant ct clear actions
net/mlx5e: TC, Reject rules with forward and drop actions
net/mlx5e: TC, Reject rules with drop and modify hdr action
net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
net/mlx5: Fix possible deadlock on rule deletion
...
Luiz Augusto von Dentz [Thu, 17 Feb 2022 21:10:38 +0000 (13:10 -0800)]
Bluetooth: hci_sync: Fix not using conn_timeout
When using hci_le_create_conn_sync it shall wait for the conn_timeout
since the connection complete may take longer than just 2 seconds.
Also fix the masking of HCI_EV_LE_ENHANCED_CONN_COMPLETE and
HCI_EV_LE_CONN_COMPLETE so they are never both set so we can predict
which one the controller will use in case of HCI_OP_LE_CREATE_CONN.
Fixes: 6cd29ec6ae5e3 ("Bluetooth: hci_sync: Wait for proper events when connecting LE")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Luiz Augusto von Dentz [Thu, 24 Feb 2022 15:11:47 +0000 (07:11 -0800)]
Bluetooth: hci_sync: Fix hci_update_accept_list_sync
hci_update_accept_list_sync is returning the filter based on the error
but that gets overwritten by hci_le_set_addr_resolution_enable_sync
return instead of using the actual result of the likes of
hci_le_add_accept_list_sync which was intended.
Fixes: ad383c2c65a5b ("Bluetooth: hci_sync: Enable advertising when LL privacy is enabled")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Wang Qing [Tue, 15 Feb 2022 02:01:56 +0000 (18:01 -0800)]
Bluetooth: assign len after null check
len should be assigned after a null check
Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Luiz Augusto von Dentz [Tue, 15 Feb 2022 01:59:38 +0000 (17:59 -0800)]
Bluetooth: Fix bt_skb_sendmmsg not allocating partial chunks
Since bt_skb_sendmmsg can be used with the likes of SOCK_STREAM it
shall return the partial chunks it could allocate instead of freeing
everything as otherwise it can cause problems like bellow.
Fixes: 81be03e026dc ("Bluetooth: RFCOMM: Replace use of memcpy_from_msg with bt_skb_sendmmsg")
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Link: https://lore.kernel.org/r/d7206e12-1b99-c3be-84f4-df22af427ef5@molgen.mpg.de
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215594
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> (Nokia N9 (MeeGo/Harmattan)
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Lin Ma [Wed, 16 Feb 2022 04:37:14 +0000 (12:37 +0800)]
Bluetooth: fix data races in smp_unregister(), smp_del_chan()
Previous commit
e04480920d1e ("Bluetooth: defer cleanup of resources
in hci_unregister_dev()") defers all destructive actions to
hci_release_dev() to prevent cocurrent problems like NPD, UAF.
However, there are still some exceptions that are ignored.
The smp_unregister() in hci_dev_close_sync() (previously in
hci_dev_do_close) will release resources like the sensitive channel
and the smp_dev objects. Consider the situations the device is detaching
or power down while the kernel is still operating on it, the following
data race could take place.
thread-A hci_dev_close_sync | thread-B read_local_oob_ext_data
|
hci_dev_unlock() |
... | hci_dev_lock()
if (hdev->smp_data) |
chan = hdev->smp_data |
| chan = hdev->smp_data (3)
|
hdev->smp_data = NULL (1) | if (!chan || !chan->data) (4)
... |
smp = chan->data | smp = chan->data
if (smp) |
chan->data = NULL (2) |
... |
kfree_sensitive(smp) |
| // dereference smp trigger UFA
That is, the objects hdev->smp_data and chan->data both suffer from the
data races. In a preempt-enable kernel, the above schedule (when (3) is
before (1) and (4) is before (2)) leads to UAF bugs. It can be
reproduced in the latest kernel and below is part of the report:
[ 49.097146] ================================================================
[ 49.097611] BUG: KASAN: use-after-free in smp_generate_oob+0x2dd/0x570
[ 49.097611] Read of size 8 at addr
ffff888006528360 by task generate_oob/155
[ 49.097611]
[ 49.097611] Call Trace:
[ 49.097611] <TASK>
[ 49.097611] dump_stack_lvl+0x34/0x44
[ 49.097611] print_address_description.constprop.0+0x1f/0x150
[ 49.097611] ? smp_generate_oob+0x2dd/0x570
[ 49.097611] ? smp_generate_oob+0x2dd/0x570
[ 49.097611] kasan_report.cold+0x7f/0x11b
[ 49.097611] ? smp_generate_oob+0x2dd/0x570
[ 49.097611] smp_generate_oob+0x2dd/0x570
[ 49.097611] read_local_oob_ext_data+0x689/0xc30
[ 49.097611] ? hci_event_packet+0xc80/0xc80
[ 49.097611] ? sysvec_apic_timer_interrupt+0x9b/0xc0
[ 49.097611] ? asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 49.097611] ? mgmt_init_hdev+0x1c/0x240
[ 49.097611] ? mgmt_init_hdev+0x28/0x240
[ 49.097611] hci_sock_sendmsg+0x1880/0x1e70
[ 49.097611] ? create_monitor_event+0x890/0x890
[ 49.097611] ? create_monitor_event+0x890/0x890
[ 49.097611] sock_sendmsg+0xdf/0x110
[ 49.097611] __sys_sendto+0x19e/0x270
[ 49.097611] ? __ia32_sys_getpeername+0xa0/0xa0
[ 49.097611] ? kernel_fpu_begin_mask+0x1c0/0x1c0
[ 49.097611] __x64_sys_sendto+0xd8/0x1b0
[ 49.097611] ? syscall_exit_to_user_mode+0x1d/0x40
[ 49.097611] do_syscall_64+0x3b/0x90
[ 49.097611] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 49.097611] RIP: 0033:0x7f5a59f51f64
...
[ 49.097611] RAX:
ffffffffffffffda RBX:
0000000000000000 RCX:
00007f5a59f51f64
[ 49.097611] RDX:
0000000000000007 RSI:
00007f5a59d6ac70 RDI:
0000000000000006
[ 49.097611] RBP:
0000000000000000 R08:
0000000000000000 R09:
0000000000000000
[ 49.097611] R10:
0000000000000040 R11:
0000000000000246 R12:
00007ffec26916ee
[ 49.097611] R13:
00007ffec26916ef R14:
00007f5a59d6afc0 R15:
00007f5a59d6b700
To solve these data races, this patch places the smp_unregister()
function in the protected area by the hci_dev_lock(). That is, the
smp_unregister() function can not be concurrently executed when
operating functions (most of them are mgmt operations in mgmt.c) hold
the device lock.
This patch is tested with kernel LOCK DEBUGGING enabled. The price from
the extended holding time of the device lock is supposed to be low as the
smp_unregister() function is fairly short and efficient.
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Luiz Augusto von Dentz [Fri, 4 Feb 2022 21:12:35 +0000 (13:12 -0800)]
Bluetooth: hci_core: Fix leaking sent_cmd skb
sent_cmd memory is not freed before freeing hci_dev causing it to leak
it contents.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Linus Torvalds [Thu, 24 Feb 2022 19:15:10 +0000 (11:15 -0800)]
Merge tag 'block-5.17-2022-02-24' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request:
- send H2CData PDUs based on MAXH2CDATA (Varun Prakash)
- fix passthrough to namespaces with unsupported features (Christoph
Hellwig)
- Clear iocb->private at poll completion (Stefano)
* tag 'block-5.17-2022-02-24' of git://git.kernel.dk/linux-block:
nvme-tcp: send H2CData PDUs based on MAXH2CDATA
nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info
nvme: don't return an error from nvme_configure_metadata
block: clear iocb->private in blkdev_bio_end_io_async()
Linus Torvalds [Thu, 24 Feb 2022 19:08:15 +0000 (11:08 -0800)]
Merge tag 'io_uring-5.17-2022-02-23' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- Add a conditional schedule point in io_add_buffers() (Eric)
- Fix for a quiesce speedup merged in this release (Dylan)
- Don't convert to jiffies for event timeout waiting, it's way too
coarse when we accept a timespec as input (me)
* tag 'io_uring-5.17-2022-02-23' of git://git.kernel.dk/linux-block:
io_uring: disallow modification of rsrc_data during quiesce
io_uring: don't convert to jiffies for waiting on timeouts
io_uring: add a schedule point in io_add_buffers()
Linus Torvalds [Thu, 24 Feb 2022 18:42:20 +0000 (10:42 -0800)]
Merge tag 'platform-drivers-x86-v5.17-4' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull more x86 platform driver fixes from Hans de Goede:
"Two more fixes:
- Fix suspend/resume regression on AMD Cezanne APUs in >= 5.16
- Fix Microsoft Surface 3 battery readings"
* tag 'platform-drivers-x86-v5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
surface: surface3_power: Fix battery readings on batteries without a serial number
platform/x86: amd-pmc: Set QOS during suspend on CZN w/ timer wakeup
Mauri Sandberg [Wed, 23 Feb 2022 14:23:37 +0000 (16:23 +0200)]
net: mv643xx_eth: process retval from of_get_mac_address
Obtaining a MAC address may be deferred in cases when the MAC is stored
in an NVMEM block, for example, and it may not be ready upon the first
retrieval attempt and return EPROBE_DEFER.
It is also possible that a port that does not rely on NVMEM has been
already created when getting the defer request. Thus, also the resources
allocated previously must be freed when doing a roll-back.
Fixes: 76723bca2802 ("net: mv643xx_eth: add DT parsing support")
Signed-off-by: Mauri Sandberg <maukka@ext.kapsi.fi>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20220223142337.41757-1-maukka@ext.kapsi.fi
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Thu, 24 Feb 2022 03:41:08 +0000 (22:41 -0500)]
ping: remove pr_err from ping_lookup
As Jakub noticed, prints should be avoided on the datapath.
Also, as packets would never come to the else branch in
ping_lookup(), remove pr_err() from ping_lookup().
Fixes: 35a79e64de29 ("ping: fix the dif and sdif check in ping_lookup")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/1ef3f2fcd31bd681a193b1fcf235eee1603819bd.1645674068.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mateusz Palczewski [Wed, 23 Feb 2022 17:53:47 +0000 (09:53 -0800)]
Revert "i40e: Fix reset bw limit when DCB enabled with 1 TC"
Revert of a patch that instead of fixing a AQ error when trying
to reset BW limit introduced several regressions related to
creation and managing TC. Currently there are errors when creating
a TC on both PF and VF.
Error log:
[17428.783095] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14
[17428.783107] i40e 0000:3b:00.1: Failed configuring TC map 0 for VSI 391
[17428.783254] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14
[17428.783259] i40e 0000:3b:00.1: Unable to configure TC map 0 for VSI 391
This reverts commit
3d2504663c41104b4359a15f35670cfa82de1bbf.
Fixes: 3d2504663c41 (i40e: Fix reset bw limit when DCB enabled with 1 TC)
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220223175347.1690692-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paul Blakey [Wed, 23 Feb 2022 16:34:16 +0000 (18:34 +0200)]
openvswitch: Fix setting ipv6 fields causing hw csum failure
Ipv6 ttl, label and tos fields are modified without first
pulling/pushing the ipv6 header, which would have updated
the hw csum (if available). This might cause csum validation
when sending the packet to the stack, as can be seen in
the trace below.
Fix this by updating skb->csum if available.
Trace resulted by ipv6 ttl dec and then sending packet
to conntrack [actions: set(ipv6(hlimit=63)),ct(zone=99)]:
[295241.900063] s_pf0vf2: hw csum failure
[295241.923191] Call Trace:
[295241.925728] <IRQ>
[295241.927836] dump_stack+0x5c/0x80
[295241.931240] __skb_checksum_complete+0xac/0xc0
[295241.935778] nf_conntrack_tcp_packet+0x398/0xba0 [nf_conntrack]
[295241.953030] nf_conntrack_in+0x498/0x5e0 [nf_conntrack]
[295241.958344] __ovs_ct_lookup+0xac/0x860 [openvswitch]
[295241.968532] ovs_ct_execute+0x4a7/0x7c0 [openvswitch]
[295241.979167] do_execute_actions+0x54a/0xaa0 [openvswitch]
[295242.001482] ovs_execute_actions+0x48/0x100 [openvswitch]
[295242.006966] ovs_dp_process_packet+0x96/0x1d0 [openvswitch]
[295242.012626] ovs_vport_receive+0x6c/0xc0 [openvswitch]
[295242.028763] netdev_frame_hook+0xc0/0x180 [openvswitch]
[295242.034074] __netif_receive_skb_core+0x2ca/0xcb0
[295242.047498] netif_receive_skb_internal+0x3e/0xc0
[295242.052291] napi_gro_receive+0xba/0xe0
[295242.056231] mlx5e_handle_rx_cqe_mpwrq_rep+0x12b/0x250 [mlx5_core]
[295242.062513] mlx5e_poll_rx_cq+0xa0f/0xa30 [mlx5_core]
[295242.067669] mlx5e_napi_poll+0xe1/0x6b0 [mlx5_core]
[295242.077958] net_rx_action+0x149/0x3b0
[295242.086762] __do_softirq+0xd7/0x2d6
[295242.090427] irq_exit+0xf7/0x100
[295242.093748] do_IRQ+0x7f/0xd0
[295242.096806] common_interrupt+0xf/0xf
[295242.100559] </IRQ>
[295242.102750] RIP: 0033:0x7f9022e88cbd
[295242.125246] RSP: 002b:
00007f9022282b20 EFLAGS:
00000246 ORIG_RAX:
ffffffffffffffda
[295242.132900] RAX:
0000000000000005 RBX:
0000000000000010 RCX:
0000000000000000
[295242.140120] RDX:
00007f9022282ba8 RSI:
00007f9022282a30 RDI:
00007f9014005c30
[295242.147337] RBP:
00007f9014014d60 R08:
0000000000000020 R09:
00007f90254a8340
[295242.154557] R10:
00007f9022282a28 R11:
0000000000000246 R12:
0000000000000000
[295242.161775] R13:
00007f902308c000 R14:
000000000000002b R15:
00007f9022b71f40
Fixes: 3fdbd1ce11e5 ("openvswitch: add ipv6 'set' action")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Link: https://lore.kernel.org/r/20220223163416.24096-1-paulb@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Niels Dossche [Wed, 23 Feb 2022 13:19:56 +0000 (14:19 +0100)]
ipv6: prevent a possible race condition with lifetimes
valid_lft, prefered_lft and tstamp are always accessed under the lock
"lock" in other places. Reading these without taking the lock may result
in inconsistencies regarding the calculation of the valid and preferred
variables since decisions are taken on these fields for those variables.
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Niels Dossche <niels.dossche@ugent.be>
Link: https://lore.kernel.org/r/20220223131954.6570-1-niels.dossche@ugent.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fabio M. De Francesco [Wed, 23 Feb 2022 10:02:52 +0000 (11:02 +0100)]
net/smc: Use a mutex for locking "struct smc_pnettable"
smc_pnetid_by_table_ib() uses read_lock() and then it calls smc_pnet_apply_ib()
which, in turn, calls mutex_lock(&smc_ib_devices.mutex).
read_lock() disables preemption. Therefore, the code acquires a mutex while in
atomic context and it leads to a SAC bug.
Fix this bug by replacing the rwlock with a mutex.
Reported-and-tested-by: syzbot+4f322a6d84e991c38775@syzkaller.appspotmail.com
Fixes: 64e28b52c7a6 ("net/smc: add pnet table namespace support")
Confirmed-by: Tony Lu <tonylu@linux.alibaba.com>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
Link: https://lore.kernel.org/r/20220223100252.22562-1-fmdefrancesco@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Manish Chopra [Wed, 23 Feb 2022 08:57:20 +0000 (00:57 -0800)]
bnx2x: fix driver load from initrd
Commit
b7a49f73059f ("bnx2x: Utilize firmware 7.13.21.0") added
new firmware support in the driver with maintaining older firmware
compatibility. However, older firmware was not added in MODULE_FIRMWARE()
which caused missing firmware files in initrd image leading to driver load
failure from initrd. This patch adds MODULE_FIRMWARE() for older firmware
version to have firmware files included in initrd.
Fixes: b7a49f73059f ("bnx2x: Utilize firmware 7.13.21.0")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215627
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Alok Prasad <palok@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Link: https://lore.kernel.org/r/20220223085720.12021-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marek Marczykowski-Górecki [Tue, 22 Feb 2022 00:18:17 +0000 (01:18 +0100)]
Revert "xen-netback: Check for hotplug-status existence before watching"
This reverts commit
2afeec08ab5c86ae21952151f726bfe184f6b23d.
The reasoning in the commit was wrong - the code expected to setup the
watch even if 'hotplug-status' didn't exist. In fact, it relied on the
watch being fired the first time - to check if maybe 'hotplug-status' is
already set to 'connected'. Not registering a watch for non-existing
path (which is the case if hotplug script hasn't been executed yet),
made the backend not waiting for the hotplug script to execute. This in
turns, made the netfront think the interface is fully operational, while
in fact it was not (the vif interface on xen-netback side might not be
configured yet).
This was a workaround for 'hotplug-status' erroneously being removed.
But since that is reverted now, the workaround is not necessary either.
More discussion at
https://lore.kernel.org/xen-devel/
afedd7cb-a291-e773-8b0d-
4db9b291fa98@ipxe.org/T/#u
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Michael Brown <mbrown@fensystems.co.uk>
Link: https://lore.kernel.org/r/20220222001817.2264967-2-marmarek@invisiblethingslab.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marek Marczykowski-Górecki [Tue, 22 Feb 2022 00:18:16 +0000 (01:18 +0100)]
Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"
This reverts commit
1f2565780e9b7218cf92c7630130e82dcc0fe9c2.
The 'hotplug-status' node should not be removed as long as the vif
device remains configured. Otherwise the xen-netback would wait for
re-running the network script even if it was already called (in case of
the frontent re-connecting). But also, it _should_ be removed when the
vif device is destroyed (for example when unbinding the driver) -
otherwise hotplug script would not configure the device whenever it
re-appear.
Moving removal of the 'hotplug-status' node was a workaround for nothing
calling network script after xen-netback module is reloaded. But when
vif interface is re-created (on xen-netback unbind/bind for example),
the script should be called, regardless of who does that - currently
this case is not handled by the toolstack, and requires manual
script call. Keeping hotplug-status=connected to skip the call is wrong
and leads to not configured interface.
More discussion at
https://lore.kernel.org/xen-devel/
afedd7cb-a291-e773-8b0d-
4db9b291fa98@ipxe.org/T/#u
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20220222001817.2264967-1-marmarek@invisiblethingslab.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jens Axboe [Thu, 24 Feb 2022 14:02:15 +0000 (07:02 -0700)]
Merge tag 'nvme-5.17-2022-02-24' of git://git.infradead.org/nvme into block-5.17
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 5.17
- send H2CData PDUs based on MAXH2CDATA (Varun Prakash)
- fix passthrough to namespaces with unsupported features (me)"
* tag 'nvme-5.17-2022-02-24' of git://git.infradead.org/nvme:
nvme-tcp: send H2CData PDUs based on MAXH2CDATA
nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info
nvme: don't return an error from nvme_configure_metadata
Hans de Goede [Thu, 24 Feb 2022 10:18:48 +0000 (11:18 +0100)]
surface: surface3_power: Fix battery readings on batteries without a serial number
The battery on the 2nd hand Surface 3 which I recently bought appears to
not have a serial number programmed in. This results in any I2C reads from
the registers containing the serial number failing with an I2C NACK.
This was causing mshw0011_bix() to fail causing the battery readings to
not work at all.
Ignore EREMOTEIO (I2C NACK) errors when retrieving the serial number and
continue with an empty serial number to fix this.
Fixes: b1f81b496b0d ("platform/x86: surface3_power: MSHW0011 rev-eng implementation")
BugLink: https://github.com/linux-surface/linux-surface/issues/608
Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220224101848.7219-1-hdegoede@redhat.com
Mario Limonciello [Wed, 23 Feb 2022 17:52:37 +0000 (11:52 -0600)]
platform/x86: amd-pmc: Set QOS during suspend on CZN w/ timer wakeup
commit
59348401ebed ("platform/x86: amd-pmc: Add special handling for
timer based S0i3 wakeup") adds support for using another platform timer
in lieu of the RTC which doesn't work properly on some systems. This path
was validated and worked well before submission. During the 5.16-rc1 merge
window other patches were merged that caused this to stop working properly.
When this feature was used with 5.16-rc1 or later some OEM laptops with the
matching firmware requirements from that commit would shutdown instead of
program a timer based wakeup.
This was bisected to commit
8d89835b0467 ("PM: suspend: Do not pause
cpuidle in the suspend-to-idle path"). This wasn't supposed to cause any
negative impacts and also tested well on both Intel and ARM platforms.
However this changed the semantics of when CPUs are allowed to be in the
deepest state. For the AMD systems in question it appears this causes a
firmware crash for timer based wakeup.
It's hypothesized to be caused by the `amd-pmc` driver sending `OS_HINT`
and all the CPUs going into a deep state while the timer is still being
programmed. It's likely a firmware bug, but to avoid it don't allow setting
CPUs into the deepest state while using CZN timer wakeup path.
If later it's discovered that this also occurs from "regular" suspends
without a timer as well or on other silicon, this may be later expanded to
run in the suspend path for more scenarios.
Cc: stable@vger.kernel.org # 5.16+
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/linux-acpi/BL1PR12MB51570F5BD05980A0DCA1F3F4E23A9@BL1PR12MB5157.namprd12.prod.outlook.com/T/#mee35f39c41a04b624700ab2621c795367f19c90e
Fixes: 8d89835b0467 ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path")
Fixes: 23f62d7ab25b ("PM: sleep: Pause cpuidle later and resume it earlier during system transitions")
Fixes: 59348401ebed ("platform/x86: amd-pmc: Add special handling for timer based S0i3 wakeup"
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20220223175237.6209-1-mario.limonciello@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Jakub Kicinski [Thu, 24 Feb 2022 04:30:00 +0000 (20:30 -0800)]
Merge tag 'mlx5-fixes-2022-02-23' of git://git./linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2022-02-22
This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.
* tag 'mlx5-fixes-2022-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Fix VF min/max rate parameters interchange mistake
net/mlx5e: Add missing increment of count
net/mlx5e: MPLSoUDP decap, fix check for unsupported matches
net/mlx5e: Fix MPLSoUDP encap to use MPLS action information
net/mlx5e: Add feature check for set fec counters
net/mlx5e: TC, Skip redundant ct clear actions
net/mlx5e: TC, Reject rules with forward and drop actions
net/mlx5e: TC, Reject rules with drop and modify hdr action
net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
net/mlx5: Fix possible deadlock on rule deletion
net/mlx5: Fix tc max supported prio for nic mode
net/mlx5: Fix wrong limitation of metadata match on ecpf
net/mlx5: Update log_max_qp value to be 17 at most
net/mlx5: DR, Fix the threshold that defines when pool sync is initiated
net/mlx5: DR, Don't allow match on IP w/o matching on full ethertype/ip_version
net/mlx5: DR, Fix slab-out-of-bounds in mlx5_cmd_dr_create_fte
net/mlx5: DR, Cache STE shadow memory
net/mlx5: Update the list of the PCI supported devices
====================
Link: https://lore.kernel.org/r/20220224001123.365265-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 24 Feb 2022 01:25:22 +0000 (17:25 -0800)]
Merge tag 'devicetree-fixes-for-5.17-2' of git://git./linux/kernel/git/robh/linux
Pull devicetree fixes from Rob Herring:
- Update some maintainers email addresses
- Fix handling of elfcorehdr reservation for crash dump kernel
- Fix unittest expected warnings text
* tag 'devicetree-fixes-for-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: update Roger Quadros email
MAINTAINERS: sifive: drop Yash Shah
of/fdt: move elfcorehdr reservation early for crash dump kernel
of: unittest: update text of expected warnings
Linus Torvalds [Thu, 24 Feb 2022 01:19:55 +0000 (17:19 -0800)]
Merge tag 'selinux-pr-
20220223' of git://git./linux/kernel/git/pcmoore/selinux
Pull selinux fix from Paul Moore:
"A second small SELinux fix which addresses an incorrect
mutex_is_locked() check"
* tag 'selinux-pr-
20220223' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix misuse of mutex_is_locked()
Gal Pressman [Mon, 21 Feb 2022 15:54:34 +0000 (17:54 +0200)]
net/mlx5e: Fix VF min/max rate parameters interchange mistake
The VF min and max rate were passed incorrectly and resulted in wrongly
interchanging them. Fix the order of parameters in
mlx5_esw_qos_set_vport_rate().
Fixes: d7df09f5e7b4 ("net/mlx5: E-switch, Enable vport QoS on demand")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Lama Kayal [Mon, 21 Feb 2022 10:26:11 +0000 (12:26 +0200)]
net/mlx5e: Add missing increment of count
Add mistakenly missing increment of count variable when looping over
output buffer in mlx5e_self_test().
This resolves the issue of garbage values output when querying with self
test via ethtool.
before:
$ ethtool -t eth2
The test result is PASS
The test extra info:
Link Test 0
Speed Test
1768697188
Health Test
758528120
Loopback Test
3288687
after:
$ ethtool -t eth2
The test result is PASS
The test extra info:
Link Test 0
Speed Test 0
Health Test 0
Loopback Test 0
Fixes: 7990b1b5e8bd ("net/mlx5e: loopback test is not supported in switchdev mode")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maor Dickman [Thu, 6 Jan 2022 12:46:24 +0000 (14:46 +0200)]
net/mlx5e: MPLSoUDP decap, fix check for unsupported matches
Currently offload of rule on bareudp device require tunnel key
in order to match on mpls fields and without it the mpls fields
are ignored, this is incorrect due to the fact udp tunnel doesn't
have key to match on.
Fix by returning error in case flow is matching on tunnel key.
Fixes: 72046a91d134 ("net/mlx5e: Allow to match on mpls parameters")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maor Dickman [Thu, 6 Jan 2022 12:10:18 +0000 (14:10 +0200)]
net/mlx5e: Fix MPLSoUDP encap to use MPLS action information
Currently the MPLSoUDP encap builds the MPLS header using encap action
information (tunnel id, ttl and tos) instead of the MPLS action
information (label, ttl, tc and bos) which is wrong.
Fix by storing the MPLS action information during the flow action
parse and later using it to create the encap MPLS header.
Fixes: f828ca6a2fb6 ("net/mlx5e: Add support for hw encapsulation of MPLS over UDP")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Lama Kayal [Tue, 1 Feb 2022 09:24:41 +0000 (11:24 +0200)]
net/mlx5e: Add feature check for set fec counters
Fec counters support is checked via the PCAM feature_cap_mask,
bit 0: PPCNT_counter_group_Phy_statistical_counter_group.
Add feature check to avoid faulty behavior.
Fixes: 0a1498ebfa55 ("net/mlx5e: Expose FEC counters via ethtool")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Thu, 3 Feb 2022 07:42:19 +0000 (09:42 +0200)]
net/mlx5e: TC, Skip redundant ct clear actions
Offload of ct clear action is just resetting the reg_c register.
It's done by allocating modify hdr resources which is limited.
Doing it multiple times is redundant and wasting modify hdr resources
and if resources depleted the driver will fail offloading the rule.
Ignore redundant ct clear actions after the first one.
Fixes: 806401c20a0f ("net/mlx5e: CT, Fix multiple allocations and memleak of mod acts")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Mon, 17 Jan 2022 13:00:30 +0000 (15:00 +0200)]
net/mlx5e: TC, Reject rules with forward and drop actions
Such rules are redundant but allowed and passed to the driver.
The driver does not support offloading such rules so return an error.
Fixes: 03a9d11e6eeb ("net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Roi Dayan [Tue, 4 Jan 2022 08:38:02 +0000 (10:38 +0200)]
net/mlx5e: TC, Reject rules with drop and modify hdr action
This kind of action is not supported by firmware and generates a
syndrome.
kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)
Fixes: d7e75a325cb2 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Tariq Toukan [Mon, 31 Jan 2022 08:26:19 +0000 (10:26 +0200)]
net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets
For RX TLS device-offloaded packets, the HW spec guarantees checksum
validation for the offloaded packets, but does not define whether the
CQE.checksum field matches the original packet (ciphertext) or
the decrypted one (plaintext). This latitude allows architetctural
improvements between generations of chips, resulting in different decisions
regarding the value type of CQE.checksum.
Hence, for these packets, the device driver should not make use of this CQE
field. Here we block CHECKSUM_COMPLETE usage for RX TLS device-offloaded
packets, and use CHECKSUM_UNNECESSARY instead.
Value of the packet's tcp_hdr.csum is not modified by the HW, and it always
matches the original ciphertext.
Fixes: 1182f3659357 ("net/mlx5e: kTLS, Add kTLS RX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Gal Pressman [Wed, 2 Feb 2022 14:07:21 +0000 (16:07 +0200)]
net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
The ioctl EEPROM query wrongly returns success on read failures, fix
that by returning the appropriate error code.
Fixes: bb64143eee8c ("net/mlx5e: Add ethtool support for dump module EEPROM")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maor Gottlieb [Mon, 24 Jan 2022 19:25:04 +0000 (21:25 +0200)]
net/mlx5: Fix possible deadlock on rule deletion
Add missing call to up_write_ref_node() which releases the semaphore
in case the FTE doesn't have destinations, such in drop rule case.
Fixes: 465e7baab6d9 ("net/mlx5: Fix deletion of duplicate rules")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Chris Mi [Tue, 14 Dec 2021 01:52:53 +0000 (03:52 +0200)]
net/mlx5: Fix tc max supported prio for nic mode
Only prio 1 is supported if firmware doesn't support ignore flow
level for nic mode. The offending commit removed the check wrongly.
Add it back.
Fixes: 9a99c8f1253a ("net/mlx5e: E-Switch, Offload all chain 0 priorities when modify header and forward action is not supported")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Ariel Levkovich [Fri, 28 Jan 2022 23:39:24 +0000 (01:39 +0200)]
net/mlx5: Fix wrong limitation of metadata match on ecpf
Match metadata support check returns false for ecpf device.
However, this support does exist for ecpf and therefore this
limitation should be removed to allow feature such as stacked
devices and internal port offloaded to be supported.
Fixes: 92ab1eb392c6 ("net/mlx5: E-Switch, Enable vport metadata matching if firmware supports it")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Maher Sanalla [Wed, 16 Feb 2022 09:01:04 +0000 (11:01 +0200)]
net/mlx5: Update log_max_qp value to be 17 at most
Currently, log_max_qp value is dependent on what FW reports as its max capability.
In reality, due to a bug, some FWs report a value greater than 17, even though they
don't support log_max_qp > 17.
This FW issue led the driver to exhaust memory on startup.
Thus, log_max_qp value is set to be no more than 17 regardless
of what FW reports, as it was before the cited commit.
Fixes: f79a609ea6bf ("net/mlx5: Update log_max_qp value to FW max capability")
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Yevgeny Kliteynik [Wed, 29 Dec 2021 20:22:05 +0000 (22:22 +0200)]
net/mlx5: DR, Fix the threshold that defines when pool sync is initiated
When deciding whether to start syncing and actually free all the "hot"
ICM chunks, we need to consider the type of the ICM chunks that we're
dealing with. For instance, the amount of available ICM for MODIFY_ACTION
is significantly lower than the usual STE ICM, so the threshold should
account for that - otherwise we can deplete MODIFY_ACTION memory just by
creating and deleting the same modify header action in a continuous loop.
This patch replaces the hard-coded threshold with a dynamic value.
Fixes: 1c58651412bb ("net/mlx5: DR, ICM memory pools sync optimization")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Yevgeny Kliteynik [Thu, 13 Jan 2022 12:52:48 +0000 (14:52 +0200)]
net/mlx5: DR, Don't allow match on IP w/o matching on full ethertype/ip_version
Currently SMFS allows adding rule with matching on src/dst IP w/o matching
on full ethertype or ip_version, which is not supported by HW.
This patch fixes this issue and adds the check as it is done in DMFS.
Fixes: 26d688e33f88 ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Yevgeny Kliteynik [Tue, 11 Jan 2022 01:00:03 +0000 (03:00 +0200)]
net/mlx5: DR, Fix slab-out-of-bounds in mlx5_cmd_dr_create_fte
When adding a rule with 32 destinations, we hit the following out-of-band
access issue:
BUG: KASAN: slab-out-of-bounds in mlx5_cmd_dr_create_fte+0x18ee/0x1e70
This patch fixes the issue by both increasing the allocated buffers to
accommodate for the needed actions and by checking the number of actions
to prevent this issue when a rule with too many actions is provided.
Fixes: 1ffd498901c1 ("net/mlx5: DR, Increase supported num of actions to 32")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Yevgeny Kliteynik [Thu, 23 Dec 2021 23:07:30 +0000 (01:07 +0200)]
net/mlx5: DR, Cache STE shadow memory
During rule insertion on each ICM memory chunk we also allocate shadow memory
used for management. This includes the hw_ste, dr_ste and miss list per entry.
Since the scale of these allocations is large we noticed a performance hiccup
that happens once malloc and free are stressed.
In extreme usecases when ~1M chunks are freed at once, it might take up to 40
seconds to complete this, up to the point the kernel sees this as self-detected
stall on CPU:
rcu: INFO: rcu_sched self-detected stall on CPU
To resolve this we will increase the reuse of shadow memory.
Doing this we see that a time in the aforementioned usecase dropped from ~40
seconds to ~8-10 seconds.
Fixes: 29cf8febd185 ("net/mlx5: DR, ICM pool memory allocator")
Signed-off-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Meir Lichtinger [Mon, 10 Jan 2022 08:14:41 +0000 (10:14 +0200)]
net/mlx5: Update the list of the PCI supported devices
Add the upcoming BlueField-4 and ConnectX-8 device IDs.
Fixes: 2e9d3e83ab82 ("net/mlx5: Update the list of the PCI supported devices")
Signed-off-by: Meir Lichtinger <meirl@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Linus Torvalds [Wed, 23 Feb 2022 20:06:23 +0000 (12:06 -0800)]
Merge tag 'for-5.17/parisc-4' of git://git./linux/kernel/git/deller/parisc-linux
Pull parisc unaligned handler fixes from Helge Deller:
"Two patches which fix a few bugs in the unalignment handlers.
The fldd and fstd instructions weren't handled at all on 32-bit
kernels, the stw instruction didn't check for fault errors and the
fldw_l and ldw_m were handled wrongly as integer vs floating point
instructions.
Both patches are tagged for stable series"
* tag 'for-5.17/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc/unaligned: Fix ldw() and stw() unalignment handlers
parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel
Linus Torvalds [Wed, 23 Feb 2022 19:51:35 +0000 (11:51 -0800)]
Merge tag 'hwmon-for-v5.17-rc6' of git://git./linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Fix two old bugs and one new bug in the hwmon subsystem:
- In pmbus core, clear pmbus fault/warning status bits after read to
follow PMBus standard
- In hwmon core, handle failure to register sensor with thermal zone
correctly
- In ntc_thermal driver, use valid thermistor names for Samsung
thermistors"
* tag 'hwmon-for-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (pmbus) Clear pmbus fault/warning bits after read
hwmon: Handle failure to register sensor with thermal zone correctly
hwmon: (ntc_thermistor) Underscore Samsung thermistor
Linus Torvalds [Wed, 23 Feb 2022 19:33:12 +0000 (11:33 -0800)]
Merge tag 'slab-for-5.17-rc6' of git://git./linux/kernel/git/vbabka/slab
Pull slab fixes from Vlastimil Babka:
- Build fix (workaround) for clang.
- Fix a /proc/kcore based slabinfo script broken by struct slab changes
in 5.17-rc1.
* tag 'slab-for-5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
tools/cgroup/slabinfo: update to work with struct slab
slab: remove __alloc_size attribute from __kmalloc_track_caller
Alex Deucher [Tue, 22 Feb 2022 16:08:01 +0000 (11:08 -0500)]
PCI: Mark all AMD Navi10 and Navi14 GPU ATS as broken
There are enough VBIOS escapes without the proper workaround that some
users still hit this. Microsoft never productized ATS on Windows so OEM
platforms that were Windows-only didn't always validate ATS.
The advantages of ATS are not worth it compared to the potential
instabilities on harvested boards. Disable ATS on all Navi10 and Navi14
boards.
Symptoms include:
amdgpu 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0007 address=0xffffc02000 flags=0x0000]
AMD-Vi: Event logged [IO_PAGE_FAULT device=07:00.0 domain=0x0007 address=0xffffc02000 flags=0x0000]
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=6047, emitted seq=6049
amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume
amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110
amdgpu 0000:07:00.0: amdgpu: GPU reset(1) failed
Related commits:
e8946a53e2a6 ("PCI: Mark AMD Navi14 GPU ATS as broken")
a2da5d8cc0b0 ("PCI: Mark AMD Raven iGPU ATS as broken in some platforms")
45beb31d3afb ("PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken")
5e89cd303e3a ("PCI: Mark AMD Navi14 GPU rev 0xc5 ATS as broken")
d28ca864c493 ("PCI: Mark AMD Stoney Radeon R7 GPU ATS as broken")
9b44b0b09dec ("PCI: Mark AMD Stoney GPU ATS as broken")
[bhelgaas: add symptoms and related commits]
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1760
Link: https://lore.kernel.org/r/20220222160801.841643-1-alexander.deucher@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Helge Deller [Fri, 18 Feb 2022 22:40:14 +0000 (23:40 +0100)]
parisc/unaligned: Fix ldw() and stw() unalignment handlers
Fix 3 bugs:
a) emulate_stw() doesn't return the error code value, so faulting
instructions are not reported and aborted.
b) Tell emulate_ldw() to handle fldw_l as floating point instruction
c) Tell emulate_ldw() to handle ldw_m as integer instruction
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Helge Deller [Fri, 18 Feb 2022 08:25:20 +0000 (09:25 +0100)]
parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel
Usually the kernel provides fixup routines to emulate the fldd and fstd
floating-point instructions if they load or store 8-byte from/to a not
natuarally aligned memory location.
On a 32-bit kernel I noticed that those unaligned handlers didn't worked and
instead the application got a SEGV.
While checking the code I found two problems:
First, the OPCODE_FLDD_L and OPCODE_FSTD_L cases were ifdef'ed out by the
CONFIG_PA20 option, and as such those weren't built on a pure 32-bit kernel.
This is now fixed by moving the CONFIG_PA20 #ifdef to prevent the compilation
of OPCODE_LDD_L and OPCODE_FSTD_L only, and handling the fldd and fstd
instructions.
The second problem are two bugs in the 32-bit inline assembly code, where the
wrong registers where used. The calculation of the natural alignment used %2
(vall) instead of %3 (ior), and the first word was stored back to address %1
(valh) instead of %3 (ior).
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Varun Prakash [Sat, 22 Jan 2022 16:57:44 +0000 (22:27 +0530)]
nvme-tcp: send H2CData PDUs based on MAXH2CDATA
As per NVMe/TCP specification (revision 1.0a, section 3.6.2.3)
Maximum Host to Controller Data length (MAXH2CDATA): Specifies the
maximum number of PDU-Data bytes per H2CData PDU in bytes. This value
is a multiple of dwords and should be no less than 4,096.
Current code sets H2CData PDU data_length to r2t_length,
it does not check MAXH2CDATA value. Fix this by setting H2CData PDU
data_length to min(req->h2cdata_left, queue->maxh2cdata).
Also validate MAXH2CDATA value returned by target in ICResp PDU,
if it is not a multiple of dword or if it is less than 4096 return
-EINVAL from nvme_tcp_init_connection().
Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Christoph Hellwig [Wed, 16 Feb 2022 13:14:58 +0000 (14:14 +0100)]
nvme: also mark passthrough-only namespaces ready in nvme_update_ns_info
Commit
e7d65803e2bb ("nvme-multipath: revalidate paths during rescan")
introduced the NVME_NS_READY flag, which nvme_path_is_disabled() uses
to check if a path can be used or not. We also need to set this flag
for devices that fail the ZNS feature validation and which are available
through passthrough devices only to that they can be used in multipathing
setups.
Fixes: e7d65803e2bb ("nvme-multipath: revalidate paths during rescan")
Reported-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Tested-by: Kanchan Joshi <joshi.k@samsung.com>
Christoph Hellwig [Wed, 16 Feb 2022 14:07:15 +0000 (15:07 +0100)]
nvme: don't return an error from nvme_configure_metadata
When a fabrics controller claims to support an invalidate metadata
configuration we already warn and disable metadata support. No need to
also return an error during revalidation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Tested-by: Kanchan Joshi <joshi.k@samsung.com>
David S. Miller [Wed, 23 Feb 2022 12:50:19 +0000 (12:50 +0000)]
Merge branch 'ftgmac100-fixes'
Heyi Guo says:
====================
drivers/net/ftgmac100: fix occasional DHCP failure
This patch set is to fix the issues discussed in the mail thread:
https://lore.kernel.org/netdev/
51f5b7a7-330f-6b3c-253d-
10e45cdb6805@linux.alibaba.com/
and follows the advice from Andrew Lunn.
The first 2 patches refactors the code to enable adjust_link calling reset
function directly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heyi Guo [Wed, 23 Feb 2022 03:14:36 +0000 (11:14 +0800)]
drivers/net/ftgmac100: fix DHCP potential failure with systemd
DHCP failures were observed with systemd 247.6. The issue could be
reproduced by rebooting Aspeed 2600 and then running ifconfig ethX
down/up.
It is caused by below procedures in the driver:
1. ftgmac100_open() enables net interface and call phy_start()
2. When PHY is link up, it calls netif_carrier_on() and then
adjust_link callback
3. ftgmac100_adjust_link() will schedule the reset task
4. ftgmac100_reset_task() will then reset the MAC in another schedule
After step 2, systemd will be notified to send DHCP discover packet,
while the packet might be corrupted by MAC reset operation in step 4.
Call ftgmac100_reset() directly instead of scheduling task to fix the
issue.
Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heyi Guo [Wed, 23 Feb 2022 03:14:35 +0000 (11:14 +0800)]
drivers/net/ftgmac100: adjust code place for function call dependency
This is to prepare for ftgmac100_adjust_link() to call
ftgmac100_reset() directly. Only code places are changed.
Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heyi Guo [Wed, 23 Feb 2022 03:14:34 +0000 (11:14 +0800)]
drivers/net/ftgmac100: refactor ftgmac100_reset_task to enable direct function call
This is to prepare for ftgmac100_adjust_link() to call reset function
directly, instead of task schedule.
Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wan Jiabing [Wed, 23 Feb 2022 02:34:19 +0000 (10:34 +0800)]
net: sched: avoid newline at end of message in NL_SET_ERR_MSG_MOD
Fix following coccicheck warning:
./net/sched/act_api.c:277:7-49: WARNING avoid newline at end of message
in NL_SET_ERR_MSG_MOD
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alvin Šipraga [Tue, 22 Feb 2022 16:14:08 +0000 (17:14 +0100)]
MAINTAINERS: add myself as co-maintainer for Realtek DSA switch drivers
Adding myself (Alvin Šipraga) as another maintainer for the Realtek DSA
switch drivers. I intend to help Linus out with reviewing and testing
changes to these drivers, particularly the rtl8365mb driver which I
authored and have hardware access to.
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 22 Feb 2022 13:43:12 +0000 (16:43 +0300)]
tipc: Fix end of loop tests for list_for_each_entry()
These tests are supposed to check if the loop exited via a break or not.
However the tests are wrong because if we did not exit via a break then
"p" is not a valid pointer. In that case, it's the equivalent of
"if (*(u32 *)sr == *last_key) {". That's going to work most of the time,
but there is a potential for those to be equal.
Fixes: 1593123a6a49 ("tipc: add name table dump to new netlink api")
Fixes: 1a1a143daf84 ("tipc: add publication dump to new netlink api")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 22 Feb 2022 13:42:51 +0000 (16:42 +0300)]
udp_tunnel: Fix end of loop test in udp_tunnel_nic_unregister()
This test is checking if we exited the list via break or not. However
if it did not exit via a break then "node" does not point to a valid
udp_tunnel_nic_shared_node struct. It will work because of the way
the structs are laid out it's the equivalent of
"if (info->shared->udp_tunnel_nic_info != dev)" which will always be
true, but it's not the right way to test.
Fixes: 74cc6d182d03 ("udp_tunnel: add the ability to share port tables")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Garzarella [Tue, 22 Feb 2022 09:47:42 +0000 (10:47 +0100)]
vhost/vsock: don't check owner in vhost_vsock_stop() while releasing
vhost_vsock_stop() calls vhost_dev_check_owner() to check the device
ownership. It expects current->mm to be valid.
vhost_vsock_stop() is also called by vhost_vsock_dev_release() when
the user has not done close(), so when we are in do_exit(). In this
case current->mm is invalid and we're releasing the device, so we
should clean it anyway.
Let's check the owner only when vhost_vsock_stop() is called
by an ioctl.
When invoked from release we can not fail so we don't check return
code of vhost_vsock_stop(). We need to stop vsock even if it's not
the owner.
Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko")
Cc: stable@vger.kernel.org
Reported-by: syzbot+1e3ea63db39f2b4440e0@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+3140b17cb44a7b174008@syzkaller.appspotmail.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sukadev Bhattiprolu [Mon, 21 Feb 2022 21:05:45 +0000 (15:05 -0600)]
ibmvnic: schedule failover only if vioctl fails
If client is unable to initiate a failover reset via H_VIOCTL hcall, then
it should schedule a failover reset as a last resort. Otherwise, there is
no need to do a last resort.
Fixes: 334c42414729 ("ibmvnic: improve failover sysfs entry")
Reported-by: Cris Forno <cforno12@outlook.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Link: https://lore.kernel.org/r/20220221210545.115283-1-drt@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alvin Šipraga [Mon, 21 Feb 2022 20:35:38 +0000 (21:35 +0100)]
net: dsa: fix panic when removing unoffloaded port from bridge
If a bridged port is not offloaded to the hardware - either because the
underlying driver does not implement the port_bridge_{join,leave} ops,
or because the operation failed - then its dp->bridge pointer will be
NULL when dsa_port_bridge_leave() is called. Avoid dereferncing NULL.
This fixes the following splat when removing a port from a bridge:
Unable to handle kernel access to user memory outside uaccess routines at virtual address
0000000000000000
Internal error: Oops:
96000004 [#1] PREEMPT_RT SMP
CPU: 3 PID: 1119 Comm: brctl Tainted: G O 5.17.0-rc4-rt4 #1
Call trace:
dsa_port_bridge_leave+0x8c/0x1e4
dsa_slave_changeupper+0x40/0x170
dsa_slave_netdevice_event+0x494/0x4d4
notifier_call_chain+0x80/0xe0
raw_notifier_call_chain+0x1c/0x24
call_netdevice_notifiers_info+0x5c/0xac
__netdev_upper_dev_unlink+0xa4/0x200
netdev_upper_dev_unlink+0x38/0x60
del_nbp+0x1b0/0x300
br_del_if+0x38/0x114
add_del_if+0x60/0xa0
br_ioctl_stub+0x128/0x2dc
br_ioctl_call+0x68/0xb0
dev_ifsioc+0x390/0x554
dev_ioctl+0x128/0x400
sock_do_ioctl+0xb4/0xf4
sock_ioctl+0x12c/0x4e0
__arm64_sys_ioctl+0xa8/0xf0
invoke_syscall+0x4c/0x110
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x28/0x84
el0_svc+0x1c/0x50
el0t_64_sync_handler+0xa8/0xb0
el0t_64_sync+0x17c/0x180
Code:
f9402f00 f0002261 f9401302 913cc021 (
a9401404)
---[ end trace
0000000000000000 ]---
Fixes: d3eed0e57d5d ("net: dsa: keep the bridge_dev and bridge_num as part of the same structure")
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220221203539.310690-1-alvin@pqrs.dk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Sun, 20 Feb 2022 15:40:52 +0000 (07:40 -0800)]
net: __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor friends
Whenever one of these functions pull all data from an skb in a frag_list,
use consume_skb() instead of kfree_skb() to avoid polluting drop
monitoring.
Fixes: 6fa01ccd8830 ("skbuff: Add pskb_extract() helper function")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220220154052.1308469-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Wed, 23 Feb 2022 00:14:35 +0000 (16:14 -0800)]
Merge branch 'for-5.17-fixes' of git://git./linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- Fix for a subtle bug in the recent release_agent permission check
update
- Fix for a long-standing race condition between cpuset and cpu hotplug
- Comment updates
* 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cpuset: Fix kernel-doc
cgroup-v1: Correct privileges check in release_agent writes
cgroup: clarify cgroup_css_set_fork()
cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug
Ondrej Mosnacek [Mon, 21 Feb 2022 14:06:49 +0000 (15:06 +0100)]
selinux: fix misuse of mutex_is_locked()
mutex_is_locked() tests whether the mutex is locked *by any task*, while
here we want to test if it is held *by the current task*. To avoid
false/missed WARNINGs, use lockdep_assert_is_held() and
lockdep_assert_is_not_held() instead, which do the right thing (though
they are a no-op if CONFIG_LOCKDEP=n).
Cc: stable@vger.kernel.org
Fixes: 2554a48f4437 ("selinux: measure state and policy capabilities")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Krzysztof Kozlowski [Mon, 21 Feb 2022 10:07:01 +0000 (11:07 +0100)]
dt-bindings: update Roger Quadros email
Emails to Roger Quadros TI account bounce with:
550 Invalid recipient <rogerq@ti.com> (#5.1.1)
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Acked-by: Roger Quadros <rogerq@kernel.org>
Acked-By: Vinod Koul <vkoul@kernel.org>
Acked-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220221100701.48593-1-krzysztof.kozlowski@canonical.com
Krzysztof Kozlowski [Mon, 14 Feb 2022 08:23:49 +0000 (09:23 +0100)]
MAINTAINERS: sifive: drop Yash Shah
Emails to Yash Shah bounce with "The email account that you tried to
reach does not exist.", so drop him from all maintainer entries.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20220214082349.162973-1-krzysztof.kozlowski@canonical.com
Jiapeng Chong [Wed, 16 Feb 2022 03:17:53 +0000 (11:17 +0800)]
cpuset: Fix kernel-doc
Fix the following W=1 kernel warnings:
kernel/cgroup/cpuset.c:3718: warning: expecting prototype for
cpuset_memory_pressure_bump(). Prototype was for
__cpuset_memory_pressure_bump() instead.
kernel/cgroup/cpuset.c:3568: warning: expecting prototype for
cpuset_node_allowed(). Prototype was for __cpuset_node_allowed()
instead.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Linus Torvalds [Tue, 22 Feb 2022 18:31:53 +0000 (10:31 -0800)]
Merge branch 'fixes' of git://git./linux/kernel/git/viro/vfs
Pull ITER_PIPE fix from Al Viro:
"Fix for old sloppiness in pipe_buffer reuse"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
lib/iov_iter: initialize "flags" in new pipe_buffer
Michal Koutný [Thu, 17 Feb 2022 16:11:28 +0000 (17:11 +0100)]
cgroup-v1: Correct privileges check in release_agent writes
The idea is to check: a) the owning user_ns of cgroup_ns, b)
capabilities in init_user_ns.
The commit
24f600856418 ("cgroup-v1: Require capabilities to set
release_agent") got this wrong in the write handler of release_agent
since it checked user_ns of the opener (may be different from the owning
user_ns of cgroup_ns).
Secondly, to avoid possibly confused deputy, the capability of the
opener must be checked.
Fixes: 24f600856418 ("cgroup-v1: Require capabilities to set release_agent")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/stable/20220216121142.GB30035@blackbody.suse.cz/
Signed-off-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Masami Ichikawa(CIP) <masami.ichikawa@cybertrust.co.jp>
Signed-off-by: Tejun Heo <tj@kernel.org>
Christian Brauner [Mon, 21 Feb 2022 15:16:39 +0000 (16:16 +0100)]
cgroup: clarify cgroup_css_set_fork()
With recent fixes for the permission checking when moving a task into a cgroup
using a file descriptor to a cgroup's cgroup.procs file and calling write() it
seems a good idea to clarify CLONE_INTO_CGROUP permission checking with a
comment.
Cc: Tejun Heo <tj@kernel.org>
Cc: <cgroups@vger.kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Dylan Yudaken [Tue, 22 Feb 2022 16:17:51 +0000 (08:17 -0800)]
io_uring: disallow modification of rsrc_data during quiesce
io_rsrc_ref_quiesce will unlock the uring while it waits for references to
the io_rsrc_data to be killed.
There are other places to the data that might add references to data via
calls to io_rsrc_node_switch.
There is a race condition where this reference can be added after the
completion has been signalled. At this point the io_rsrc_ref_quiesce call
will wake up and relock the uring, assuming the data is unused and can be
freed - although it is actually being used.
To fix this check in io_rsrc_ref_quiesce if a resource has been revived.
Reported-by: syzbot+ca8bf833622a1662745b@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220222161751.995746-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Vikash Chandola [Tue, 22 Feb 2022 13:12:53 +0000 (13:12 +0000)]
hwmon: (pmbus) Clear pmbus fault/warning bits after read
Almost all fault/warning bits in pmbus status registers remain set even
after fault/warning condition are removed. As per pmbus specification
these faults must be cleared by user.
Modify hwmon behavior to clear fault/warning bit after fetching data if
fault/warning bit was set. This allows to get fresh data in next read.
Signed-off-by: Vikash Chandola <vikash.chandola@linux.intel.com>
Link: https://lore.kernel.org/r/20220222131253.2426834-1-vikash.chandola@linux.intel.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Guenter Roeck [Mon, 21 Feb 2022 16:32:14 +0000 (08:32 -0800)]
hwmon: Handle failure to register sensor with thermal zone correctly
If an attempt is made to a sensor with a thermal zone and it fails,
the call to devm_thermal_zone_of_sensor_register() may return -ENODEV.
This may result in crashes similar to the following.
Unable to handle kernel NULL pointer dereference at virtual address
00000000000003cd
...
Internal error: Oops:
96000021 [#1] PREEMPT SMP
...
pstate:
60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : mutex_lock+0x18/0x60
lr : thermal_zone_device_update+0x40/0x2e0
sp :
ffff800014c4fc60
x29:
ffff800014c4fc60 x28:
ffff365ee3f6e000 x27:
ffffdde218426790
x26:
ffff365ee3f6e000 x25:
0000000000000000 x24:
ffff365ee3f6e000
x23:
ffffdde218426870 x22:
ffff365ee3f6e000 x21:
00000000000003cd
x20:
ffff365ee8bf3308 x19:
ffffffffffffffed x18:
0000000000000000
x17:
ffffdde21842689c x16:
ffffdde1cb7a0b7c x15:
0000000000000040
x14:
ffffdde21a4889a0 x13:
0000000000000228 x12:
0000000000000000
x11:
0000000000000000 x10:
0000000000000000 x9 :
0000000000000000
x8 :
0000000001120000 x7 :
0000000000000001 x6 :
0000000000000000
x5 :
0068000878e20f07 x4 :
0000000000000000 x3 :
00000000000003cd
x2 :
ffff365ee3f6e000 x1 :
0000000000000000 x0 :
00000000000003cd
Call trace:
mutex_lock+0x18/0x60
hwmon_notify_event+0xfc/0x110
0xffffdde1cb7a0a90
0xffffdde1cb7a0b7c
irq_thread_fn+0x2c/0xa0
irq_thread+0x134/0x240
kthread+0x178/0x190
ret_from_fork+0x10/0x20
Code:
d503201f d503201f d2800001 aa0103e4 (
c8e47c02)
Jon Hunter reports that the exact call sequence is:
hwmon_notify_event()
--> hwmon_thermal_notify()
--> thermal_zone_device_update()
--> update_temperature()
--> mutex_lock()
The hwmon core needs to handle all errors returned from calls
to devm_thermal_zone_of_sensor_register(). If the call fails
with -ENODEV, report that the sensor was not attached to a
thermal zone but continue to register the hwmon device.
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Cc: Dmitry Osipenko <digetx@gmail.com>
Fixes: 1597b374af222 ("hwmon: Add notification support")
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Stefano Garzarella [Fri, 11 Feb 2022 09:01:36 +0000 (10:01 +0100)]
block: clear iocb->private in blkdev_bio_end_io_async()
iocb_bio_iopoll() expects iocb->private to be cleared before
releasing the bio.
We already do this in blkdev_bio_end_io(), but we forgot in the
recently added blkdev_bio_end_io_async().
Fixes: 54a88eb838d3 ("block: add single bio async direct IO helper")
Cc: asml.silence@gmail.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220211090136.44471-1-sgarzare@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
David S. Miller [Tue, 22 Feb 2022 11:00:51 +0000 (11:00 +0000)]
Merge git://git./linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
This is fixing up the use without proper initialization in patch 5/5
-o-
Hi,
The following patchset contains Netfilter fixes for net:
1) Missing #ifdef CONFIG_IP6_NF_IPTABLES in recent xt_socket fix.
2) Fix incorrect flow action array size in nf_tables.
3) Unregister flowtable hooks from netns exit path.
4) Fix missing limit object release, from Florian Westphal.
5) Memleak in nf_tables object update path, also from Florian.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Mon, 21 Feb 2022 12:31:49 +0000 (13:31 +0100)]
netfilter: nf_tables: fix memory leak during stateful obj update
stateful objects can be updated from the control plane.
The transaction logic allocates a temporary object for this purpose.
The ->init function was called for this object, so plain kfree() leaks
resources. We must call ->destroy function of the object.
nft_obj_destroy does this, but it also decrements the module refcount,
but the update path doesn't increment it.
To avoid special-casing the update object release, do module_get for
the update case too and release it via nft_obj_destroy().
Fixes: d62d0ba97b58 ("netfilter: nf_tables: Introduce stateful object update operation")
Cc: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Mon, 21 Feb 2022 17:10:53 +0000 (09:10 -0800)]
Merge tag 'platform-drivers-x86-v5.17-3' of git://git./linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Two small fixes and one hardware-id addition"
* tag 'platform-drivers-x86-v5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: int3472: Add terminator to gpiod_lookup_table
platform/x86: asus-wmi: Fix regression when probing for fan curve control
platform/x86: thinkpad_acpi: Add dual-fan quirk for T15g (2nd gen)
Max Kellermann [Mon, 21 Feb 2022 10:03:13 +0000 (11:03 +0100)]
lib/iov_iter: initialize "flags" in new pipe_buffer
The functions copy_page_to_iter_pipe() and push_pipe() can both
allocate a new pipe_buffer, but the "flags" member initializer is
missing.
Fixes: 241699cd72a8 ("new iov_iter flavour: pipe-backed")
To: Alexander Viro <viro@zeniv.linux.org.uk>
To: linux-fsdevel@vger.kernel.org
To: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Florian Westphal [Fri, 18 Feb 2022 12:17:05 +0000 (13:17 +0100)]
netfilter: nft_limit: fix stateful object memory leak
We need to provide a destroy callback to release the extra fields.
Fixes: 3b9e2ea6c11b ("netfilter: nft_limit: move stateful fields out of expression data")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo Neira Ayuso [Fri, 18 Feb 2022 11:45:32 +0000 (12:45 +0100)]
netfilter: nf_tables: unregister flowtable hooks on netns exit
Unregister flowtable hooks before they are releases via
nf_tables_flowtable_destroy() otherwise hook core reports UAF.
BUG: KASAN: use-after-free in nf_hook_entries_grow+0x5a7/0x700 net/netfilter/core.c:142 net/netfilter/core.c:142
Read of size 4 at addr
ffff8880736f7438 by task syz-executor579/3666
CPU: 0 PID: 3666 Comm: syz-executor579 Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
__dump_stack lib/dump_stack.c:88 [inline] lib/dump_stack.c:106
dump_stack_lvl+0x1dc/0x2d8 lib/dump_stack.c:106 lib/dump_stack.c:106
print_address_description+0x65/0x380 mm/kasan/report.c:247 mm/kasan/report.c:247
__kasan_report mm/kasan/report.c:433 [inline]
__kasan_report mm/kasan/report.c:433 [inline] mm/kasan/report.c:450
kasan_report+0x19a/0x1f0 mm/kasan/report.c:450 mm/kasan/report.c:450
nf_hook_entries_grow+0x5a7/0x700 net/netfilter/core.c:142 net/netfilter/core.c:142
__nf_register_net_hook+0x27e/0x8d0 net/netfilter/core.c:429 net/netfilter/core.c:429
nf_register_net_hook+0xaa/0x180 net/netfilter/core.c:571 net/netfilter/core.c:571
nft_register_flowtable_net_hooks+0x3c5/0x730 net/netfilter/nf_tables_api.c:7232 net/netfilter/nf_tables_api.c:7232
nf_tables_newflowtable+0x2022/0x2cf0 net/netfilter/nf_tables_api.c:7430 net/netfilter/nf_tables_api.c:7430
nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline]
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline]
nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline] net/netfilter/nfnetlink.c:652
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] net/netfilter/nfnetlink.c:652
nfnetlink_rcv+0x10e6/0x2550 net/netfilter/nfnetlink.c:652 net/netfilter/nfnetlink.c:652
__nft_release_hook() calls nft_unregister_flowtable_net_hooks() which
only unregisters the hooks, then after RCU grace period, it is
guaranteed that no packets add new entries to the flowtable (no flow
offload rules and flowtable hooks are reachable from packet path), so it
is safe to call nf_flow_table_free() which cleans up the remaining
entries from the flowtable (both software and hardware) and it unbinds
the flow_block.
Fixes: ff4bf2f42a40 ("netfilter: nf_tables: add nft_unregister_flowtable_hook()")
Reported-by: syzbot+e918523f77e62790d6d9@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Daniel Scally [Wed, 16 Feb 2022 22:53:02 +0000 (22:53 +0000)]
platform/x86: int3472: Add terminator to gpiod_lookup_table
Without the terminator, if a con_id is passed to gpio_find() that
does not exist in the lookup table the function will not stop looping
correctly, and eventually cause an oops.
Fixes: 19d8d6e36b4b ("platform/x86: int3472: Pass tps68470_regulator_platform_data to the tps68470-regulator MFD-cell")
Signed-off-by: Daniel Scally <djrscally@gmail.com>
Link: https://lore.kernel.org/r/20220216225304.53911-5-djrscally@gmail.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Baruch Siach [Mon, 21 Feb 2022 11:45:57 +0000 (13:45 +0200)]
net: mdio-ipq4019: add delay after clock enable
Experimentation shows that PHY detect might fail when the code attempts
MDIO bus read immediately after clock enable. Add delay to stabilize the
clock before bus access.
PHY detect failure started to show after commit
7590fc6f80ac ("net:
mdio: Demote probed message to debug print") that removed coincidental
delay between clock enable and bus access.
10ms is meant to match the time it take to send the probed message over
UART at 115200 bps. This might be a far overshoot.
Fixes: 23a890d493e3 ("net: mdio: Add the reset function for IPQ MDIO driver")
Signed-off-by: Baruch Siach <baruch.siach@siklu.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jens Axboe [Mon, 21 Feb 2022 12:49:30 +0000 (05:49 -0700)]
io_uring: don't convert to jiffies for waiting on timeouts
If an application calls io_uring_enter(2) with a timespec passed in,
convert that timespec to ktime_t rather than jiffies. The latter does
not provide the granularity the application may expect, and may in
fact provided different granularity on different systems, depending
on what the HZ value is configured at.
Turn the timespec into an absolute ktime_t, and use that with
schedule_hrtimeout() instead.
Link: https://github.com/axboe/liburing/issues/531
Cc: stable@vger.kernel.org
Reported-by: Bob Chen <chenbo.chen@alibaba-inc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Tao Liu [Fri, 18 Feb 2022 14:35:24 +0000 (22:35 +0800)]
gso: do not skip outer ip header in case of ipip and net_failover
We encounter a tcp drop issue in our cloud environment. Packet GROed in
host forwards to a VM virtio_net nic with net_failover enabled. VM acts
as a IPVS LB with ipip encapsulation. The full path like:
host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat
-> ipip encap -> net_failover tx -> virtio_net tx
When net_failover transmits a ipip pkt (gso_type = 0x0103, which means
SKB_GSO_TCPV4, SKB_GSO_DODGY and SKB_GSO_IPXIP4), there is no gso
did because it supports TSO and GSO_IPXIP4. But network_header points to
inner ip header.
Call Trace:
tcp4_gso_segment ------> return NULL
inet_gso_segment ------> inner iph, network_header points to
ipip_gso_segment
inet_gso_segment ------> outer iph
skb_mac_gso_segment
Afterwards virtio_net transmits the pkt, only inner ip header is modified.
And the outer one just keeps unchanged. The pkt will be dropped in remote
host.
Call Trace:
inet_gso_segment ------> inner iph, outer iph is skipped
skb_mac_gso_segment
__skb_gso_segment
validate_xmit_skb
validate_xmit_skb_list
sch_direct_xmit
__qdisc_run
__dev_queue_xmit ------> virtio_net
dev_hard_start_xmit
__dev_queue_xmit ------> net_failover
ip_finish_output2
ip_output
iptunnel_xmit
ip_tunnel_xmit
ipip_tunnel_xmit ------> ipip
dev_hard_start_xmit
__dev_queue_xmit
ip_finish_output2
ip_output
ip_forward
ip_rcv
__netif_receive_skb_one_core
netif_receive_skb_internal
napi_gro_receive
receive_buf
virtnet_poll
net_rx_action
The root cause of this issue is specific with the rare combination of
SKB_GSO_DODGY and a tunnel device that adds an SKB_GSO_ tunnel option.
SKB_GSO_DODGY is set from external virtio_net. We need to reset network
header when callbacks.gso_segment() returns NULL.
This patch also includes ipv6_gso_segment(), considering SIT, etc.
Fixes: cb32f511a70b ("ipip: add GSO/TSO support")
Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Gushchin [Wed, 16 Feb 2022 20:43:30 +0000 (12:43 -0800)]
tools/cgroup/slabinfo: update to work with struct slab
After the introduction of the dedicated struct slab to describe slab
pages by commit
d122019bf061 ("mm: Split slab into its own type") and
the following removal of the corresponding struct page's fields by
commit
07f910f9b729 ("mm: Remove slab from struct page") the
memcg_slabinfo tool broke. An attempt to run it produces a trace like
this:
Traceback (most recent call last):
File "/usr/bin/drgn", line 33, in <module>
sys.exit(load_entry_point('drgn==0.0.16', 'console_scripts', 'drgn')())
File "/usr/lib64/python3.9/site-packages/drgn/internal/cli.py", line 133, in main
runpy.run_path(args.script[0], init_globals=init_globals, run_name="__main__")
File "/usr/lib64/python3.9/runpy.py", line 268, in run_path
return _run_module_code(code, init_globals, run_name,
File "/usr/lib64/python3.9/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "memcg_slabinfo.py", line 226, in <module>
main()
File "memcg_slabinfo.py", line 199, in main
cache = page.slab_cache
AttributeError: 'struct page' has no member 'slab_cache'
The problem can be fixed by explicitly casting struct page * to struct
slab * for slab pages. The tools works as expected with this fix, e.g.:
cred_jar 776 776 192 21 1 : tunables 0 0 0 : slabdata 547 547 0
kmalloc-cg-32 6 6 32 128 1 : tunables 0 0 0 : slabdata 9 9 0
files_cache 3 3 832 39 8 : tunables 0 0 0 : slabdata 8 8 0
kmalloc-cg-512 1 1 512 32 4 : tunables 0 0 0 : slabdata 10 10 0
task_struct 10 10 6720 4 8 : tunables 0 0 0 : slabdata 63 63 0
mm_struct 3 3 1664 19 8 : tunables 0 0 0 : slabdata 9 9 0
kmalloc-cg-16 1 1 16 256 1 : tunables 0 0 0 : slabdata 8 8 0
pde_opener 1 1 40 102 1 : tunables 0 0 0 : slabdata 8 8 0
anon_vma_chain 375 375 64 64 1 : tunables 0 0 0 : slabdata 81 81 0
radix_tree_node 3 3 584 28 4 : tunables 0 0 0 : slabdata 419 419 0
dentry 98 98 312 26 2 : tunables 0 0 0 : slabdata 1420 1420 0
btrfs_inode 3 3 2368 13 8 : tunables 0 0 0 : slabdata 730 730 0
signal_cache 3 3 1600 20 8 : tunables 0 0 0 : slabdata 17 17 0
sighand_cache 3 3 2240 14 8 : tunables 0 0 0 : slabdata 20 20 0
filp 90 90 512 32 4 : tunables 0 0 0 : slabdata 95 95 0
anon_vma 214 214 200 20 1 : tunables 0 0 0 : slabdata 162 162 0
kmalloc-cg-1k 1 1 1024 32 8 : tunables 0 0 0 : slabdata 22 22 0
pid 10 10 256 32 2 : tunables 0 0 0 : slabdata 14 14 0
kmalloc-cg-64 2 2 64 64 1 : tunables 0 0 0 : slabdata 8 8 0
kmalloc-cg-96 3 3 96 42 1 : tunables 0 0 0 : slabdata 8 8 0
sock_inode_cache 5 5 1408 23 8 : tunables 0 0 0 : slabdata 29 29 0
UNIX 7 7 1920 17 8 : tunables 0 0 0 : slabdata 21 21 0
inode_cache 36 36 1152 28 8 : tunables 0 0 0 : slabdata 680 680 0
proc_inode_cache 26 26 1224 26 8 : tunables 0 0 0 : slabdata 64 64 0
kmalloc-cg-2k 2 2 2048 16 8 : tunables 0 0 0 : slabdata 9 9 0
v2: change naming and count_partial()/count_free()/for_each_slab()
signatures to work with slabs, suggested by Matthew Wilcox
Fixes: 07f910f9b729 ("mm: Remove slab from struct page")
Reported-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Roman Gushchin <guro@fb.com>
Tested-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/linux-patches/Yg2cKKnIboNu7j+p@carbon.DHCP.thefacebook.com/
Greg Kroah-Hartman [Fri, 18 Feb 2022 13:13:58 +0000 (14:13 +0100)]
slab: remove __alloc_size attribute from __kmalloc_track_caller
Commit
c37495d6254c ("slab: add __alloc_size attributes for better
bounds checking") added __alloc_size attributes to a bunch of kmalloc
function prototypes. Unfortunately the change to __kmalloc_track_caller
seems to cause clang to generate broken code and the first time this is
called when booting, the box will crash.
While the compiler problems are being reworked and attempted to be
solved [1], let's just drop the attribute to solve the issue now. Once
it is resolved it can be added back.
[1] https://github.com/ClangBuiltLinux/linux/issues/1599
Fixes: c37495d6254c ("slab: add __alloc_size attributes for better bounds checking")
Cc: stable <stable@vger.kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: llvm@lists.linux.dev
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20220218131358.3032912-1-gregkh@linuxfoundation.org
Linus Torvalds [Sun, 20 Feb 2022 21:07:20 +0000 (13:07 -0800)]
Linux 5.17-rc5
Linus Torvalds [Sun, 20 Feb 2022 20:50:50 +0000 (12:50 -0800)]
Merge tag 'locking_urgent_for_v5.17_rc5' of git://git./linux/kernel/git/tip/tip
Pull locking fix from Borislav Petkov:
"Fix a NULL ptr dereference when dumping lockdep chains through
/proc/lockdep_chains"
* tag 'locking_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Correct lock_classes index mapping
Linus Torvalds [Sun, 20 Feb 2022 20:46:21 +0000 (12:46 -0800)]
Merge tag 'x86_urgent_for_v5.17_rc5' of git://git./linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Fix the ptrace regset xfpregs_set() callback to behave according to
the ABI
- Handle poisoned pages properly in the SGX reclaimer code
* tag 'x86_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ptrace: Fix xfpregs_set()'s incorrect xmm clearing
x86/sgx: Fix missing poison handling in reclaimer
Linus Torvalds [Sun, 20 Feb 2022 20:40:20 +0000 (12:40 -0800)]
Merge tag 'sched_urgent_for_v5.17_rc5' of git://git./linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
"Fix task exposure order when forking tasks"
* tag 'sched_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix yet more sched_fork() races
Linus Torvalds [Sun, 20 Feb 2022 20:04:14 +0000 (12:04 -0800)]
Merge tag 'edac_urgent_for_v5.17_rc5' of git://git./linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:
"Fix a long-standing struct alignment bug in the EDAC struct allocation
code"
* tag 'edac_urgent_for_v5.17_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC: Fix calculation of returned address and next offset in edac_align_ptr()